HPC Survival Guide: SLURM Commands You Must Know Before the Admin Kicks You Off the Cluster

High-performance computing (HPC) is where your bioinformatics dreams either fly or crash violently. It’s the place where 96-core nodes roar like jet engines and terabytes of RAM sit waiting for your pipelines.

This guide isn’t about theory. It’s about practical survival. You’ll learn the SLURM commands that keep your jobs running, keep you out of trouble, and keep your admin from sending you that dreaded email: “Do not run intensive jobs on the head node.”


1. sinfo — Know Where You Are Before You Start a War

This command shows the available partitions (queues), nodes, and their states.

sinfo

Typical output includes:

  • Partition names

  • Node status (idle, alloc, down)

  • Available CPUs, memory

Why it matters:
You can pick the right partition for your job and avoid submitting to one that’s down, restricted, or already packed.
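
If the default output feels cramped, a custom format string puts the useful columns side by side (these are standard sinfo format tokens, though minor details can vary between SLURM versions):

sinfo -o "%P %a %l %D %c %m %t"   # partition, availability, time limit, nodes, CPUs, memory, state
sinfo -N -l                        # one line per node, long format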


2. sbatch — The Command That Actually Runs Your Job

This is how you submit a job script to the cluster.

sbatch my_script.sh

A typical script includes:

#!/bin/bash
#SBATCH --job-name=myRNAseq
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=8
#SBATCH --mem=32G
#SBATCH --time=24:00:00
#SBATCH --output=log_%j.out

Why it matters:
Without sbatch, you’re not using the cluster—you’re using the login node like a reckless amateur.
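
Two flags worth knowing once your pipelines grow, sketched here with hypothetical script names (align.sh and count.sh): --parsable makes sbatch print only the job ID, and --dependency chains a second job to run only if the first finishes successfully.

JOBID=$(sbatch --parsable align.sh)           # capture the job ID of the first step
sbatch --dependency=afterok:$JOBID count.sh   # second step runs only if the first exits cleanly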


3. squeue — Check Your Job Without Losing Your Mind

This command shows your running and pending jobs.

squeue -u $USER

You’ll see:

  • Job ID

  • State (R = running, PD = pending, CG = completing)

  • Time used

  • Queue

Pro tip: If your job is stuck in PD forever, check the reason column (NODELIST(REASON) in the default output). Usually it’s memory, time, or partition limits.
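
A custom format string puts that reason in plain view; %R prints the pending reason (or the node list once the job is running), and --start asks the scheduler for an estimated start time where it can guess one:

squeue -u $USER -o "%.10i %.9P %.20j %.2t %.10M %R"
squeue -u $USER --start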


4. scancel — Learn This Before You Submit Anything Big

Kill a job before it kills your quota.

scancel 123456

Or kill everything you ever submitted:

scancel -u $USER

Why it matters:
You will submit something wrong eventually—like asking for 1 TB of RAM by accident. scancel is your parachute.
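
scancel can also target jobs by name or state, which is gentler than wiping out everything at once (the job name here matches the script above):

scancel --name=myRNAseq              # cancel every job with this name
scancel -u $USER --state=PENDING     # clear only the jobs still waiting in the queue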


5. sacct — Your Job History, Exposed

Use this when you need to know why your job died.

sacct -j 123456 -o JobID,State,Elapsed,MaxRSS,ExitCode

Key fields:

  • State — how the job ended (COMPLETED, FAILED, OUT_OF_MEMORY, TIMEOUT, and so on)

  • MaxRSS — memory usage

  • ExitCode — actual code of death

Why it matters:
You can diagnose failures without bothering your admin every five minutes.
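
To review more than one job at a time, sacct can filter by user and date; the date below is just a placeholder, and the fields available depend on how accounting is configured on your cluster:

sacct -u $USER --starttime=2025-01-01 --format=JobID,JobName,Partition,State,Elapsed,MaxRSS,ExitCode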


6. srun — For Interactive Jobs (Use Carefully!)

Use srun when you need an interactive session on a compute node, not on the login node.

srun --pty bash -i

Add resources:

srun --cpus-per-task=4 --mem=16G --time=02:00:00 --pty bash

Why it matters:
You can test commands interactively on a compute node instead of hammering the login node with heavy processes.
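
Many clusters keep a dedicated partition for interactive work; the partition name below is an assumption, so check sinfo for what yours is actually called:

srun --partition=interactive --cpus-per-task=4 --mem=16G --time=02:00:00 --pty bash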


7. module load — The Command That Solves 90% of “Command Not Found” Errors

Most HPC systems use environment modules. To load a tool:

module load samtools/1.17
module load gcc/12.2

Check available modules:

module avail

Why it matters:
You don’t install software on HPC; you load it.
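
A few companion commands worth knowing (module spider is specific to Lmod, so it may not exist on every cluster):

module list              # show what is currently loaded
module purge             # unload everything and start clean
module spider samtools   # search all available versions (Lmod only)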


8. scontrol show job — Deep Dive Into Job Details

When something is stuck, this command tells you everything.

scontrol show job 123456

You’ll see:

  • Node it’s running on

  • Allocated CPUs

  • Memory

  • Environment

Why it matters:
Perfect for diagnosing weird scheduling issues.
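
scontrol can also hold and release your own jobs, or inspect a single node; the node name below is just a placeholder:

scontrol hold 123456          # keep a pending job from starting
scontrol release 123456       # let it run again
scontrol show node node042    # CPUs, memory, and state of one node (hypothetical node name)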


9. sstat — Real-Time Monitoring

Monitor resource usage while the job runs.

sstat -j 123456

Or for detailed fields:

sstat -j 123456 --format=JobID,MaxRSS,MaxVMSize,AveCPU

Why it matters:
Lets you catch a job that’s creeping toward its memory limit before the scheduler kills it without warning.
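
One gotcha: sstat reports on job steps, so for a plain sbatch job you usually need to point it at the .batch step, or ask for all steps at once:

sstat -j 123456.batch --format=JobID,MaxRSS,AveCPU
sstat --allsteps -j 123456 --format=JobID,MaxRSS,MaxVMSize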


10. sreport — The Admin’s Favorite Command

This one shows usage statistics.

sreport cluster utilization

Why it matters:
Useful to see how much of your fair-share quota you’ve burned.
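
For a per-user breakdown over a specific window, something like the following works on most setups (the dates are placeholders, and -t hours just changes the reporting unit):

sreport -t hours cluster AccountUtilizationByUser user=$USER start=2025-01-01 end=2025-02-01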


✅ Bonus Survival Tips Your Admin Wishes You Knew

Never run heavy programs on the login node.

It’s like starting a fire in the kitchen of a crowded restaurant. People get angry.


Check partition limits before submitting.

Use:

sacctmgr show qos

or read the documentation.
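
scontrol also prints the per-partition limits directly, which is often faster than digging through the wiki:

scontrol show partition      # MaxTime, MaxNodes, default memory per CPU, and more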


Log your errors.

Use --output and --error flags in sbatch.
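
A minimal sketch: %x expands to the job name and %j to the job ID (note that SLURM will not create the logs/ directory for you, so make it before submitting):

#SBATCH --output=logs/%x_%j.out
#SBATCH --error=logs/%x_%j.err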


Use job arrays instead of submitting 50 separate jobs.

#SBATCH --array=1-50
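
A minimal array script, assuming a hypothetical samples.txt manifest with one input file per line (the fastqc step is just an illustration; swap in your own tool):

#!/bin/bash
#SBATCH --job-name=qc_array
#SBATCH --array=1-50
#SBATCH --cpus-per-task=2
#SBATCH --mem=4G
#SBATCH --time=02:00:00
#SBATCH --output=logs/qc_%A_%a.out

# Each task reads its own line from the manifest and processes that sample
SAMPLE=$(sed -n "${SLURM_ARRAY_TASK_ID}p" samples.txt)
fastqc "$SAMPLE"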

Respect quotas.

Disk space is finite. Don’t hoard 3 TB of old FASTQ files.


Final Thoughts

SLURM isn’t your enemy—it’s the gatekeeper to all the computational firepower you need. Learn these commands and you’ll navigate HPC clusters with confidence, avoid common mistakes, and stay out of trouble with the admin who manages your computing fate.

Master these essentials, and HPC becomes less of a mystery and more of a reliable workhorse behind your research.


