Using Slurm
To run an application, users write a job script and submit it using the sbatch command. Such submissions are then uniquely identified by their job ID.
Each job can request a total walltime as well as a number of processors. Using this information, the scheduler decides when to allocate resources to your job and run it via the batch system.
Job scripts
A job script is simply a shell script with a special comment header. These header additions allow you to specify parameters for your job, such as the resources you need.
The following example illustrates a job script which requests a single processor on a single node and executes a serial program on it.
#!/bin/sh
#SBATCH --time=01:00:00
#SBATCH --job-name=mytestjob
#SBATCH --ntasks=1 --nodes=1
#SBATCH --partition=normal
#SBATCH --output=mytestjob-%j.out
# change to directory where 'sbatch' was called
cd $SLURM_SUBMIT_DIR
# run my program
./myexecutable
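Assuming the script above is saved as, for example, mytestjob.sh, it is submitted with the sbatch command, which prints the ID assigned to the job (the file name and job ID below are illustrative):
$ sbatch mytestjob.sh
Submitted batch job 203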
In the example above, the lines beginning with #SBATCH set job scheduler options:
#SBATCH --time=01:00:00
Sets the maximum wallclock time the job is allowed to run; in this case, 1 hour.
#SBATCH --job-name=mytestjob
Sets the job name as seen in the output of the squeue command.
#SBATCH --ntasks=1 --nodes=1
Specifies the requested number of tasks and the number of nodes.
#SBATCH --partition=normal
Specifies the partition (queue) in which the job will run.
#SBATCH --output=mytestjob-%j.out
Specifies the file that receives the job's log output. Here %j expands to the job ID.
$SLURM_SUBMIT_DIR
The directory from which the sbatch command was issued.
Controlling email notifications
Two options can be added to your job scripts to control when and where the batch system sends email notifications about jobs.
#SBATCH --mail-type=BEGIN
Tells the batch system to send email when the job begins to run. Valid types include NONE, BEGIN, END, FAIL, REQUEUE, and ALL.
#SBATCH --mail-user=user@domain.com
Specifies the email address to which notifications are sent.
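For example, to be notified only when a job ends or fails, the two options can be combined in a job script header (the address below is a placeholder):
#SBATCH --mail-type=END,FAIL
#SBATCH --mail-user=user@domain.com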
Job control and monitoring
sbatch
Submit a job to the batch system
sbatch job_script
scancel
The scancel command will remove the job specified by JOBID from the queue or terminate a job that is executing.
scancel JOBID
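For example, to cancel job 203, or to cancel all of your own jobs at once using the -u (user) option (the job ID here is illustrative):
scancel 203
scancel -u $USER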
squeue
The squeue command displays information about the jobs in the queue:
squeue
JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)
203 normal mytestjob-1 tuXXXXXX R 0:05 8 c[003-010]
204 normal mytestjob-2 tuXXXXXX R 0:02 8 c[011-018]
205 normal mytestjob-3 tuXXXXXX R 0:02 8 c[020-027]
206 normal mytestjob-4 tuXXXXXX R 0:02 8 c[028-035]
207 normal mytestjob-5 tuXXXXXX R 0:02 8 c[036-043]
208 normal mytestjob-6 tuXXXXXX R 0:02 8 c[044-049,059-060]
Jobs marked R are running; PD means the job is pending (queued or on hold).
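To restrict the listing to your own jobs, squeue can filter by username, for example:
squeue -u $USER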
Checking jobs
If a job behaves strangely, or if you simply want more detail on how the scheduler views the job, you can take a closer look at it using the scontrol show job JOBID command.
$ scontrol show job 209
JobId=209 JobName=mytestjob-7
UserId=tuXXXXXX(XXXX) GroupId=XXX(XXXXX) MCS_label=N/A
Priority=4294901554 Nice=0 Account=(null) QOS=(null)
JobState=PENDING Reason=Resources Dependency=(null)
Requeue=1 Restarts=0 BatchFlag=1 Reboot=0 ExitCode=0:0
RunTime=00:00:00 TimeLimit=04:00:00 TimeMin=N/A
SubmitTime=2023-08-01T13:36:07 EligibleTime=2023-08-01T13:36:07
AccrueTime=2023-08-01T13:36:07
StartTime=2023-08-01T17:34:02 EndTime=2023-08-01T21:34:02 Deadline=N/A
SuspendTime=None SecsPreSuspend=0 LastSchedEval=2023-08-01T13:37:54 Scheduler=Backfill:*
Partition=normal AllocNode:Sid=cb2rr:3256738
ReqNodeList=(null) ExcNodeList=(null)
NodeList= SchedNodeList=c[003-010]
NumNodes=8-8 NumCPUs=160 NumTasks=160 CPUs/Task=1 ReqB:S:C:T=0:0:*:*
TRES=cpu=160,mem=1000000M,node=8,billing=160
Socks/Node=* NtasksPerN:B:S:C=0:0:*:* CoreSpec=*
MinCPUsNode=1 MinMemoryNode=0 MinTmpDiskNode=0
Features=(null) DelayBoot=00:00:00
OverSubscribe=OK Contiguous=0 Licenses=(null) Network=(null)
Command=/home/tuk02575/test_slurm/test.sh
WorkDir=/home/tuk02575/test_slurm
StdErr=/home/tuk02575/test_slurm/slurm-209.out
StdIn=/dev/null
StdOut=/home/tuk02575/test_slurm/slurm-209.out
Power=
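Output like the above is fairly long; if you only need a few fields, one option is to filter it, for example to see only the job state, start time, and node list (job ID as above):
$ scontrol show job 209 | grep -E 'JobState|StartTime|NodeList'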
Interactive sessions
A user can submit a request to the job scheduler for an interactive shell session on a compute node. For example, an interactive session with a single processor on one node can be requested as follows:
srun -N 1 --partition normal --pty bash -i
The srun command will not return until a node with the specified resources becomes available. Once the resources are available, a shell prompt on the allocated node is presented to the user.
[tuXXXXXX@cb2rr test_slurm]$ srun -N 1 --partition normal --pty bash -i
srun: job 215 queued and waiting for resources
===================================================
Begin TASK Prologue Tue Aug 1 01:44:38 PM EDT 2023
===================================================
Job ID: 215
Username: tuXXXXXX
Group: xxx
Job Name: bash
Resources List: nodes=1:ppn=1:ntasks=1
Queue: normal
Nodes: c001
===================================================
End TASK Prologue Tue Aug 1 01:44:38 PM EDT 2023
===================================================
[tuXXXXXX@c001 ~]$ echo Hello World!
Hello World!
[tuXXXXXX@c001 ~]$ exit
exit
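Interactive sessions can also request more resources; for example, a sketch of a session with four tasks on a single node in the normal partition:
srun -N 1 -n 4 --partition normal --pty bash -i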
Job Script Examples
MPI jobs using srun
#!/bin/bash
#======================================================
#
# Job script for running a parallel job on multiple
# cores across multiple nodes
#
#======================================================
#======================================================
# Propagate environment variables to the compute node
#SBATCH --export=ALL
#
# Run in the normal partition (queue)
#SBATCH --partition=normal
#
# No. of nodes (see queue limits above)
#SBATCH --nodes=2
#
# No. of tasks (CPUs) required (see queue limits above)
#SBATCH --ntasks=40
#
# Specify (hard) runtime (HH:MM:SS)
#SBATCH --time=01:00:00
#
# Job name
#SBATCH --job-name=mpi_test
#
# Output file
#SBATCH --output=mpi_test-%j.out
#======================================================
# change to directory where 'sbatch' was called
cd $SLURM_SUBMIT_DIR
# Load Modules
module load mpi/openmpi
# Modify the line below to run your program
srun -n $SLURM_NTASKS ./my_mpi_application.x
MPI jobs using mpirun
#!/bin/bash
#======================================================
#
# Job script for running a parallel job on multiple
# cores across multiple nodes
#
#======================================================
#======================================================
# Propagate environment variables to the compute node
#SBATCH --export=ALL
#
# Run in the normal partition (queue)
#SBATCH --partition=normal
#
# No. of nodes (see queue limits above)
#SBATCH --nodes=2
#
# No. of tasks (CPUs) required (see queue limits above)
#SBATCH --ntasks=40
#
# Specify (hard) runtime (HH:MM:SS)
#SBATCH --time=01:00:00
#
# Job name
#SBATCH --job-name=mpi_test
#
# Output file
#SBATCH --output=mpi_test-%j.out
#======================================================
# change to directory where 'sbatch' was called
cd $SLURM_SUBMIT_DIR
# Load Modules
module load mpi/openmpi
# Modify the line below to run your program
mpirun -np $SLURM_NTASKS ./my_mpi_application.x
GPU jobs
#!/bin/bash
#======================================================
#
# Job script for running a job on one or more GPUs
#
#======================================================
#======================================================
# Propagate environment variables to the compute node
#SBATCH --export=ALL
#
# Run in the gpu partition (queue)
#SBATCH --partition=gpu
#
# Total number GPUs for the job
#SBATCH --gpus=2
#
# Number of GPUs to use per node (max 2)
#SBATCH --gpus-per-node=2
#
# Number of CPUs per GPU
#SBATCH --cpus-per-gpu=1
#
# Specify (hard) runtime (HH:MM:SS)
#SBATCH --time=01:00:00
#
# Job name
#SBATCH --job-name=gpu_test
#
# Output file
#SBATCH --output=gpu_test-%j.out
#======================================================
# Load CUDA always
module load cuda
# change to directory where 'sbatch' was called
cd $SLURM_SUBMIT_DIR
srun --gpus 1 ./my_gpu_application.x
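As a quick sanity check, a line such as the following could be added before the application launch to list the GPUs visible to the job (this assumes the nvidia-smi utility is available on the GPU nodes):
srun --gpus 1 nvidia-smi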