Using Slurm
To run an application, users write a job script and submit it using the sbatch command. Such submissions are then uniquely identified by their job ID.
Each job can request a total walltime as well as a number of processors. Using this information, the scheduler decides when to allocate resources to your job and run it via the batch system.
Job scripts
A job script is simply a shell script with a special comment header. These header additions allow you to specify parameters for your job, such as the resources you need.
The following example illustrates a job script which requests a single processor on a single node and executes a serial program on it.
#!/bin/sh
#SBATCH --time=01:00:00
#SBATCH --job-name=mytestjob
#SBATCH --ntasks=1 --nodes=1
#SBATCH --partition=normal
#SBATCH --output=mytestjob-%j.out
# change to directory where 'sbatch' was called
cd $SLURM_SUBMIT_DIR
# run my program
./myexecutable
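Assuming the script above is saved as, for example, mytestjob.sh, it is submitted with the sbatch command, which prints the ID assigned to the job (the file name and job ID below are illustrative):
$ sbatch mytestjob.sh
Submitted batch job 203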
In the example above, the lines beginning with #SBATCH set job scheduler options:
#SBATCH --time=01:00:00
Sets the maximum wallclock time the job is allowed to run; in this case, 1 hour.
#SBATCH --job-name=mytestjob
Sets the job name as seen in the output of the squeue command.
#SBATCH --ntasks=1 --nodes=1
Specifies the requested number of tasks and the number of nodes.
#SBATCH --partition=normal
Specifies the partition (queue) in which the job will run.
#SBATCH --output=mytestjob-%j.out
Specifies the file that receives the job's log output. Here %j expands to the job ID.
$SLURM_SUBMIT_DIR
The directory from which the sbatch command was issued.
Controlling email notifications
Two options can be added to your job scripts to control when and where the batch system sends email notifications about jobs.
#SBATCH --mail-type=BEGIN
Tells the batch system to send email when the job begins to run. Valid types include NONE, BEGIN, END, FAIL, REQUEUE, and ALL.
#SBATCH --mail-user=user@domain.com
Specifies the email address to which notifications are sent.
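For example, to be notified only when a job ends or fails, the two options can be combined in a job script header (the address below is a placeholder):
#SBATCH --mail-type=END,FAIL
#SBATCH --mail-user=user@domain.com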
Job control and monitoring
sbatch
Submit a job to the batch system
sbatch job_script
scancel
The scancel command will remove the job specified by JOBID from the queue or terminate a job that is executing.
scancel JOBID
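For example, to cancel job 203, or to cancel all of your own jobs at once using the -u (user) option (the job ID here is illustrative):
scancel 203
scancel -u $USER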
squeue
The squeue command displays information about the jobs in the queue:
squeue
JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)
203 normal mytestjob-1 tuXXXXXX R 0:05 8 c[003-010]
204 normal mytestjob-2 tuXXXXXX R 0:02 8 c[011-018]
205 normal mytestjob-3 tuXXXXXX R 0:02 8 c[020-027]
206 normal mytestjob-4 tuXXXXXX R 0:02 8 c[028-035]
207 normal mytestjob-5 tuXXXXXX R 0:02 8 c[036-043]
208 normal mytestjob-6 tuXXXXXX R 0:02 8 c[044-049,059-060]
Jobs marked R are running; PD means the job is pending (queued or on hold).
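To restrict the listing to your own jobs, squeue can filter by username, for example:
squeue -u $USER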
Checking jobs
If a job behaves strangely, or if you simply want more detail on how the scheduler views the job, you can take a closer look at it using the scontrol show job JOBID command.
$ scontrol show job 209
JobId=209 JobName=mytestjob-7
UserId=tuXXXXXX(XXXX) GroupId=XXX(XXXXX) MCS_label=N/A
Priority=4294901554 Nice=0 Account=(null) QOS=(null)
JobState=PENDING Reason=Resources Dependency=(null)
Requeue=1 Restarts=0 BatchFlag=1 Reboot=0 ExitCode=0:0
RunTime=00:00:00 TimeLimit=04:00:00 TimeMin=N/A
SubmitTime=2023-08-01T13:36:07 EligibleTime=2023-08-01T13:36:07
AccrueTime=2023-08-01T13:36:07
StartTime=2023-08-01T17:34:02 EndTime=2023-08-01T21:34:02 Deadline=N/A
SuspendTime=None SecsPreSuspend=0 LastSchedEval=2023-08-01T13:37:54 Scheduler=Backfill:*
Partition=normal AllocNode:Sid=cb2rr:3256738
ReqNodeList=(null) ExcNodeList=(null)
NodeList= SchedNodeList=c[003-010]
NumNodes=8-8 NumCPUs=160 NumTasks=160 CPUs/Task=1 ReqB:S:C:T=0:0:*:*
TRES=cpu=160,mem=1000000M,node=8,billing=160
Socks/Node=* NtasksPerN:B:S:C=0:0:*:* CoreSpec=*
MinCPUsNode=1 MinMemoryNode=0 MinTmpDiskNode=0
Features=(null) DelayBoot=00:00:00
OverSubscribe=OK Contiguous=0 Licenses=(null) Network=(null)
Command=/home/tuk02575/test_slurm/test.sh
WorkDir=/home/tuk02575/test_slurm
StdErr=/home/tuk02575/test_slurm/slurm-209.out
StdIn=/dev/null
StdOut=/home/tuk02575/test_slurm/slurm-209.out
Power=
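Output like the above is fairly long; if you only need a few fields, one option is to filter it, for example to see only the job state, start time, and node list (job ID as above):
$ scontrol show job 209 | grep -E 'JobState|StartTime|NodeList'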
Interactive sessions
A user can submit a request to the job scheduler for an interactive shell session on a compute node. For example, an interactive session with a single processor on one node can be requested as follows:
srun -N 1 --partition normal --pty bash -i
The srun command will not return until a node with the specified resources becomes available. Once the resources are available, a shell prompt on the allocated node is presented to the user.
[tuXXXXXX@cb2rr test_slurm]$ srun -N 1 --partition normal --pty bash -i
srun: job 215 queued and waiting for resources
===================================================
Begin TASK Prologue Tue Aug 1 01:44:38 PM EDT 2023
===================================================
Job ID: 215
Username: tuXXXXXX
Group: xxx
Job Name: bash
Resources List: nodes=1:ppn=1:ntasks=1
Queue: normal
Nodes: c001
===================================================
End TASK Prologue Tue Aug 1 01:44:38 PM EDT 2023
===================================================
[tuXXXXXX@c001 ~]$ echo Hello World!
Hello World!
[tuXXXXXX@c001 ~]$ exit
exit
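Interactive sessions can also request more resources; for example, a sketch of a session with four tasks on a single node in the normal partition:
srun -N 1 -n 4 --partition normal --pty bash -i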
Job Script Examples
MPI jobs using srun
#!/bin/bash
#======================================================
#
# Job script for running a parallel job on multiple
# cores across multiple nodes
#
#======================================================
#======================================================
# Propagate environment variables to the compute node
#SBATCH --export=ALL
#
# Run in the normal partition (queue)
#SBATCH --partition=normal
#
# No. of nodes (see queue limits above)
#SBATCH --nodes=2
#
# No. of tasks (CPUs) required (see queue limits above)
#SBATCH --ntasks=40
#
# Specify (hard) runtime (HH:MM:SS)
#SBATCH --time=01:00:00
#
# Job name
#SBATCH --job-name=mpi_test
#
# Output file
#SBATCH --output=mpi_test-%j.out
#======================================================
# change to directory where 'sbatch' was called
cd $SLURM_SUBMIT_DIR
# Load Modules
module load mpi/openmpi
# Modify the line below to run your program
srun -n $SLURM_NTASKS ./my_mpi_application.x
MPI jobs using mpirun
#!/bin/bash
#======================================================
#
# Job script for running a parallel job on multiple
# cores across multiple nodes
#
#======================================================
#======================================================
# Propagate environment variables to the compute node
#SBATCH --export=ALL
#
# Run in the normal partition (queue)
#SBATCH --partition=normal
#
# No. of nodes (see queue limits above)
#SBATCH --nodes=2
#
# No. of tasks (CPUs) required (see queue limits above)
#SBATCH --ntasks=40
#
# Specify (hard) runtime (HH:MM:SS)
#SBATCH --time=01:00:00
#
# Job name
#SBATCH --job-name=mpi_test
#
# Output file
#SBATCH --output=mpi_test-%j.out
#======================================================
# change to directory where 'sbatch' was called
cd $SLURM_SUBMIT_DIR
# Load Modules
module load mpi/openmpi
# Modify the line below to run your program
mpirun -np $SLURM_NTASKS ./my_mpi_application.x
GPU jobs
#!/bin/bash
#======================================================
#
# Job script for running a job on one or more GPUs
#
#======================================================
#======================================================
# Propagate environment variables to the compute node
#SBATCH --export=ALL
#
# Run in the gpu partition (queue)
#SBATCH --partition=gpu
#
# Total number GPUs for the job
#SBATCH --gpus=2
#
# Number of GPUs to use per node (max 2)
#SBATCH --gpus-per-node=2
#
# Number of CPUs per GPU
#SBATCH --cpus-per-gpu=1
#
# Specify (hard) runtime (HH:MM:SS)
#SBATCH --time=01:00:00
#
# Job name
#SBATCH --job-name=gpu_test
#
# Output file
#SBATCH --output=gpu_test-%j.out
#======================================================
# Load CUDA always
module load cuda
# change to directory where 'sbatch' was called
cd $SLURM_SUBMIT_DIR
srun --gpus 1 ./my_gpu_application.x
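As a quick sanity check, a line such as the following could be added before the application launch to list the GPUs visible to the job (this assumes the nvidia-smi utility is available on the GPU nodes):
srun --gpus 1 nvidia-smi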