Overview and Architecture
Slurm has a centralized manager, slurmctld, to monitor resources and work. There may also be a backup manager to assume those responsibilities in the event of failure (not used in this exercise). Each compute node has a slurmd daemon, which can be compared to a remote shell: it waits for work, executes that work, returns status, and waits for more work.
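The remote-shell comparison can be made concrete with a toy loop. The sketch below is only an analogy and not how slurmd is implemented: the function name worker_loop, the two queues, and the None shutdown sentinel are all assumptions made for illustration. It waits for a command, executes it, reports the exit status, and goes back to waiting.

```python
import queue
import subprocess
import sys


def worker_loop(work_queue, status_queue):
    """Toy stand-in for a node daemon: wait for work, run it, report status."""
    while True:
        argv = work_queue.get()      # block until work arrives
        if argv is None:             # hypothetical sentinel: stop the loop
            break
        result = subprocess.run(argv, capture_output=True, text=True)
        # Report the command, its exit code, and its output back to the manager.
        status_queue.put((argv, result.returncode, result.stdout.strip()))


if __name__ == "__main__":
    work, status = queue.Queue(), queue.Queue()
    # Two small jobs: one succeeds, one exits with a non-zero status.
    work.put([sys.executable, "-c", "print('hello from the node')"])
    work.put([sys.executable, "-c", "import sys; sys.exit(3)"])
    work.put(None)
    worker_loop(work, status)
    while not status.empty():
        print(status.get())
```

In the real system the manager role is played by slurmctld and the transport is Slurm's own network protocol rather than an in-process queue.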
The slurmd daemons provide fault-tolerant hierarchical communications. There is an optional slurmdbd (Slurm DataBase Daemon) which can be used to record accounting information for multiple Slurm-managed clusters in a single database. User tools include srun to initiate jobs, scancel to terminate queued or running jobs, sinfo to report system status, squeue to report the status of jobs, and sacct to get information about jobs and job steps that are running or have completed.
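As a rough illustration of how the user tools above are typically invoked, the following sketch builds a command line for each one. The helper names slurm_argv and run_if_available, the user name alice, and the job ID 1234 are hypothetical and exist only for this example. The sketch prints the commands rather than running them, apart from a read-only sinfo call that is attempted only when Slurm is installed.

```python
import shutil
import subprocess

# The five user tools named in the text above.
USER_TOOLS = {"srun", "scancel", "sinfo", "squeue", "sacct"}


def slurm_argv(tool, *args):
    """Build an argument list for one of the Slurm user tools."""
    if tool not in USER_TOOLS:
        raise ValueError(f"not a Slurm user tool covered here: {tool}")
    return [tool, *map(str, args)]


def run_if_available(argv):
    """Run the command only if the tool is on PATH; return stdout or None."""
    if shutil.which(argv[0]) is None:
        return None
    return subprocess.run(argv, capture_output=True, text=True).stdout


if __name__ == "__main__":
    job_id = 1234  # hypothetical job ID
    examples = [
        slurm_argv("sinfo"),                          # system status
        slurm_argv("squeue", "--user", "alice"),      # jobs of one user
        slurm_argv("srun", "--nodes=1", "hostname"),  # start a one-node job
        slurm_argv("sacct", "--jobs", job_id),        # accounting for a job
        slurm_argv("scancel", job_id),                # cancel that job
    ]
    for argv in examples:
        print(" ".join(argv))

    # Only sinfo is executed, since it is read-only.
    print(run_if_available(slurm_argv("sinfo")))
```

On a machine without Slurm the last line prints None; on a cluster login node it prints the partition and node summary.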
The sview command graphically reports system and job status, including network topology. The administrative tool scontrol is available to monitor and/or modify configuration and state information on the cluster. The administrative tool used to manage the database is sacctmgr. It can be used to identify the clusters, valid users, valid bank accounts, etc.

Slurm Components. (taken from https://slurm.schedmd.com/overview.html)
In this exercise we will install and configure the slurmctld and slurmd components and optionally configure slurmdbd to manage user accounts.