Overview and Architecture

Slurm has a centralized manager, slurmctld, to monitor resources and work. There may also be a backup manager to assume those responsibilities in the event of failure (not used in this exercise). Each compute node has a slurmd daemon, which can be compared to a remote shell: it waits for work, executes that work, returns status, and waits for more work.

The slurmd daemons provide fault-tolerant hierarchical communications. There is an optional slurmdbd (Slurm DataBase Daemon) which can be used to record accounting information for multiple Slurm-managed clusters in a single database. User tools include srun to initiate jobs, scancel to terminate queued or running jobs, sinfo to report system status, squeue to report the status of jobs, and sacct to get information about jobs and job steps that are running or have completed.

The sview command graphically reports system and job status, including network topology. The administrative tool scontrol can be used to monitor and/or modify configuration and state information on the cluster. The administrative tool used to manage the database is sacctmgr; it can be used to identify the clusters, valid users, valid bank accounts, etc.
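
These tools are used throughout the exercise; a few typical invocations are shown below for reference (the job ID and node name are placeholders):

sinfo                    # report partition and node status
squeue -u $USER          # list jobs belonging to the current user
srun -N 1 hostname       # launch a trivial one-node job
scancel 1234             # cancel job 1234
scontrol show node c01   # inspect the state of node c01
sacct -j 1234            # accounting data for job 1234 (needs accounting storage/slurmdbd)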

Slurm Components

Figure: Slurm Components (taken from https://slurm.schedmd.com/overview.html)

In this exercise we will install and configure the slurmctld and slurmd components and, optionally, configure slurmdbd to manage user accounts.

Installing and configuring prerequisites

Create the necessary user accounts

Slurm and Munge require consistent UIDs and GIDs across all servers and nodes in the cluster; this includes the slurm and munge system users created below.

---
- name: Ensure group munge exists
  ansible.builtin.group:
    name: munge
    gid: 961
    state: present

- name: Create munge user
  ansible.builtin.user:
    name: munge
    comment: MUNGE Uid 'N' Gid Emporium
    create_home: true
    home: /var/lib/munge
    uid: 961
    group: munge
    shell: /sbin/nologin
    state: present

- name: Ensure group slurm exists
  ansible.builtin.group:
    name: slurm
    gid: 962
    state: present

- name: Create slurm user
  ansible.builtin.user:
    name: slurm
    comment: SLURM workload manager
    create_home: true
    home: /var/lib/slurm
    uid: 962
    group: slurm
    shell: /sbin/nologin
    state: present

Be sure to create the users on both the master and the compute nodes.
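
One convenient way to guarantee this is to run the user-creation tasks against every host in a single play. A minimal sketch, assuming the inventory defines master and compute groups and the tasks above are collected in a role called slurm_users (both names are illustrative):

---
- name: Create slurm and munge users on all cluster hosts
  hosts: master:compute   # assumed inventory group names
  become: true
  roles:
    - slurm_users         # role containing the group/user tasks above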

Munge authentication

- name: Install munge
  ansible.builtin.yum:
    name:
      - munge
      - munge-libs
      - rng-tools
    state: latest
    update_cache: true

- name: Copy munge key
  ansible.builtin.copy:
    src: munge.key
    dest: /etc/munge/munge.key
    owner: munge
    group: munge
    mode: "0400"

- name: Start munge service
  ansible.builtin.service:
    name: munge
    state: started
    enabled: true

The Munge key can be generated (ON THE MASTER ONLY) by running:

dd if=/dev/urandom bs=1 count=1024 > munge.key

The same key must be copied to all compute nodes as well (the copy task above distributes it).
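
Once the key is in place and the munge service is running everywhere, authentication can be checked with munge's own utilities, for example (c01 is a placeholder compute node name):

munge -n | unmunge            # encode and decode a credential locally
munge -n | ssh c01 unmunge    # credential created on the master, decoded on a compute node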

Installation of Common Packages

Install all the necessary packages on both the master and the compute nodes. If the packages are not available on the compute nodes, follow the steps in Exercise 5: RPM repositories with Cobbler to create a new repository called slurm and add it to the Cobbler profile accordingly.
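
As a reminder of the Cobbler side of that step, the commands look roughly like the following; the mirror path and profile name are placeholders and must match your Exercise 5 setup:

cobbler repo add --name=slurm --mirror=/srv/repo/slurm-rpms --mirror-locally=true   # placeholder mirror path
cobbler reposync
cobbler profile edit --name=compute --repos="slurm"                                 # placeholder profile name
cobbler sync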

- name: Install utilities
  ansible.builtin.yum:
    name:
      - openssl
      - numactl
      - hwloc
      - lua
      - libibmad
      - libibumad
    state: latest
    update_cache: true

- name: Install slurm common packages
  ansible.builtin.yum:
    name:
      - slurm
      - slurm-perlapi
      - slurm-pam_slurm

System-wide Configuration

The slurm.conf file can be generated with the online configurator at https://slurm.schedmd.com/configurator.html. Read each of the options carefully; in particular, pick the linear plugin for resource selection.
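
With the linear plugin selected, the generated file will contain a resource-selection line such as the one below; it is not repeated in the excerpt further down, so make sure it is present in your slurm.conf:

SelectType=select/linear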

- name: Ensure the /etc/slurm folder exists
  ansible.builtin.file:
    path: /etc/slurm
    state: directory
    owner: slurm
    group: slurm
    mode: "0755"

- name: Copy slurm config files
  ansible.builtin.copy:
    src: "{{ item }}"
    dest: /etc/slurm/{{ item }}
    owner: root
    group: root
    mode: u=rw,g=r,o=r
    force: true
  with_items:
    - slurm.conf

For Temple’s hardware, the most relevant options are:

ClusterName=MASTERX
SlurmctldHost=master
AuthType=auth/munge
MpiDefault=none
ProctrackType=proctrack/cgroup
ReturnToService=0
SlurmctldPidFile=/var/run/slurmctld.pid
SlurmctldPort=6817
SlurmdPidFile=/var/run/slurmd.pid
SlurmdPort=6818
SlurmdSpoolDir=/var/spool/slurmd
SlurmUser=slurm
StateSaveLocation=/var/spool/slurmctld
SwitchType=switch/none
InactiveLimit=0
KillWait=30
MinJobAge=300
SlurmctldTimeout=120
SlurmdTimeout=300
Waittime=0
SchedulerType=sched/backfill
SlurmctldDebug=info
SlurmctldLogFile=/var/log/slurmctld.log
SlurmdDebug=info
SlurmdLogFile=/var/log/slurmd.log

NodeName=c0[1-3] RealMemory=11000  CPUs=12 Sockets=2 CoresPerSocket=6 ThreadsPerCore=1 State=UNKNOWN
PartitionName=normal Nodes=ALL Default=YES MaxTime=INFINITE State=UP
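
The CPUs, Sockets, CoresPerSocket, ThreadsPerCore and RealMemory values must match the actual hardware. Once the slurm-slurmd package is installed on a compute node, running slurmd -C prints the node's physical configuration in slurm.conf syntax; the output below is illustrative only:

slurmd -C
# NodeName=c01 CPUs=12 Boards=1 SocketsPerBoard=2 CoresPerSocket=6 ThreadsPerCore=1 RealMemory=11852
# UpTime=0-01:23:45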

Master configuration

Install the necessary packages for slurmctld and start the corresponding systemd service:

---
- name: Install slurm server
  ansible.builtin.yum:
    name:
      - slurm-slurmctld
      - s-nail
    state: latest
    update_cache: true

- name: Create /var/spool/slurmctld folder
  ansible.builtin.file:
    path: /var/spool/slurmctld
    state: directory
    owner: slurm
    group: slurm
    mode: "0755"

- name: Check /var/log/slurmctld.log exists
  ansible.builtin.file:
    path: /var/log/slurmctld.log
    state: touch
    owner: slurm
    group: slurm
    access_time: preserve

- name: Start slurmctld service
  ansible.builtin.service:
    name: slurmctld
    state: restarted
    enabled: true

At this point you should be able to run sinfo to check the configuration of the “cluster”.
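
With the node definitions above, the output should look roughly like this; the nodes will remain in a down/unknown state until slurmd is running on them (next section):

sinfo
PARTITION AVAIL  TIMELIMIT  NODES  STATE NODELIST
normal*      up   infinite      3  down* c0[1-3]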

Client configuration

Complete the configuration on the compute nodes by running:

---
- name: Install slurm client
  ansible.builtin.yum:
    name:
      - slurm-slurmd
    state: latest
    update_cache: true

- name: Create /var/spool/slurmd folder
  ansible.builtin.file:
    path: /var/spool/slurmd
    state: directory
    owner: slurm
    group: slurm
    mode: "0755"

- name: Check /var/log/slurmd.log exists
  ansible.builtin.file:
    path: /var/log/slurmd.log
    state: touch
    owner: slurm
    group: slurm
    access_time: preserve

- name: Force systemd to reread config
  ansible.builtin.systemd:
    daemon_reload: true

- name: Start slurmd service
  ansible.builtin.service:
    name: slurmd
    state: restarted
    enabled: true

Check that everything is working and that the nodes are listed in the output of sinfo -l.
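
A quick end-to-end test is to launch a trivial job on every node; something like the following should print one hostname per compute node and then leave the queue empty:

srun -N 3 -l hostname   # -l prefixes each output line with the task number
squeue                  # should show no remaining jobs once srun returns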

Tasks

  1. Create and configure a prologue and epilogue for tasks following the documentation at https://slurm.schedmd.com/prolog_epilog.html. For example, to let users know the details of their jobs:

#!/bin/bash
#
# TASK prologue script, to be run by slurmstepd

trap "exit 0" 1 2 3 15 20

if [[ -n $SLURM_JOB_ID ]]; then
    SLURM_JOB_STDOUT=$(/usr/bin/scontrol show job "${SLURM_JOB_ID}" | grep -i stdout | cut -f2 -d '=')
    if [[ -s $SLURM_JOB_STDOUT ]]; then
        exit 0
    fi
fi

if [[ $SLURM_PROCID -eq 0 ]]
then
    echo "print ==================================================="
    echo "print Begin TASK Prologue $(date)"
    echo "print ==================================================="
    echo "print Job ID:           $SLURM_JOB_ID"
    echo "print Username:         $SLURM_JOB_USER"

    if [[ -n $SLURM_JOB_GID ]]; then
        GROUP=$(getent group "$SLURM_JOB_GID" | cut -d ":" -f 1)
        echo "print Group:            $GROUP"
    fi

    echo "print Job Name:         $SLURM_JOB_NAME"
    echo "print Resources List:   nodes=$SLURM_JOB_NUM_NODES:ppn=$SLURM_JOB_CPUS_PER_NODE:ntasks=$SLURM_NTASKS"
    echo "print Queue:            $SLURM_JOB_PARTITION"

    [ -z "$SLURM_JOB_ACCOUNT" ] || echo "print Account:          $SLURM_JOB_ACCOUNT"
    [ -z "$SLURM_JOB_NODELIST" ] || echo "print Nodes:            $SLURM_JOB_NODELIST"
    [ -z "$SLURM_GPUS" ] || echo "print GPUs:             $SLURM_GPUS"

    echo "print ==================================================="
    echo "print End TASK Prologue $(date)"
    echo "print ==================================================="
fi
exit 0
#EOF

These scripts are activated by setting the following configuration options:

TaskEpilog=/var/spool/slurm/task_epilogue
TaskPlugin=task/affinity,task/cgroup
TaskProlog=/var/spool/slurm/task_prologue
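
The script itself must exist on every compute node at the path referenced by TaskProlog and be executable by the job users. A minimal sketch of the corresponding Ansible tasks, assuming the script is stored as task_prologue alongside the playbook (an illustrative file name):

- name: Ensure /var/spool/slurm exists
  ansible.builtin.file:
    path: /var/spool/slurm
    state: directory
    owner: root
    group: root
    mode: "0755"

- name: Deploy the task prologue script
  ansible.builtin.copy:
    src: task_prologue
    dest: /var/spool/slurm/task_prologue
    owner: root
    group: root
    mode: "0755"
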
  2. (Merit) Block users from logging in to nodes unless they have an active job running on that machine. Use the instructions at https://slurm.schedmd.com/pam_slurm_adopt.html. TIP: The relevant PAM configuration is:

- name: Ensure PAM module is used
  ansible.builtin.blockinfile:
    path: /etc/pam.d/sshd
    insertafter: 'account\s+required\s+pam_nologin\.so'
    content: |
      account    sufficient   pam_slurm_adopt.so
      account    required     pam_access.so
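
Note that, according to the pam_slurm_adopt documentation, the module relies on the extern step being created for every job, which requires the following option in slurm.conf (restart slurmctld and slurmd after adding it):

PrologFlags=contain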
