Welcome to the High-Performance Computing Technologies Course

This is a two-week course introducing the basics of building, configuring, and operating high-performance computing (HPC) clusters. The goal of this course is to give students a solid foundation for understanding the components of a cluster, how they are typically configured, and what challenges operating such a machine entails.

Over these two weeks, students will learn firsthand how to set up their own clusters from scratch. We will explain the network administration basics needed to provision cluster nodes over the network, and students will learn how to manage multiple systems without physical access.

Automation and uniform configuration play a central role in managing such large systems. We will use modern configuration management techniques and show how to use Ansible to simplify repetitive tasks.

One central component of a cluster is its batch system and job scheduler. Students will learn how to set up the software environment needed to enable this batch workflow, how to install and manage software on multiple compute nodes using software modules, and how to configure essential software components for massively parallel applications.

The course will conclude by showcasing and discussing real-world deployments.






Indices and tables