1. Installation of InfiniBand support
To enable IB support in CentOS 7, all you have to do is install the @infiniband group of packages. Vendors such as Mellanox also provide their own updated IB drivers (MLNX_OFED), which add optimizations and bugfixes for their products.
Note
We usually don’t install the Mellanox drivers, since the stock drivers are good enough for most cases. But in some cases, when there is a hardware issue, you’re forced to do so, since the stock drivers are updated less frequently.
Install InfiniBand support on your master and compute nodes. Note that, since new services and kernel drivers are involved, you will have to reboot your systems to fully enable IB support.
On the master, install the @infiniband package group:
[root@master]# yum install @infiniband
Install the same on all compute nodes either manually or using Ansible:
[root@master]# ansible -m shell -a "yum install -y @infiniband" compute
Enable the drivers on both the compute nodes and the master by rebooting them.
Warning
Make sure that you have secured your data from previous exercises. Rebooting your system might expose flaws in your configuration, such as changes that were not made persistent.
Warning
Verify that netbooting is False in cobbler to avoid any reinstallations.
[root@master]# cobbler system report | grep Netboot
Netboot Enabled : False
Netboot Enabled : False
Netboot Enabled : False
Netboot Enabled : False
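With more than a handful of systems, checking the grep output by eye gets error-prone. The following is a minimal sketch of an automated check, assuming the report lines have the `Netboot Enabled : <value>` shape shown above. The function name `netboot_all_disabled` is hypothetical and not part of cobbler.

```shell
# Hypothetical helper (not part of cobbler): reads `cobbler system report`
# output on stdin. Succeeds only if at least one "Netboot Enabled" line is
# present and every such line has the value "False".
netboot_all_disabled() {
    awk -F: '
        /Netboot Enabled/ {
            seen++
            gsub(/[ \t]/, "", $2)        # strip padding around the value
            if ($2 != "False") bad++
        }
        END { exit (seen > 0 && bad == 0) ? 0 : 1 }
    '
}

# Usage on the master (assumption: run before rebooting):
#   cobbler system report | netboot_all_disabled && echo "safe to reboot"
```

The check deliberately fails when no "Netboot Enabled" line is found at all, so an empty or unexpected report is not mistaken for a safe state.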
[root@c01]# shutdown -r now
[root@c02]# shutdown -r now
[root@c03]# shutdown -r now
[root@master]# shutdown -r now
Once rebooted, you will have access to several IB commands, including ibstat, which displays the current status of your IB adapters:
[root@master ~]# ibstat
CA 'mlx4_0'
    CA type: MT26428
    Number of ports: 1
    Firmware version: 2.9.1000
    Hardware version: b0
    Node GUID: 0x0002c903000b86f8
    System image GUID: 0x0002c903000b86fb
    Port 1:
        State: Initializing
        Physical state: LinkUp
        Rate: 40
        Base lid: 0
        LMC: 0
        SM lid: 0
        Capability mask: 0x02590868
        Port GUID: 0x0002c903000b86f9
        Link layer: InfiniBand
Here the IB adapter mlx4_0 on master (CA type MT26428) has a physical link (LinkUp), but its port is still in the Initializing state.
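To check port states on several machines without reading the full ibstat output each time, the relevant fields can be pulled out with a small filter. This is a sketch only, assuming the output layout matches the example above. The function name `ib_port_states` is hypothetical and not part of the IB tools.

```shell
# Hypothetical helper (not part of the IB tools): reads `ibstat` output on
# stdin and prints one line per port:
#   <CA name> port <N>: <State> / <Physical state>
ib_port_states() {
    awk -v q="'" '
        /^CA /                     { ca = $2; gsub(q, "", ca) }      # CA name without quotes
        /^[ \t]*Port [0-9]+:/      { port = $2; sub(/:$/, "", port) }
        /^[ \t]*State:/            { state = $2 }
        /^[ \t]*Physical state:/   { print ca " port " port ": " state " / " $3 }
    '
}

# Usage (assumptions: run after the reboot; "compute" is the Ansible group
# used earlier in this section):
#   ibstat | ib_port_states
#   ansible -m shell -a "ibstat" compute
```

For the output shown above, the filter would report the mlx4_0 port as Initializing / LinkUp.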