8. IP over InfiniBand (IPoIB)

The last topic we want to mention is that IB can also carry an IP stack. Think of IB as just another data-link layer that takes the place of Ethernet. After enabling IB support on your system, you will notice an additional network device called ib0. Each IB port shows up as such a device and, as with any other network device, you can assign an IP address to it.

[root@master ~]# ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: em1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
    link/ether 00:21:9b:9f:7c:df brd ff:ff:ff:ff:ff:ff
3: em2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
    link/ether 00:21:9b:9f:7c:e1 brd ff:ff:ff:ff:ff:ff
    inet 172.16.1.80/24 brd 172.16.1.255 scope global noprefixroute dynamic em2
       valid_lft 18660sec preferred_lft 18660sec
    inet6 fe80::221:9bff:fe9f:7ce1/64 scope link
       valid_lft forever preferred_lft forever
4: em3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
    link/ether 00:21:9b:9f:7c:e3 brd ff:ff:ff:ff:ff:ff
    inet 192.168.0.1/20 brd 192.168.15.255 scope global noprefixroute em3
       valid_lft forever preferred_lft forever
    inet6 fe80::221:9bff:fe9f:7ce3/64 scope link
       valid_lft forever preferred_lft forever
5: em4: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
    link/ether 00:21:9b:9f:7c:e5 brd ff:ff:ff:ff:ff:ff
    inet 192.168.16.1/20 brd 192.168.31.255 scope global noprefixroute em4
       valid_lft forever preferred_lft forever
    inet6 fe80::221:9bff:fe9f:7ce5/64 scope link
       valid_lft forever preferred_lft forever
6: ib0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 65520 qdisc pfifo_fast state UP group default qlen 256
    link/infiniband 80:00:02:08:fe:80:00:00:00:00:00:00:00:02:c9:03:00:0b:86:f9 brd 00:ff:ff:ff:ff:12:40:1b:ff:ff:00:00:00:00:00:00:ff:ff:ff:ff
    inet 192.168.32.1/20 brd 192.168.47.255 scope global noprefixroute ib0
       valid_lft forever preferred_lft forever

If you want to try this out, assign a static IP address to the ib0 interface, for example 192.168.32.1/20 on master and 192.168.33.1 to 192.168.33.4 on your compute nodes.
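As a minimal sketch of this step, the address can be set by hand with iproute2. The `ip addr add` and `ip link set` commands are standard, but the helper names `run` and `set_ipoib_addr` and the `DRY_RUN` switch are illustrative assumptions and not part of the original setup. An address set this way is lost on reboot, so a permanent setup would go through your distribution's network configuration instead.

```shell
#!/bin/sh
# Sketch: give an IPoIB interface a static IPv4 address using iproute2.
# run, set_ipoib_addr and DRY_RUN are hypothetical names for this example.

# DRY_RUN=1 (default) only prints the commands; DRY_RUN=0 executes them (needs root).
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Usage: set_ipoib_addr <address/prefix> [device]
set_ipoib_addr() {
    addr="$1"
    dev="${2:-ib0}"
    run ip addr add "$addr" dev "$dev"   # assign the address (not persistent)
    run ip link set dev "$dev" up        # make sure the interface is up
}

# On master; on the first compute node use 192.168.33.1/20 instead.
set_ipoib_addr 192.168.32.1/20 ib0
```

Run it once as is to review the printed commands, then again as root with `DRY_RUN=0` to apply them.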

Note

Since all clusters share the same data-link, you will have to select a different subnet for each cluster to avoid duplicate IPs.

Group   IPoIB Network
-----   ---------------
1       192.168.32.1/20
2       192.168.48.1/20
3       192.168.64.1/20
4       192.168.80.1/20
5       192.168.96.1/20
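
The addresses above follow a simple pattern: a /20 prefix spans 16 values of the third octet, so each group's range starts 16 above the previous one, beginning at 32. The following is a small sketch of that arithmetic. The function names are hypothetical, and the compute-node rule (third octet one above the master's) is an assumption extrapolated from the group 1 example.

```shell
#!/bin/sh
# Sketch: derive the IPoIB addresses of a group from its number.

# Third octet where the group's /20 range starts: 32, 48, 64, ...
group_base() {
    echo $(( 32 + ($1 - 1) * 16 ))
}

# Master address as listed in the table, e.g. group 2 -> 192.168.48.1/20
master_addr() {
    echo "192.168.$(group_base "$1").1/20"
}

# Compute node address; assumes nodes sit one third-octet above the master.
# Usage: node_addr <group> <node number>
node_addr() {
    echo "192.168.$(( $(group_base "$1") + 1 )).$2/20"
}

master_addr 2    # prints 192.168.48.1/20
node_addr 1 3    # prints 192.168.33.3/20
```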

Once that is done, you can even ping your systems:

[root@master ~]# ping 192.168.33.1
PING 192.168.33.1 (192.168.33.1) 56(84) bytes of data.
64 bytes from 192.168.33.1: icmp_seq=1 ttl=64 time=3.42 ms
64 bytes from 192.168.33.1: icmp_seq=2 ttl=64 time=0.174 ms
64 bytes from 192.168.33.1: icmp_seq=3 ttl=64 time=0.163 ms
64 bytes from 192.168.33.1: icmp_seq=4 ttl=64 time=0.172 ms
64 bytes from 192.168.33.1: icmp_seq=5 ttl=64 time=0.166 ms
64 bytes from 192.168.33.1: icmp_seq=6 ttl=64 time=0.167 ms
64 bytes from 192.168.33.1: icmp_seq=7 ttl=64 time=0.171 ms

As before, you could define DNS names for these IPs, such as master.ib and c01.ib.
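
As a sketch of what such names could look like, the following prints entries in /etc/hosts format for group 1. The mapping of c01.ib to 192.168.33.1 and the helper name `ib_hosts` are assumptions; adapt them to however you defined names earlier.

```shell
#!/bin/sh
# Sketch: print /etc/hosts-style entries for the IPoIB addresses of group 1.
# ib_hosts is a hypothetical helper; c01.ib -> 192.168.33.1 is an assumed mapping.

# Usage: ib_hosts <number of compute nodes>
ib_hosts() {
    count="$1"
    printf '%s\t%s\n' "192.168.32.1" "master.ib"
    i=1
    while [ "$i" -le "$count" ]; do
        # c01.ib, c02.ib, ... with a zero-padded node number
        printf '192.168.33.%d\tc%02d.ib\n' "$i" "$i"
        i=$(( i + 1 ))
    done
}

ib_hosts 4
```

Review the output before adding it to the hosts file that your name service reads on master.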

Running TCP/IP on top of IB adds latency compared to native IB communication, but you still benefit from the very high bandwidth of IB. This makes the IPoIB network a good fit for connections that need high bandwidth, such as NFS storage traffic. Note that real parallel filesystems such as GPFS take advantage of RDMA and avoid the added latency of IPoIB.