8. IP over InfiniBand (IPoIB)
One last topic worth mentioning is that IB can also carry an IP stack. Think of IB as just another data-link layer, where we simply replace Ethernet with IB. After enabling IPoIB support on your system, you will notice an additional network device called ib0. Each IB port shows up as such a device, and, like any other network device, you can assign an IP address to it.
[root@master ~]# ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
valid_lft forever preferred_lft forever
inet6 ::1/128 scope host
valid_lft forever preferred_lft forever
2: em1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
link/ether 00:21:9b:9f:7c:df brd ff:ff:ff:ff:ff:ff
3: em2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
link/ether 00:21:9b:9f:7c:e1 brd ff:ff:ff:ff:ff:ff
inet 172.16.1.80/24 brd 172.16.1.255 scope global noprefixroute dynamic em2
valid_lft 18660sec preferred_lft 18660sec
inet6 fe80::221:9bff:fe9f:7ce1/64 scope link
valid_lft forever preferred_lft forever
4: em3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
link/ether 00:21:9b:9f:7c:e3 brd ff:ff:ff:ff:ff:ff
inet 192.168.0.1/20 brd 192.168.15.255 scope global noprefixroute em3
valid_lft forever preferred_lft forever
inet6 fe80::221:9bff:fe9f:7ce3/64 scope link
valid_lft forever preferred_lft forever
5: em4: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
link/ether 00:21:9b:9f:7c:e5 brd ff:ff:ff:ff:ff:ff
inet 192.168.16.1/20 brd 192.168.31.255 scope global noprefixroute em4
valid_lft forever preferred_lft forever
inet6 fe80::221:9bff:fe9f:7ce5/64 scope link
valid_lft forever preferred_lft forever
6: ib0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 65520 qdisc pfifo_fast state UP group default qlen 256
link/infiniband 80:00:02:08:fe:80:00:00:00:00:00:00:00:02:c9:03:00:0b:86:f9 brd 00:ff:ff:ff:ff:12:40:1b:ff:ff:00:00:00:00:00:00:ff:ff:ff:ff
inet 192.168.32.1/20 brd 192.168.47.255 scope global noprefixroute ib0
valid_lft forever preferred_lft forever
If you want to try this out, assign a static IP address to the ib0 interface, e.g. 192.168.32.1/20 on the master and 192.168.33.1-4 on your compute nodes.
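As a sketch of how this could be done on the master (the connection name `ipoib0` and the address are just the examples from above, not fixed names):

```shell
# Quick, non-persistent assignment with iproute2 (lost on reboot):
ip addr add 192.168.32.1/20 dev ib0
ip link set ib0 up

# Or persistently via NetworkManager (connection name is arbitrary):
nmcli connection add type infiniband ifname ib0 con-name ipoib0 \
    ipv4.method manual ipv4.addresses 192.168.32.1/20
nmcli connection up ipoib0
```

On the compute nodes, repeat this with the respective 192.168.33.x address.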
Note
Since all clusters share the same data link, each cluster has to select a different subnet to avoid duplicate IPs.
| Group | IPoIB Network |
|---|---|
| 1 | |
| 2 | |
| 3 | |
| 4 | |
| 5 | |
Once that is done, you can even ping your systems:
[root@master ~]# ping 192.168.33.1
PING 192.168.33.1 (192.168.33.1) 56(84) bytes of data.
64 bytes from 192.168.33.1: icmp_seq=1 ttl=64 time=3.42 ms
64 bytes from 192.168.33.1: icmp_seq=2 ttl=64 time=0.174 ms
64 bytes from 192.168.33.1: icmp_seq=3 ttl=64 time=0.163 ms
64 bytes from 192.168.33.1: icmp_seq=4 ttl=64 time=0.172 ms
64 bytes from 192.168.33.1: icmp_seq=5 ttl=64 time=0.166 ms
64 bytes from 192.168.33.1: icmp_seq=6 ttl=64 time=0.167 ms
64 bytes from 192.168.33.1: icmp_seq=7 ttl=64 time=0.171 ms
You could define DNS names for these IPs, such as master.ib and c01.ib, as before.
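For a small cluster this does not need a DNS server; static entries in /etc/hosts on every node are enough. The compute-node addresses below are examples following the scheme suggested above:

```shell
# /etc/hosts -- example IPoIB entries (add on all nodes)
192.168.32.1   master.ib
192.168.33.1   c01.ib
192.168.33.2   c02.ib
192.168.33.3   c03.ib
192.168.33.4   c04.ib
```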
Running TCP/IP on top of IB adds latency to the connection, but you keep the benefit of IB's very high bandwidth. This makes the IPoIB network a good fit for traffic that needs high bandwidth rather than low latency, e.g. NFS storage traffic. Note that parallel filesystems such as GPFS use RDMA directly and thereby avoid the added latency of IPoIB.
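To see the bandwidth difference for yourself, one option is iperf3 (assuming the package is installed; the addresses are the example IPoIB addresses from above):

```shell
# On a compute node, start an iperf3 server:
iperf3 -s

# On the master, measure TCP throughput over the IPoIB path:
iperf3 -c 192.168.33.1

# For comparison, run the same test against the compute node's
# Ethernet address and note the much lower throughput.
```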