k8s-rdma-device-plugin is a device plugin for Kubernetes that manages RDMA devices.
RDMA (remote direct memory access) is a high-performance network protocol with the following major advantages:
Zero-copy
Applications can transfer data without involving the network software stack; data is sent and received directly to and from application buffers without being copied between network layers.
Kernel bypass
Applications can perform data transfer directly from userspace without the need to perform context switches.
No CPU involvement
Applications can access remote memory without consuming any CPU on the remote machine. The remote machine's memory is read without any intervention from a remote process (or processor), and the caches of the remote CPU(s) won’t be filled with the accessed memory content.
You can read this post to get more information about RDMA.
This plugin allows you to use RDMA devices in the containers of a Kubernetes cluster. It can also work together with sriov-cni to provide high-performance network connections for distributed applications, especially distributed GPU applications such as TensorFlow, Spark, etc.
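The text above is the official introduction and gives a rough idea of what the plugin does. The goal of installing k8s-rdma-device-plugin is to use this high-performance network from within K8S. The concrete installation steps follow.
Installing the InfiniBand driver on the host
InfiniBand, abbreviated IB, is a high-bandwidth interconnect technology. We connect the two machines with an IB cable and then install the driver.
Environment check
Check whether an IB card is installed on the host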
root@m1:/# lspci |grep Mell
1a:00.0 Infiniband controller: Mellanox Technologies MT28800 Family [ConnectX-5]
If nothing is returned, the server has no IB card installed and none of the following configuration is needed.
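The same check can be scripted when many hosts have to be inspected. The following is a minimal sketch in Go, assuming the `lspci` text has already been captured; the function name `mellanoxDevices` and the second sample line are hypothetical and only serve the illustration.

```go
package main

import (
	"fmt"
	"strings"
)

// mellanoxDevices returns the lspci lines that contain "Mell",
// mirroring `lspci | grep Mell` on text that was captured beforehand.
func mellanoxDevices(lspciOutput string) []string {
	var found []string
	for _, line := range strings.Split(lspciOutput, "\n") {
		line = strings.TrimSpace(line)
		if strings.Contains(line, "Mell") {
			found = append(found, line)
		}
	}
	return found
}

func main() {
	// First line is the output shown above; the second is a made-up non-IB device.
	sample := "1a:00.0 Infiniband controller: Mellanox Technologies MT28800 Family [ConnectX-5]\n" +
		"00:1f.2 SATA controller: Example vendor (hypothetical line)\n"

	devs := mellanoxDevices(sample)
	if len(devs) == 0 {
		fmt.Println("no IB card found; nothing further to configure")
		return
	}
	for _, d := range devs {
		fmt.Println("found:", d)
	}
}
```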
Installing dependencies
apt-get install python-libxml2 gfortran libgfortran3 libnl-route-3-200 dpatch quilt bison swig \
debhelper automake libltdl-dev chrpath flex autoconf m4 autotools-dev graphviz lsb-core
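If anything goes wrong while installing the dependencies, fix it before continuing. Every server is different, but all of these dependencies must be installed successfully.
Installing the driver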
root@m1:/# cd ./rdma-device-plugin
root@m1:/# tar zxvf MLNX_OFED_LINUX-4.7-1.0.0.1-ubuntu16.04-x86_64.tgz
root@m1:/# cd MLNX_OFED_LINUX-4.7-1.0.0.1-ubuntu16.04-x86_64
root@m1:/# ll
total 272
drwxr-xr-x 6 root root 4096 9月 26 2019 ./
drwxr-xr-x 15 root root 4096 10月 22 16:01 ../
-rw-r--r-- 1 root root 7 9月 26 2019 .arch
-rwxr-xr-x 1 root root 2605 9月 26 2019 common_installers.pl*
-rwxr-xr-x 1 root root 5956 9月 26 2019 common.pl*
-rwxr-xr-x 1 root root 24634 9月 26 2019 create_mlnx_ofed_installers.pl*
drwxr-xr-x 5 root root 4096 9月 26 2019 DEBS/
drwxr-xr-x 2 root root 4096 9月 26 2019 DEBS_UPSTREAM_LIBS/
-rw-r--r-- 1 root root 12 9月 26 2019 distro
drwxr-xr-x 8 root root 4096 9月 26 2019 docs/
-rw-r--r-- 1 root root 956 9月 26 2019 LICENSE
-rw-r--r-- 1 root root 12 9月 26 2019 .mlnx
-rwxr-xr-x 1 root root 27611 9月 26 2019 mlnx_add_kernel_support.sh*
-rwxr-xr-x 1 root root 151310 9月 26 2019 mlnxofedinstall*
-rw-r--r-- 1 root root 2764 9月 26 2019 RPM-GPG-KEY-Mellanox
drwxr-xr-x 2 root root 4096 9月 26 2019 src/
-rwxr-xr-x 1 root root 10894 9月 26 2019 uninstall.sh*
root@m1:/# ./mlnxofedinstall --force
...
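If everything goes to plan the installation succeeds, but you may run into various problems along the way, in which case you will need to search for a fix yourself.
Reloading the driver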
root@m1:/# /etc/init.d/openibd restart
Checking the IB status
root@m1:/# ibstat
CA 'mlx5_0'
CA type: MT4119
Number of ports: 1
Firmware version: 16.24.1000
Hardware version: 0
Node GUID: 0xb8599f03001212a0
System image GUID: 0xb8599f03001212a0
Port 1:
State: Active
Physical state: LinkUp
Rate: 56
Base lid: 1
LMC: 0
SM lid: 1
Capability mask: 0x2651e84a
Port GUID: 0xb8599f03001212a0
Link layer: InfiniBand
Output like this means the IB driver was installed successfully.
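What matters most in that output is that the port is `Active` and the physical link is `LinkUp`. A minimal sketch of checking this programmatically is given here, assuming a single port as in the output above; the names `portStatus`, `parseIbstat` and `healthy` are hypothetical helpers, not part of any IB tooling.

```go
package main

import (
	"fmt"
	"strings"
)

// portStatus holds the ibstat fields used for a quick health check.
type portStatus struct {
	State         string
	PhysicalState string
	LinkLayer     string
}

// parseIbstat extracts port fields from ibstat text output.
// It assumes one port; with several ports the last one wins.
func parseIbstat(out string) portStatus {
	var st portStatus
	for _, line := range strings.Split(out, "\n") {
		parts := strings.SplitN(strings.TrimSpace(line), ":", 2)
		if len(parts) != 2 {
			continue // lines such as "CA 'mlx5_0'" carry no key/value pair
		}
		key, val := strings.TrimSpace(parts[0]), strings.TrimSpace(parts[1])
		switch key {
		case "State":
			st.State = val
		case "Physical state":
			st.PhysicalState = val
		case "Link layer":
			st.LinkLayer = val
		}
	}
	return st
}

// healthy reports whether the port is logically active and physically linked.
func (s portStatus) healthy() bool {
	return s.State == "Active" && s.PhysicalState == "LinkUp"
}

func main() {
	// Abbreviated copy of the ibstat output shown above.
	sample := `CA 'mlx5_0'
Port 1:
State: Active
Physical state: LinkUp
Rate: 56
Link layer: InfiniBand`

	st := parseIbstat(sample)
	fmt.Printf("%+v healthy=%v\n", st, st.healthy())
}
```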
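Testing connectivity and performance
Install the IB driver on the other machine, m2, in the same way.
Connectivity
A connectivity test needs a server and a client. Here m1 acts as the server and m2 as the client.
Server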
root@m1:/# ibping -S -C mlx5_0 -P 1 # produces no output
-S: run as the server
-C: CA name
-P: CA port
Client
root@m2:/# ibping -c 10000 -f -C mlx4_0 -P 1 -L 1
--- m1.(none) (Lid 1) ibping statistics ---
10000 packets transmitted, 10000 received, 0% packet loss, time 1410 ms
rtt min/avg/max = 0.038/0.140/3.774 ms
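-c: stop after sending 10000 packets
-f: flood the destination
-C: the client's CA name
-P: the CA port to use (1 here)
-L: treat the destination address as a LID; the trailing 1 is the server's Base lid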
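For automated checks, the summary line printed by the client is the part worth parsing. The sketch below is one possible way to do it in Go, assuming the summary keeps the wording shown in the client output; `parsePingSummary` and `lossPercent` are hypothetical helper names.

```go
package main

import "fmt"

// parsePingSummary reads the packet counts from an ibping summary line.
func parsePingSummary(line string) (sent, received int, err error) {
	_, err = fmt.Sscanf(line, "%d packets transmitted, %d received", &sent, &received)
	return sent, received, err
}

// lossPercent returns the share of packets that were not answered.
func lossPercent(sent, received int) float64 {
	if sent == 0 {
		return 0
	}
	return 100 * float64(sent-received) / float64(sent)
}

func main() {
	// Summary line copied from the client output above.
	line := "10000 packets transmitted, 10000 received, 0% packet loss, time 1410 ms"

	sent, received, err := parsePingSummary(line)
	if err != nil {
		fmt.Println("could not parse summary:", err)
		return
	}
	fmt.Printf("sent=%d received=%d loss=%.1f%%\n", sent, received, lossPercent(sent, received))
}
```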
Performance
Restart the IB service and the subnet manager
root@m1:/# /etc/init.d/openibd restart
root@m1:/# /etc/init.d/opensmd restart
Testing write bandwidth
On the first machine, m1, run:
root@m1:/# ib_write_bw
************************************
* Waiting for client to connect... *
************************************
On the second machine, m2, run:
root@m2:/# ib_write_bw m1_ip
---------------------------------------------------------------------------------------
RDMA_Write BW Test
Dual-port : OFF Device : mlx4_0
Number of qps : 1 Transport type : IB
Connection type : RC Using SRQ : OFF
TX depth : 128
CQ Moderation : 100
Mtu : 2048[B]
Link type : IB
Max inline data : 0[B]
rdma_cm QPs : OFF
Data ex. method : Ethernet
---------------------------------------------------------------------------------------
local address: LID 0x02 QPN 0x021d PSN 0xaf91fe RKey 0x28010100 VAddr 0x007f4732586000
remote address: LID 0x01 QPN 0x0088 PSN 0xb7c60d RKey 0x009866 VAddr 0x007f60a41d9000
---------------------------------------------------------------------------------------
#bytes #iterations BW peak[MB/sec] BW average[MB/sec] MsgRate[Mpps]
Conflicting CPU frequency values detected: 999.994000 != 1549.358000. CPU Frequency is not max.
65536 5000 1708.48 1707.75 0.027324
---------------------------------------------------------------------------------------
At this point both machines print the information shown above.
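The result row can be converted into more familiar units. The sketch below assumes that the MB/sec columns count 2^20 bytes per MB. That assumption is consistent with the row itself: 1707.75 × 2^20 / 65536 = 27324 messages per second, i.e. 0.027324 Mpps, which matches the MsgRate column. The type `bwRow` and its methods are hypothetical names used only for this illustration.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// Assumption: the MB/sec columns count 2^20 bytes per MB.
const bytesPerMB = 1 << 20

// bwRow is one result row of the bandwidth table.
type bwRow struct {
	Bytes      float64 // message size in bytes
	Iterations float64
	PeakMB     float64 // BW peak[MB/sec]
	AvgMB      float64 // BW average[MB/sec]
	MsgRate    float64 // reported MsgRate[Mpps]
}

// parseBWRow splits a result row into its five numeric columns.
func parseBWRow(row string) (bwRow, error) {
	fields := strings.Fields(row)
	if len(fields) != 5 {
		return bwRow{}, fmt.Errorf("expected 5 columns, got %d", len(fields))
	}
	vals := make([]float64, 5)
	for i, f := range fields {
		v, err := strconv.ParseFloat(f, 64)
		if err != nil {
			return bwRow{}, err
		}
		vals[i] = v
	}
	return bwRow{Bytes: vals[0], Iterations: vals[1], PeakMB: vals[2], AvgMB: vals[3], MsgRate: vals[4]}, nil
}

// avgGbits converts the average bandwidth to Gbit/s (10^9 bits per second).
func (r bwRow) avgGbits() float64 {
	return r.AvgMB * bytesPerMB * 8 / 1e9
}

// derivedMsgRate recomputes the message rate in Mpps from bandwidth and message size.
func (r bwRow) derivedMsgRate() float64 {
	return r.AvgMB * bytesPerMB / r.Bytes / 1e6
}

func main() {
	// Result row copied from the ib_write_bw output above.
	r, err := parseBWRow("65536 5000 1708.48 1707.75 0.027324")
	if err != nil {
		fmt.Println("could not parse row:", err)
		return
	}
	fmt.Printf("average: %.2f Gbit/s\n", r.avgGbits())
	fmt.Printf("message rate: derived %.6f Mpps, reported %.6f Mpps\n", r.derivedMsgRate(), r.MsgRate)
}
```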
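Read bandwidth and latency are tested the same way, using ib_read_bw and ib_write_lat / ib_read_lat respectively.
The IB driver is now fully installed on both machines. Next comes the installation of the device plugin.
Installing rdma-device-plugin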
root@m2:/# cd ./rdma-device-plugin
root@m2:/# docker load -i carmark_k8s_rdma_device_plugin.tar
root@m2:/# docker images|grep carmark
carmark/k8s-rdma-device-plugin latest 50c33cf119a4 2 years ago 1.31GB
root@m2:/# cd dockerfile
root@m2:/# docker build -t carmark/k8s-rdma-device-plugin:latest .
root@m2:/# cd ../
root@m2:/# kubectl -n kube-system apply -f rdma-device-plugin.yml
root@m2:/# kubectl -n kube-system get pods|grep rdma
rdma-device-plugin-daemonset-4bwlk 1/1 Running 0 15h
rdma-device-plugin-daemonset-hxqk7 1/1 Running 0 15h
Checking the rdma resource
root@m2:/# kubectl describe node
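Instead of reading the full `kubectl describe node` output by eye, the allocatable resources can be filtered from the JSON form of the node object. The following is a minimal sketch, assuming the JSON comes from `kubectl get node <name> -o json`. The resource name `example.com/rdma` in the sample is hypothetical, since the name the plugin actually registers is not stated here; the filter simply keeps every allocatable entry whose name contains "rdma".

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// node models only the part of the node object that is needed here.
type node struct {
	Status struct {
		Allocatable map[string]string `json:"allocatable"`
	} `json:"status"`
}

// rdmaAllocatable returns the allocatable resources whose name contains "rdma".
func rdmaAllocatable(nodeJSON []byte) (map[string]string, error) {
	var n node
	if err := json.Unmarshal(nodeJSON, &n); err != nil {
		return nil, err
	}
	out := make(map[string]string)
	for name, qty := range n.Status.Allocatable {
		if strings.Contains(strings.ToLower(name), "rdma") {
			out[name] = qty
		}
	}
	return out, nil
}

func main() {
	// Hypothetical, heavily trimmed node object; "example.com/rdma" is a placeholder name.
	sample := []byte(`{"status":{"allocatable":{"cpu":"8","memory":"16Gi","example.com/rdma":"1"}}}`)

	res, err := rdmaAllocatable(sample)
	if err != nil {
		fmt.Println("could not parse node JSON:", err)
		return
	}
	if len(res) == 0 {
		fmt.Println("no rdma resource advertised on this node")
		return
	}
	for name, qty := range res {
		fmt.Printf("%s: %s\n", name, qty)
	}
}
```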
The rdma-device-plugin.yml is reproduced here:
apiVersion: extensions/v1beta1
kind: DaemonSet
metadata:
  name: rdma-device-plugin-daemonset
  namespace: kube-system
spec:
  template:
    metadata:
      # Mark this pod as a critical add-on; when enabled, the critical add-on scheduler
      # reserves resources for critical add-on pods so that they can be rescheduled after
      # a failure. This annotation works in tandem with the toleration below.
      annotations:
        scheduler.alpha.kubernetes.io/critical-pod: ""
      labels:
        name: rdma-device-plugin-ds
    spec:
      tolerations:
      # Allow this pod to be rescheduled while the node is in "critical add-ons only" mode.
      # This, along with the annotation above marks this pod as a critical add-on.
      - key: CriticalAddonsOnly
        operator: Exists
      hostNetwork: true
      containers:
      - image: carmark/k8s-rdma-device-plugin:latest
        imagePullPolicy: IfNotPresent
        name: rdma-device-plugin-ctr
        #args: ["-log-level", "debug"]
        securityContext:
          allowPrivilegeEscalation: false
          capabilities:
            drop: ["ALL"]
        volumeMounts:
          - name: device-plugin
            mountPath: /var/lib/kubelet/device-plugins
          - name: sys-class
            mountPath: /sys/class
      volumes:
        - name: device-plugin
          hostPath:
            path: /var/lib/kubelet/device-plugins
        - name: sys-class
          hostPath:
            path: /sys/class
The installation of rdma-device-plugin is now complete.
The steps for compiling k8s-rdma-device-plugin follow, for anyone who is interested.
Let's look at the Dockerfile:
FROM carmark/k8s-rdma-device-plugin
COPY k8s-rdma-device-plugin /usr/local/bin/
ENTRYPOINT ["k8s-rdma-device-plugin"]
The k8s-rdma-device-plugin executable used here is compiled from Go. The code downloaded directly from the internet does not build as-is and needs a few small changes.
root@m1:/# cd rdma-device-plugin
root@m1:/# tar zxvf go1.15.3.linux-amd64.tar.gz -C /usr/local/
root@m1:/# vim ~/.bashrc
# add the following paths
export GOROOT=/usr/local/go
export GOPATH=/home/goProject
export PATH=$PATH:$GOROOT/bin
root@m1:/# source ~/.bashrc
root@m1:/usr/local/go# go version
go version go1.15.3 linux/amd64
k8s-rdma-device-plugin
root@m1:/# mkdir -p /home/goProject/src
root@m1:/# unzip -d /home/goProject/src/ k8s-rdma-device-plugin-master.zip
root@m1:/# cd /home/goProject/src/k8s-rdma-device-plugin
root@m1:/# ll
total 100
drwxr-xr-x 5 root root 4096 12月 31 2019 ./
drwxr-xr-x 13 root root 4096 10月 23 11:24 ../
-rwxr-xr-x 1 root root 378 12月 31 2019 build*
-rw-r--r-- 1 root root 118 12月 31 2019 Dockerfile
-rw-r--r-- 1 root root 507 12月 31 2019 .gitignore
-rw-r--r-- 1 root root 4134 12月 31 2019 Gopkg.lock
-rw-r--r-- 1 root root 927 12月 31 2019 Gopkg.toml
drwxr-xr-x 2 root root 4096 12月 31 2019 hack/
drwxr-xr-x 2 root root 4096 12月 31 2019 ibverbs/
-rw-r--r-- 1 root root 11358 12月 31 2019 LICENSE
-rw-r--r-- 1 root root 2228 12月 31 2019 main.go
-rw-r--r-- 1 root root 1304 12月 31 2019 rdma-device-plugin.yml
-rw-r--r-- 1 root root 3208 12月 31 2019 rdma.go
-rw-r--r-- 1 root root 4421 12月 31 2019 README.md
-rw-r--r-- 1 root root 6509 12月 31 2019 server.go
-rw-r--r-- 1 root root 2330 12月 31 2019 sriov.go
-rw-r--r-- 1 root root 240 12月 31 2019 .travis.yml
-rw-r--r-- 1 root root 169 12月 31 2019 types.go
drwxr-xr-x 6 root root 4096 12月 31 2019 vendor/
-rw-r--r-- 1 root root 500 12月 31 2019 watcher.go
Modify build
#!/bin/sh
REPO_PATH="k8s-rdma-device-plugin"
export GO15VENDOREXPERIMENT=1
export GOBIN=${PWD}/bin
FMT="*.go"
echo "Checking gofmt..."
fmtRes=$(gofmt -l $FMT)
if [ -n "${fmtRes}" ]; then
    # printf is used instead of `echo -e`, which is not portable under /bin/sh
    printf 'gofmt checking failed:\n%s\n' "${fmtRes}"
    exit 255
fi
echo "Building plugins"
go install "$@" ${REPO_PATH}
Change the imports in the Go code that reference github.com/hustcat. For example, in rdma.go:
"github.com/hustcat/k8s-rdma-device-plugin/ibverbs"
// change to
"k8s-rdma-device-plugin/ibverbs"
Run build
root@m1:/# ./build
root@m1:/# ls bin
k8s-rdma-device-plugin
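The import change that precedes the build has to be made in every Go file that references the github.com/hustcat path, so it can help to apply it mechanically. The sketch below shows the string rewrite on in-memory source text; `rewriteImports` is a hypothetical helper, and applying it to the files on disk (reading each `*.go` file, rewriting, writing back) is left out on purpose.

```go
package main

import (
	"fmt"
	"strings"
)

const (
	oldPrefix = "github.com/hustcat/k8s-rdma-device-plugin" // import path in the downloaded code
	newPrefix = "k8s-rdma-device-plugin"                    // path matching REPO_PATH in build
)

// rewriteImports replaces the old import prefix with the new one.
// Only occurrences directly after an opening double quote are touched,
// so that plain mentions of the URL in comments are left alone.
func rewriteImports(src string) string {
	return strings.ReplaceAll(src, `"`+oldPrefix, `"`+newPrefix)
}

func main() {
	// Made-up excerpt in the style of rdma.go, used only to show the effect.
	src := "import (\n\t\"fmt\"\n\n\t\"github.com/hustcat/k8s-rdma-device-plugin/ibverbs\"\n)\n"
	fmt.Print(rewriteImports(src))
}
```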
Then run it:
root@m1:/# bin/k8s-rdma-device-plugin
I1023 11:52:22.554006 86270 main.go:31] Fetching devices.
ibvDevList: [{mlx5_0 uverbs0 /sys/class/infiniband_verbs/uverbs0 /sys/class/infiniband/mlx5_0}]
netDevList: [vethf7e27bc7 veth8ceb929c veth31b4a302 enp96s0f0 vethca24852d enp96s0f1 vetha55e1b96 veth71d39aad veth492f0bf9 vethaf32d3a6 veth5f06dcff veth0deb7cf6 vethbb1ed727 veth874fceaa vethbcc0a7e6 veth2fa745a9 veth60889727 vethb7416a73 vetha4154a1b vethfc2bd58b vethc16f6b00 vethf7716b90 veth81218fb6 veth084ab25a veth9f377e8d veth4cea3686 veth2c2cff6c vetha72f5da2 vethfbb5aafd vethf6336b7b veth87f1624f veth8fdc4f8a veth3171c3c4 veth6c474d5f vethc132f493 veth605e82fe veth08aa8528 veth2f65d6b0 veth2b9b279f vethfaea8c1e veth4358a077 veth47ee05e3 vethdb1f63a9 veth699abb19 veth75d06790 veth89cc49c0 veth524565dc veth76dfa640 veth96dfd1b0 veth60a3a19f vethe36ff75e veth1b9fb905 vethff533970 veth39d46ea3 veth1505fe28 vethc85e7e03 veth3df6fbda vethfc30a2e7 veth7b8563e2 veth4f87fa9b ib0 veth661b8ccc vethace37698 veth0e581eb6 veth5ddaf13a veth60873598 veth5adab830 veth05a04167 vethbaceff83 vethd995d93e flannel.1 cni0 vethec045ed veth95c1ff1c vethc08ae971 vethd275ef73 veth5e91879e veth321d140c veth399324b6 vetheb6c5e27 vethb141865a veth56fc65ae veth164f0728]
I1023 11:52:22.572912 86270 main.go:43] RDMA device list: [{{mlx5_0 uverbs0 /sys/class/infiniband_verbs/uverbs0 /sys/class/infiniband/mlx5_0} ib0 1}]
I1023 11:52:22.572950 86270 main.go:44] Starting FS watcher.
I1023 11:52:22.572997 86270 main.go:52] Starting OS watcher.
ibvDevList: [{mlx5_0 uverbs0 /sys/class/infiniband_verbs/uverbs0 /sys/class/infiniband/mlx5_0}]
netDevList: [vethf7e27bc7 veth8ceb929c veth31b4a302 enp96s0f0 vethca24852d enp96s0f1 vetha55e1b96 veth71d39aad veth492f0bf9 vethaf32d3a6 veth5f06dcff veth0deb7cf6 vethbb1ed727 veth874fceaa vethbcc0a7e6 veth2fa745a9 veth60889727 vethb7416a73 vetha4154a1b vethfc2bd58b vethc16f6b00 vethf7716b90 veth81218fb6 veth084ab25a veth9f377e8d veth4cea3686 veth2c2cff6c vetha72f5da2 vethfbb5aafd vethf6336b7b veth87f1624f veth8fdc4f8a veth3171c3c4 veth6c474d5f vethc132f493 veth605e82fe veth08aa8528 veth2f65d6b0 veth2b9b279f vethfaea8c1e veth4358a077 veth47ee05e3 vethdb1f63a9 veth699abb19 veth75d06790 veth89cc49c0 veth524565dc veth76dfa640 veth96dfd1b0 veth60a3a19f vethe36ff75e veth1b9fb905 vethff533970 veth39d46ea3 veth1505fe28 vethc85e7e03 veth3df6fbda vethfc30a2e7 veth7b8563e2 veth4f87fa9b ib0 veth661b8ccc vethace37698 veth0e581eb6 veth5ddaf13a veth60873598 veth5adab830 veth05a04167 vethbaceff83 vethd995d93e flannel.1 cni0 vethec045ed veth95c1ff1c vethc08ae971 vethd275ef73 veth5e91879e veth321d140c veth399324b6 vetheb6c5e27 vethb141865a veth56fc65ae veth164f0728]
I1023 11:52:22.597377 86270 server.go:258] Starting to serve on /var/lib/kubelet/device-plugins/rdma.sock
I1023 11:52:22.599371 86270 server.go:266] Registered device plugin with Kubelet
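The `ibvDevList` entries in this log pair a uverbs node with an IB device, using the same sysfs paths that the DaemonSet mounts from `/sys/class`. The following sketch shows one way such a pairing can be read from sysfs. It is not the plugin's actual implementation: `verbsDevices` is a hypothetical function that relies only on the `ibdev` file each uverbs entry exposes under `/sys/class/infiniband_verbs`.

```go
package main

import (
	"fmt"
	"io/ioutil"
	"path/filepath"
	"strings"
)

// verbsDevices maps each uverbs entry (e.g. "uverbs0") to its IB device
// name (e.g. "mlx5_0") by reading <root>/<uverbsN>/ibdev.
func verbsDevices(root string) (map[string]string, error) {
	paths, err := filepath.Glob(filepath.Join(root, "*", "ibdev"))
	if err != nil {
		return nil, err
	}
	devs := make(map[string]string)
	for _, p := range paths {
		data, err := ioutil.ReadFile(p)
		if err != nil {
			return nil, err
		}
		uverbs := filepath.Base(filepath.Dir(p))
		devs[uverbs] = strings.TrimSpace(string(data))
	}
	return devs, nil
}

func main() {
	const root = "/sys/class/infiniband_verbs"

	devs, err := verbsDevices(root)
	if err != nil {
		fmt.Println("could not read sysfs:", err)
		return
	}
	if len(devs) == 0 {
		fmt.Println("no verbs devices found under", root)
		return
	}
	for uverbs, ibdev := range devs {
		fmt.Printf("%s -> %s (%s)\n", uverbs, ibdev, filepath.Join("/sys/class/infiniband", ibdev))
	}
}
```

On a host without RDMA hardware the program simply reports that no verbs devices were found.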