• Using RDMA for High-Speed Communication in K8s


    RDMA device-plugin for Kubernetes

    Introduction

    k8s-rdma-device-plugin is a device plugin for Kubernetes to manage RDMA devices.

    RDMA (remote direct memory access) is a high-performance network protocol with the following major advantages:

    • Zero-copy

      Applications can transfer data without involving the network software stack; data is sent and received directly to and from the buffers without being copied between network layers.

    • Kernel bypass

      Applications can perform data transfer directly from userspace without the need to perform context switches.

    • No CPU involvement

      Applications can access remote memory without consuming any CPU on the remote machine. The remote memory is read without any intervention from the remote process (or processor), and the caches of the remote CPU(s) are not filled with the accessed memory content.

    You can read this post to get more information about RDMA.

    This plugin allows you to use RDMA devices in containers of a Kubernetes cluster. Moreover, it can work with sriov-cni to provide high-performance network connections for distributed applications, especially GPU-based distributed applications such as TensorFlow, Spark, etc.


    Installation steps

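    The above is the official introduction, which gives a rough idea of what the plugin does. The goal of installing k8s-rdma-device-plugin is to use this high-performance network inside K8s. The installation steps are as follows: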

    1. Install the local InfiniBand driver

      InfiniBand, abbreviated IB, is a high-bandwidth interconnect technology. We connect the two machines with an IB cable and then install the driver.

      • Environment check

        Check whether an IB card is installed locally:

        root@m1:/# lspci |grep Mell
        1a:00.0 Infiniband controller: Mellanox Technologies MT28800 Family [ConnectX-5]
        

        If nothing is returned, the server has no IB card and the rest of this configuration is unnecessary.

      • Install dependencies

        apt-get install python-libxml2 gfortran libgfortran3 libnl-route-3-200 dpatch quilt bison swig \
        debhelper automake libltdl-dev chrpath flex autoconf m4 autotools-dev graphviz lsb-core
        

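        If any dependency fails to install, resolve the problem before continuing. Every server is different, but make sure all of these dependencies are installed successfully.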

      • Install the driver

        root@m1:/# cd ./rdma-device-plugin
        root@m1:/# tar zxvf MLNX_OFED_LINUX-4.7-1.0.0.1-ubuntu16.04-x86_64.tgz
        root@m1:/# cd MLNX_OFED_LINUX-4.7-1.0.0.1-ubuntu16.04-x86_64
        root@m1:/# ll
        total 272
        drwxr-xr-x  6 root root   4096 Sep 26  2019 ./
        drwxr-xr-x 15 root root   4096 Oct 22 16:01 ../
        -rw-r--r--  1 root root      7 Sep 26  2019 .arch
        -rwxr-xr-x  1 root root   2605 Sep 26  2019 common_installers.pl*
        -rwxr-xr-x  1 root root   5956 Sep 26  2019 common.pl*
        -rwxr-xr-x  1 root root  24634 Sep 26  2019 create_mlnx_ofed_installers.pl*
        drwxr-xr-x  5 root root   4096 Sep 26  2019 DEBS/
        drwxr-xr-x  2 root root   4096 Sep 26  2019 DEBS_UPSTREAM_LIBS/
        -rw-r--r--  1 root root     12 Sep 26  2019 distro
        drwxr-xr-x  8 root root   4096 Sep 26  2019 docs/
        -rw-r--r--  1 root root    956 Sep 26  2019 LICENSE
        -rw-r--r--  1 root root     12 Sep 26  2019 .mlnx
        -rwxr-xr-x  1 root root  27611 Sep 26  2019 mlnx_add_kernel_support.sh*
        -rwxr-xr-x  1 root root 151310 Sep 26  2019 mlnxofedinstall*
        -rw-r--r--  1 root root   2764 Sep 26  2019 RPM-GPG-KEY-Mellanox
        drwxr-xr-x  2 root root   4096 Sep 26  2019 src/
        -rwxr-xr-x  1 root root  10894 Sep 26  2019 uninstall.sh*
        root@m1:/# ./mlnxofedinstall --force
        ...
        

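        If everything goes according to plan, the installation should succeed. Unfortunately, you may run into various problems, which you will need to search for and resolve yourself.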

      • Reload the driver

        root@m1:/# /etc/init.d/openibd restart
        
      • Check the IB status

        root@m1:/# ibstat
        CA 'mlx5_0'
        	CA type: MT4119
        	Number of ports: 1
        	Firmware version: 16.24.1000
        	Hardware version: 0
        	Node GUID: 0xb8599f03001212a0
        	System image GUID: 0xb8599f03001212a0
        	Port 1:
        		State: Active
        		Physical state: LinkUp
        		Rate: 56
        		Base lid: 1
        		LMC: 0
        		SM lid: 1
        		Capability mask: 0x2651e84a
        		Port GUID: 0xb8599f03001212a0
        		Link layer: InfiniBand
        

        If you see output like this, the IB driver has been installed successfully.
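The fields worth checking in the ibstat output are the port's State (Active) and Physical state (LinkUp), plus the Base lid, which is the value the ibping client passes with -L. Below is a minimal Go sketch that pulls those fields out of ibstat's plain-text output, assuming the layout shown above; parseIbstat and PortStatus are hypothetical helpers, not part of any Mellanox tool.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// PortStatus keeps the ibstat fields needed to decide whether a port is usable.
type PortStatus struct {
	CA            string // e.g. "mlx5_0"
	Port          string // e.g. "1"
	State         string // expected "Active"
	PhysicalState string // expected "LinkUp"
	BaseLID       string // the value later passed to ibping -L
}

// parseIbstat extracts per-port status from plain `ibstat` output.
// Only the "CA '<name>'" header, "Port N:" headers and three per-port
// fields are recognized; everything else is ignored.
func parseIbstat(out string) []PortStatus {
	var ports []PortStatus
	ca := ""
	sc := bufio.NewScanner(strings.NewReader(out))
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		switch {
		case strings.HasPrefix(line, "CA '"):
			// Header line such as: CA 'mlx5_0'
			ca = strings.Trim(strings.TrimPrefix(line, "CA "), "'")
		case strings.HasPrefix(line, "Port ") && strings.HasSuffix(line, ":"):
			// Start of a port section such as: Port 1:
			num := strings.TrimSuffix(strings.TrimPrefix(line, "Port "), ":")
			ports = append(ports, PortStatus{CA: ca, Port: num})
		case len(ports) > 0:
			kv := strings.SplitN(line, ":", 2)
			if len(kv) != 2 {
				continue
			}
			key, val := strings.TrimSpace(kv[0]), strings.TrimSpace(kv[1])
			p := &ports[len(ports)-1]
			switch key {
			case "State":
				p.State = val
			case "Physical state":
				p.PhysicalState = val
			case "Base lid":
				p.BaseLID = val
			}
		}
	}
	return ports
}

// usable reports whether the port looks ready for ibping / perftest traffic.
func (p PortStatus) usable() bool {
	return p.State == "Active" && p.PhysicalState == "LinkUp"
}

// sampleIbstat is an excerpt of the ibstat output shown above for m1.
const sampleIbstat = `CA 'mlx5_0'
	CA type: MT4119
	Number of ports: 1
	Port 1:
		State: Active
		Physical state: LinkUp
		Rate: 56
		Base lid: 1
		Link layer: InfiniBand
`

func main() {
	for _, p := range parseIbstat(sampleIbstat) {
		fmt.Printf("%s port %s: state=%s phys=%s lid=%s usable=%v\n",
			p.CA, p.Port, p.State, p.PhysicalState, p.BaseLID, p.usable())
	}
}
```

In practice the input would come from running ibstat on each node; the sample constant only stands in for that output.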

    2. Test connectivity and performance

      Install the IB driver on the second machine, m2, in the same way.

      • Connectivity

        Testing connectivity requires a server and a client; here m1 is the server and m2 is the client.

        • Server

          root@m1:/# ibping -S -C mlx5_0 -P 1 # prints nothing
          

          -S: run as the server

          -C: the CA (channel adapter) to use

          -P: the port on that CA

        • Client

          root@m2:/# ibping -c 10000 -f -C mlx4_0 -P 1 -L 1
          
          --- m1.(none) (Lid 1) ibping statistics ---
          10000 packets transmitted, 10000 received, 0% packet loss, time 1410 ms
          rtt min/avg/max = 0.038/0.140/3.774 ms
          

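          -c: stop after sending 10000 packets

          -f: flood the destination

          -C: the client's CA

          -P: the port on the client's CA (port 1 on both machines here)

          -L: the server's Base lid, as reported by ibstat on m1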
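When checking many node pairs, it helps to read the client summary programmatically instead of by eye. Below is a minimal Go sketch that parses the two summary lines, assuming they look exactly like the m2 output above (other ibping versions may format them differently); PingStats and parsePingStats are hypothetical names.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// PingStats holds the summary printed at the end of an ibping client run.
type PingStats struct {
	Sent, Received         int
	LossPct                float64
	RTTMin, RTTAvg, RTTMax float64 // milliseconds
}

// leadingNumber parses the first whitespace-separated token of s as a number,
// ignoring a trailing "%" (so " 0% packet loss" yields 0).
func leadingNumber(s string) (float64, error) {
	f := strings.Fields(s)
	if len(f) == 0 {
		return 0, fmt.Errorf("no number in %q", s)
	}
	return strconv.ParseFloat(strings.TrimSuffix(f[0], "%"), 64)
}

// parsePingStats extracts counts, loss and RTT from ibping's summary lines.
func parsePingStats(out string) (PingStats, error) {
	var s PingStats
	var haveCounts, haveRTT bool
	for _, line := range strings.Split(out, "\n") {
		line = strings.TrimSpace(line)
		switch {
		case strings.Contains(line, "packets transmitted"):
			// "10000 packets transmitted, 10000 received, 0% packet loss, time 1410 ms"
			parts := strings.Split(line, ",")
			if len(parts) < 3 {
				return s, fmt.Errorf("unexpected summary line %q", line)
			}
			var nums [3]float64
			for i := range nums {
				v, err := leadingNumber(parts[i])
				if err != nil {
					return s, err
				}
				nums[i] = v
			}
			s.Sent, s.Received, s.LossPct = int(nums[0]), int(nums[1]), nums[2]
			haveCounts = true
		case strings.HasPrefix(line, "rtt min/avg/max ="):
			// "rtt min/avg/max = 0.038/0.140/3.774 ms"
			fields := strings.Fields(strings.TrimPrefix(line, "rtt min/avg/max ="))
			if len(fields) == 0 {
				return s, fmt.Errorf("unexpected rtt line %q", line)
			}
			vals := strings.Split(fields[0], "/")
			if len(vals) != 3 {
				return s, fmt.Errorf("unexpected rtt line %q", line)
			}
			dst := []*float64{&s.RTTMin, &s.RTTAvg, &s.RTTMax}
			for i, v := range vals {
				f, err := strconv.ParseFloat(v, 64)
				if err != nil {
					return s, err
				}
				*dst[i] = f
			}
			haveRTT = true
		}
	}
	if !haveCounts || !haveRTT {
		return s, fmt.Errorf("ibping summary not found")
	}
	return s, nil
}

// sampleIbping is the client summary from the run on m2 shown above.
const sampleIbping = `--- m1.(none) (Lid 1) ibping statistics ---
10000 packets transmitted, 10000 received, 0% packet loss, time 1410 ms
rtt min/avg/max = 0.038/0.140/3.774 ms`

func main() {
	s, err := parsePingStats(sampleIbping)
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	fmt.Printf("%+v\n", s)
}
```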

      • Performance

        Restart the IB service and the subnet manager:

        root@m1:/# /etc/init.d/openibd restart
        root@m1:/# /etc/init.d/opensmd restart
        

        Test write bandwidth.

        Run on the first machine, m1:

        root@m1:/# ib_write_bw
        
        ************************************
        * Waiting for client to connect... *
        ************************************
        

        Run on the second machine, m2 (replace m1_ip with m1's IP address):

        root@m2:/# ib_write_bw m1_ip
        ---------------------------------------------------------------------------------------
                            RDMA_Write BW Test
         Dual-port       : OFF		Device         : mlx4_0
         Number of qps   : 1		Transport type : IB
         Connection type : RC		Using SRQ      : OFF
         TX depth        : 128
         CQ Moderation   : 100
         Mtu             : 2048[B]
         Link type       : IB
         Max inline data : 0[B]
         rdma_cm QPs	 : OFF
         Data ex. method : Ethernet
        ---------------------------------------------------------------------------------------
         local address: LID 0x02 QPN 0x021d PSN 0xaf91fe RKey 0x28010100 VAddr 0x007f4732586000
         remote address: LID 0x01 QPN 0x0088 PSN 0xb7c60d RKey 0x009866 VAddr 0x007f60a41d9000
        ---------------------------------------------------------------------------------------
         #bytes     #iterations    BW peak[MB/sec]    BW average[MB/sec]   MsgRate[Mpps]
        Conflicting CPU frequency values detected: 999.994000 != 1549.358000. CPU Frequency is not max.
         65536      5000             1708.48            1707.75		   0.027324
        ---------------------------------------------------------------------------------------
        

        Both machines then print output like the above.

        Read bandwidth and latency are tested the same way, using ib_read_bw for read bandwidth and ib_write_lat / ib_read_lat for latency.
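The BW and MsgRate columns of the ib_write_bw result row are related by simple arithmetic, which makes a quick sanity check when comparing runs. A minimal Go sketch, assuming perftest's MB means 2^20 bytes; the figures above are consistent with that, since 1707.75 × 2^20 / 65536 = 27324 messages per second, i.e. the 0.027324 Mpps shown. msgRateMpps and gbitPerSec are hypothetical helper names.

```go
package main

import "fmt"

// bytesPerMB assumes perftest's "MB" is 2^20 bytes; with that assumption the
// MsgRate column of the run above is reproduced exactly.
const bytesPerMB = 1 << 20

// msgRateMpps converts an average bandwidth (MB/sec) and message size (bytes)
// into millions of messages per second.
func msgRateMpps(bwMBps float64, msgBytes int) float64 {
	return bwMBps * bytesPerMB / float64(msgBytes) / 1e6
}

// gbitPerSec converts MB/sec into Gbit/s (10^9 bits per second).
func gbitPerSec(bwMBps float64) float64 {
	return bwMBps * bytesPerMB * 8 / 1e9
}

func main() {
	bw, size := 1707.75, 65536 // BW average and #bytes from the ib_write_bw row
	fmt.Printf("MsgRate: %.6f Mpps\n", msgRateMpps(bw, size))
	fmt.Printf("BW: %.2f Gbit/s\n", gbitPerSec(bw))
}
```

For this run it prints a message rate of 0.027324 Mpps and about 14.33 Gbit/s.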

        The IB drivers on both machines are now fully installed; next, install the device plugin.

    3. Install rdma-device-plugin

      root@m2:/# cd ./rdma-device-plugin
      root@m2:/# docker load -i carmark_k8s_rdma_device_plugin.tar
      root@m2:/# docker images|grep carmark
      carmark/k8s-rdma-device-plugin   latest   50c33cf119a4    2 years ago      1.31GB
      root@m2:/# cd dockerfile
      root@m2:/# docker build -t carmark/k8s-rdma-device-plugin:latest .
      root@m2:/# cd ../
      root@m2:/# kubectl -n kube-system apply -f rdma-device-plugin.yml
      root@m2:/# kubectl -n kube-system get pods|grep rdma
      rdma-device-plugin-daemonset-4bwlk      1/1     Running   0          15h
      rdma-device-plugin-daemonset-hxqk7      1/1     Running   0          15h
      

      Check the RDMA resources:

      root@m2:/# kubectl describe node
      

    [Figure: kubectl describe node output showing the RDMA resources (describe_node.png)]

    Here is rdma-device-plugin.yml:

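    # apps/v1 replaces extensions/v1beta1, which no longer serves DaemonSets as of Kubernetes 1.16
    apiVersion: apps/v1
    kind: DaemonSet
    metadata:
      name: rdma-device-plugin-daemonset
      namespace: kube-system
    spec:
      # apps/v1 requires an explicit selector matching the pod template labels
      selector:
        matchLabels:
          name: rdma-device-plugin-ds
      template: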
        metadata:
          # Mark this pod as a critical add-on; when enabled, the critical add-on scheduler
          # reserves resources for critical add-on pods so that they can be rescheduled after
          # a failure.  This annotation works in tandem with the toleration below.
          annotations:
            scheduler.alpha.kubernetes.io/critical-pod: ""
          labels:
            name: rdma-device-plugin-ds
        spec:
          tolerations:
          # Allow this pod to be rescheduled while the node is in "critical add-ons only" mode.
          # This, along with the annotation above marks this pod as a critical add-on.
          - key: CriticalAddonsOnly
            operator: Exists
          hostNetwork: true
          containers:
          - image: carmark/k8s-rdma-device-plugin:latest
            imagePullPolicy: IfNotPresent
            name: rdma-device-plugin-ctr
            #args: ["-log-level", "debug"]
            securityContext:
              allowPrivilegeEscalation: false
              capabilities:
                drop: ["ALL"]
            volumeMounts:
              - name: device-plugin
                mountPath: /var/lib/kubelet/device-plugins
              - name: sys-class
                mountPath: /sys/class  
          volumes:
            - name: device-plugin
              hostPath:
                path: /var/lib/kubelet/device-plugins
            - name: sys-class
              hostPath:
                path: /sys/class
    

    rdma-device-plugin is now installed.

    The following describes how to compile k8s-rdma-device-plugin, for anyone interested.


    Pitfalls along the way

    First, look at the Dockerfile:

    FROM carmark/k8s-rdma-device-plugin 
    
    COPY k8s-rdma-device-plugin /usr/local/bin/
    
    ENTRYPOINT ["k8s-rdma-device-plugin"]
    

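    The k8s-rdma-device-plugin executable copied here is compiled from Go, but the code downloaded directly from the web does not compile as-is and needs a small modification.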

    1. First install Go on the server (skip this if it is already installed):
    root@m1:/# cd rdma-device-plugin
    root@m1:/# tar zxvf go1.15.3.linux-amd64.tar.gz -C /usr/local/
    root@m1:/# vim ~/.bashrc 
    # add the following lines
    export GOROOT=/usr/local/go
    export GOPATH=/home/goProject
    export PATH=$PATH:$GOROOT/bin
    root@m1:/# source ~/.bashrc
    root@m1:/usr/local/go# go version
    go version go1.15.3 linux/amd64
    
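With Go 1.15 and GOPATH set to /home/goProject, a source tree without a go.mod file (the listing of the plugin's source shows none) is built in GOPATH mode, where an import path is looked up as a directory under $GOPATH/src. That is why the source is unpacked into /home/goProject/src/k8s-rdma-device-plugin, and why imports of github.com/hustcat/... cannot be found unless the code also exists at that path. Below is a minimal Go sketch of that lookup, assuming a single GOPATH entry; gopathDir is a hypothetical helper.

```go
package main

import (
	"fmt"
	"path/filepath"
)

// gopathDir returns the directory GOPATH mode uses for importPath,
// i.e. $GOPATH/src/<importPath>. In GOPATH mode the go tool first checks
// vendor/ directories above the importing package; this sketch skips that step.
func gopathDir(gopath, importPath string) string {
	return filepath.Join(gopath, "src", filepath.FromSlash(importPath))
}

func main() {
	gopath := "/home/goProject" // GOPATH exported in ~/.bashrc above
	for _, p := range []string{
		"github.com/hustcat/k8s-rdma-device-plugin/ibverbs", // original import
		"k8s-rdma-device-plugin/ibverbs",                    // modified import
	} {
		fmt.Println(p, "->", gopathDir(gopath, p))
	}
}
```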
    2. Build k8s-rdma-device-plugin
    root@m1:/# mkdir -p /home/goProject/src
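    root@m1:/# unzip -d /home/goProject/src/ k8s-rdma-device-plugin-master.zip
    # if the archive unpacks to k8s-rdma-device-plugin-master/, rename that directory to k8s-rdma-device-plugin first
    root@m1:/# cd /home/goProject/src/k8s-rdma-device-plugin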
    root@m1:/# ll
    total 100
    drwxr-xr-x  5 root root  4096 Dec 31  2019 ./
    drwxr-xr-x 13 root root  4096 Oct 23 11:24 ../
    -rwxr-xr-x  1 root root   378 Dec 31  2019 build*
    -rw-r--r--  1 root root   118 Dec 31  2019 Dockerfile
    -rw-r--r--  1 root root   507 Dec 31  2019 .gitignore
    -rw-r--r--  1 root root  4134 Dec 31  2019 Gopkg.lock
    -rw-r--r--  1 root root   927 Dec 31  2019 Gopkg.toml
    drwxr-xr-x  2 root root  4096 Dec 31  2019 hack/
    drwxr-xr-x  2 root root  4096 Dec 31  2019 ibverbs/
    -rw-r--r--  1 root root 11358 Dec 31  2019 LICENSE
    -rw-r--r--  1 root root  2228 Dec 31  2019 main.go
    -rw-r--r--  1 root root  1304 Dec 31  2019 rdma-device-plugin.yml
    -rw-r--r--  1 root root  3208 Dec 31  2019 rdma.go
    -rw-r--r--  1 root root  4421 Dec 31  2019 README.md
    -rw-r--r--  1 root root  6509 Dec 31  2019 server.go
    -rw-r--r--  1 root root  2330 Dec 31  2019 sriov.go
    -rw-r--r--  1 root root   240 Dec 31  2019 .travis.yml
    -rw-r--r--  1 root root   169 Dec 31  2019 types.go
    drwxr-xr-x  6 root root  4096 Dec 31  2019 vendor/
    -rw-r--r--  1 root root   500 Dec 31  2019 watcher.go
    

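    Modify build as follows (REPO_PATH is simply k8s-rdma-device-plugin, the directory name under $GOPATH/src):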

    #!/bin/sh
    REPO_PATH="k8s-rdma-device-plugin"
    
    export GO15VENDOREXPERIMENT=1
    export GOBIN=${PWD}/bin
    
    FMT="*.go"
    echo "Checking gofmt..."
    fmtRes=$(gofmt -l $FMT)
    if [ -n "${fmtRes}" ]; then
        # printf instead of echo -e: under /bin/sh (e.g. dash) echo prints "-e" literally
        printf 'gofmt checking failed:\n%s\n' "${fmtRes}"
        exit 255
    fi
    
    echo "Building plugins"
    go install "$@" ${REPO_PATH}
    
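The build script aborts if gofmt -l lists any *.go file, so the hand-edited import in rdma.go must remain gofmt-clean. Below is a minimal Go sketch of the same check using the standard go/format package, which formats source in gofmt's canonical style; needsGofmt is a hypothetical helper.

```go
package main

import (
	"bytes"
	"fmt"
	"go/format"
)

// needsGofmt reports whether gofmt would change src, which is the condition
// under which `gofmt -l` lists a file.
func needsGofmt(src []byte) (bool, error) {
	formatted, err := format.Source(src)
	if err != nil {
		return false, err // src does not parse as Go
	}
	return !bytes.Equal(formatted, src), nil
}

func main() {
	clean := []byte("package main\n\nfunc main() {}\n")
	messy := []byte("package main\nfunc main(){ }\n")
	for _, src := range [][]byte{clean, messy} {
		changed, err := needsGofmt(src)
		fmt.Println(changed, err)
	}
}
```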

    Change the Go imports that use the github.com/hustcat prefix:

    rdma.go

    "github.com/hustcat/k8s-rdma-device-plugin/ibverbs"
    // change to
    "k8s-rdma-device-plugin/ibverbs"
    

    Run build:

    root@m1:/# ./build
    root@m1:/# ls bin
    k8s-rdma-device-plugin
    

    Then run the binary:

    root@m1:/# bin/k8s-rdma-device-plugin
    I1023 11:52:22.554006   86270 main.go:31] Fetching devices.
    ibvDevList: [{mlx5_0 uverbs0 /sys/class/infiniband_verbs/uverbs0 /sys/class/infiniband/mlx5_0}]
    netDevList: [vethf7e27bc7 veth8ceb929c veth31b4a302 enp96s0f0 vethca24852d enp96s0f1 vetha55e1b96 veth71d39aad veth492f0bf9 vethaf32d3a6 veth5f06dcff veth0deb7cf6 vethbb1ed727 veth874fceaa vethbcc0a7e6 veth2fa745a9 veth60889727 vethb7416a73 vetha4154a1b vethfc2bd58b vethc16f6b00 vethf7716b90 veth81218fb6 veth084ab25a veth9f377e8d veth4cea3686 veth2c2cff6c vetha72f5da2 vethfbb5aafd vethf6336b7b veth87f1624f veth8fdc4f8a veth3171c3c4 veth6c474d5f vethc132f493 veth605e82fe veth08aa8528 veth2f65d6b0 veth2b9b279f vethfaea8c1e veth4358a077 veth47ee05e3 vethdb1f63a9 veth699abb19 veth75d06790 veth89cc49c0 veth524565dc veth76dfa640 veth96dfd1b0 veth60a3a19f vethe36ff75e veth1b9fb905 vethff533970 veth39d46ea3 veth1505fe28 vethc85e7e03 veth3df6fbda vethfc30a2e7 veth7b8563e2 veth4f87fa9b ib0 veth661b8ccc vethace37698 veth0e581eb6 veth5ddaf13a veth60873598 veth5adab830 veth05a04167 vethbaceff83 vethd995d93e flannel.1 cni0 vethec045ed veth95c1ff1c vethc08ae971 vethd275ef73 veth5e91879e veth321d140c veth399324b6 vetheb6c5e27 vethb141865a veth56fc65ae veth164f0728]
    I1023 11:52:22.572912   86270 main.go:43] RDMA device list: [{{mlx5_0 uverbs0 /sys/class/infiniband_verbs/uverbs0 /sys/class/infiniband/mlx5_0} ib0 1}]
    I1023 11:52:22.572950   86270 main.go:44] Starting FS watcher.
    I1023 11:52:22.572997   86270 main.go:52] Starting OS watcher.
    ibvDevList: [{mlx5_0 uverbs0 /sys/class/infiniband_verbs/uverbs0 /sys/class/infiniband/mlx5_0}]
    netDevList: [vethf7e27bc7 veth8ceb929c veth31b4a302 enp96s0f0 vethca24852d enp96s0f1 vetha55e1b96 veth71d39aad veth492f0bf9 vethaf32d3a6 veth5f06dcff veth0deb7cf6 vethbb1ed727 veth874fceaa vethbcc0a7e6 veth2fa745a9 veth60889727 vethb7416a73 vetha4154a1b vethfc2bd58b vethc16f6b00 vethf7716b90 veth81218fb6 veth084ab25a veth9f377e8d veth4cea3686 veth2c2cff6c vetha72f5da2 vethfbb5aafd vethf6336b7b veth87f1624f veth8fdc4f8a veth3171c3c4 veth6c474d5f vethc132f493 veth605e82fe veth08aa8528 veth2f65d6b0 veth2b9b279f vethfaea8c1e veth4358a077 veth47ee05e3 vethdb1f63a9 veth699abb19 veth75d06790 veth89cc49c0 veth524565dc veth76dfa640 veth96dfd1b0 veth60a3a19f vethe36ff75e veth1b9fb905 vethff533970 veth39d46ea3 veth1505fe28 vethc85e7e03 veth3df6fbda vethfc30a2e7 veth7b8563e2 veth4f87fa9b ib0 veth661b8ccc vethace37698 veth0e581eb6 veth5ddaf13a veth60873598 veth5adab830 veth05a04167 vethbaceff83 vethd995d93e flannel.1 cni0 vethec045ed veth95c1ff1c vethc08ae971 vethd275ef73 veth5e91879e veth321d140c veth399324b6 vetheb6c5e27 vethb141865a veth56fc65ae veth164f0728]
    I1023 11:52:22.597377   86270 server.go:258] Starting to serve on /var/lib/kubelet/device-plugins/rdma.sock
    I1023 11:52:22.599371   86270 server.go:266] Registered device plugin with Kubelet
    

    Done.

  • Original article: https://blog.csdn.net/z13653662052/article/details/125468894