• Issue log: mlnx-ofed-kernel-dkms fails to install with MLNX_OFED_LINUX-5.4-3.6.8.1-ubuntu20.04-x86_64


    root@zju-PowerEdge-R540:/home/zju/Downloads/MLNX_OFED_LINUX-5.4-3.6.8.1-ubuntu20.04-x86_64# ./mlnxofedinstall --upstream-libs --dpdk
    Logs dir: /tmp/MLNX_OFED_LINUX.1892101.logs
    General log file: /tmp/MLNX_OFED_LINUX.1892101.logs/general.log

    Below is the list of MLNX_OFED_LINUX packages that you have chosen
    (some may have been added by the installer due to package dependencies):

    ofed-scripts
    mstflint
    mlnx-tools
    mlnx-ofed-kernel-utils
    mlnx-ofed-kernel-dkms
    rdma-core
    libibverbs1
    ibverbs-utils
    ibverbs-providers
    libibverbs-dev
    librdmacm1
    rdmacm-utils
    librdmacm-dev
    libibumad3
    ibacm
    python3-pyverbs

    This program will install the MLNX_OFED_LINUX package on your machine.
    Note that all other Mellanox, OEM, OFED, RDMA or Distribution IB packages will be removed.
    Those packages are removed due to conflicts with MLNX_OFED_LINUX, do not reinstall them.

    Do you want to continue? [Y/n] y
    Setting up mlnx-ofed-kernel-dkms (5.4-OFED.5.4.3.6.8.1) …
    Removing old mlnx-ofed-kernel-5.4 DKMS files…


    Deleting module version: 5.4
    completely from the DKMS tree.

    Done.
    Loading new mlnx-ofed-kernel-5.4 DKMS files…
    First Installation: checking all kernels…
    Building only for 5.15.0-83-generic
    Building for architecture x86_64
    Building initial module for 5.15.0-83-generic

    Reading package lists… Done
    Building dependency tree
    Reading state information… Done
    mlnx-ofed-kernel-dkms is already the newest version (5.4-OFED.5.4.3.6.8.1).
    0 upgraded, 0 newly installed, 0 to remove and 31 not upgraded.
    1 not fully installed or removed.
    After this operation, 0 B of additional disk space will be used.
    Do you want to continue? [Y/n] y
    Setting up mlnx-ofed-kernel-dkms (5.4-OFED.5.4.3.6.8.1) …
    Removing old mlnx-ofed-kernel-5.4 DKMS files…


    Deleting module version: 5.4
    completely from the DKMS tree.

    Done.
    Loading new mlnx-ofed-kernel-5.4 DKMS files…
    First Installation: checking all kernels…
    Building only for 5.15.0-83-generic
    Building for architecture x86_64
    Building initial module for 5.15.0-83-generic
    Error! Bad return status for module build on kernel: 5.15.0-83-generic (x86_64)
    Consult /var/lib/dkms/mlnx-ofed-kernel/5.4/build/make.log for more information.
    dpkg: error processing package mlnx-ofed-kernel-dkms (--configure):
    installed mlnx-ofed-kernel-dkms package post-installation script subprocess returned error exit status 10
    Errors were encountered while processing:
    mlnx-ofed-kernel-dkms
    E: Sub-process /usr/bin/dpkg returned an error code (1)
    W: Operation was interrupted before it could finish
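The DKMS message above points at make.log for the actual compiler error. A minimal diagnostic sketch, assuming a POSIX shell: `failed_kernel` is a hypothetical helper introduced here for illustration, and only the log path and the error line come from the output above.

```shell
#!/bin/sh
# Log path copied from the DKMS error message above.
MAKE_LOG=/var/lib/dkms/mlnx-ofed-kernel/5.4/build/make.log

# Hypothetical helper: read DKMS output on stdin and print the kernel
# release named in a "Bad return status for module build" line.
failed_kernel() {
    sed -n 's/^.*module build on kernel: \([^ ]*\) .*$/\1/p'
}

# The kernel the machine is actually running, for comparison.
echo "running kernel: $(uname -r)"

# Show what DKMS has registered, if the tool is installed.
if command -v dkms >/dev/null 2>&1; then
    dkms status || true
fi

# Print the first error lines from the build log, if it is readable.
if [ -r "$MAKE_LOG" ]; then
    grep -n -i 'error' "$MAKE_LOG" | head -n 20 || true
else
    echo "no readable make.log at $MAKE_LOG"
fi
```

Piping the failing line through `failed_kernel` gives the kernel release to compare with `uname -r`. If the two match and the build still fails, the first errors in make.log are the place to check whether the module sources compile against that kernel at all.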

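    Around the same time, one of the storage volumes on this machine also failed; the cause is unknown.

    I originally wanted to install the RDMA components, following https://blog.csdn.net/ibless/article/details/121663751 .
    Running sudo ./mlnxofedinstall --add-kernel-support produced an error.
    After that, DPDK packet reception stopped working and the following errors appeared: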
    EAL: Detected 32 lcore(s)
    EAL: Detected 2 NUMA nodes
    EAL: Multi-process socket /var/run/dpdk/rte/mp_socket
    EAL: Selected IOVA mode 'PA'
    EAL: No free hugepages reported in hugepages-1048576kB
    EAL: No free hugepages reported in hugepages-1048576kB
    EAL: No available hugepages reported in hugepages-1048576kB
    EAL: Probing VFIO support…
    EAL: VFIO support initialized
    EAL: using IOMMU type 8 (No-IOMMU)
    EAL: Probe PCI driver: net_ice (8086:1592) device: 0000:af:00.0 (socket 1)
    ice_load_pkg_type(): Active package is: 1.3.30.0, ICE OS Default Package
    ice_init_proto_xtr(): Protocol extraction is not supported
    EAL: Probe PCI driver: net_ice (8086:1592) device: 0000:af:00.1 (socket 1)
    ice_load_pkg_type(): Active package is: 1.3.30.0, ICE OS Default Package
    ice_init_proto_xtr(): Protocol extraction is not supported
    EAL: No legacy callbacks, legacy socket not created
    EAL: Error - exiting with code: 1
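The EAL output above reports no free 1 GiB hugepages before it exits, but it does not print the reason for the exit. A small sketch for listing the hugepage pools from sysfs, assuming a POSIX shell: `hugepage_report` is a hypothetical helper, and its optional directory argument exists only so it can be tried against a copy of the sysfs tree.

```shell
#!/bin/sh
# Hypothetical helper: print total and free pages for every hugepage size.
# The optional argument overrides the sysfs directory (handy for testing).
hugepage_report() {
    base=${1:-/sys/kernel/mm/hugepages}
    for d in "$base"/hugepages-*; do
        # Skip the unexpanded glob when no hugepage sizes are present.
        [ -d "$d" ] || continue
        printf '%s total=%s free=%s\n' "${d##*/}" \
            "$(cat "$d/nr_hugepages")" "$(cat "$d/free_hugepages")"
    done
}

hugepage_report
```

A `free=0` entry for `hugepages-1048576kB` would match the EAL messages. The log alone does not establish whether that is what made EAL exit.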

    After that, installing MLNX_OFED with the following procedure failed:

    Download the MLNX_OFED release directly with wget:

    https://content.mellanox.com/ofed/MLNX_OFED-5.4-3.6.8.1/MLNX_OFED_LINUX-5.4-3.6.8.1-ubuntu20.04-x86_64.tgz

    Then extract it and install:

    /home/zju/Downloads/MLNX_OFED_LINUX-5.4-3.6.8.1-ubuntu20.04-x86_64

    tar -xvzf

    sudo ./mlnxofedinstall --dpdk

    Then restart as the installer output instructs:

    /etc/init.d/openibd restart
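The steps above can be collected into one script. This is a dry-run sketch that only prints the commands unless `APPLY=1` is set. The helper names `tarball_name`, `dir_name` and `run` are hypothetical. It also assumes that the tarball unpacks into a directory of the same name (consistent with the path shown above) and that `openibd` needs root. The author reports that this 5.4 procedure failed on their machine, so the script records the steps rather than a known-good install.

```shell
#!/bin/sh
# URL taken from the text above.
URL=https://content.mellanox.com/ofed/MLNX_OFED-5.4-3.6.8.1/MLNX_OFED_LINUX-5.4-3.6.8.1-ubuntu20.04-x86_64.tgz

# Hypothetical helpers: derive the tarball and directory names from the URL.
tarball_name() { printf '%s\n' "${1##*/}"; }
dir_name()     { n=${1##*/}; printf '%s\n' "${n%.tgz}"; }

# Print each command by default; run it only when APPLY=1 is set.
run() {
    if [ "${APPLY:-0}" = 1 ]; then "$@"; else echo "+ $*"; fi
}

run wget "$URL"
run tar -xzf "$(tarball_name "$URL")"
# Assumption: the archive unpacks into a directory named after the tarball.
run cd "$(dir_name "$URL")"
run sudo ./mlnxofedinstall --dpdk
run sudo /etc/init.d/openibd restart
```

Running it without `APPLY=1` shows the exact command lines first, which makes it easier to spot a wrong path before anything is installed.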
    

    This issue is still open. This thread may be relevant: https://forums.developer.nvidia.com/t/failed-to-install-mlnx-ofed-kernel-dkms-deb-with-version-4-9-4-1-7-0/205889
    or this one:
    https://forums.developer.nvidia.com/t/i-failed-to-build-mlnx-ofed-linux-for-5-4-0-70-generic/206147

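    Update: installing version 5.8 resolved the problem.

    Check the NVIDIA website and choose another suitable version:
    https://network.nvidia.com/products/infiniband-drivers/linux/mlnx_ofed/
    My system is Ubuntu 20.04, so I downloaded the following with wget and extracted it:
    https://content.mellanox.com/ofed/MLNX_OFED-5.8-3.0.7.0/MLNX_OFED_LINUX-5.8-3.0.7.0-ubuntu20.04-x86_64.tgz
    This is the newer 5.8 driver.
    Then, in the old 5.4 directory, run the command from the original screenshot (image not preserved).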

    Then, in the 5.8 directory, run
    ./mlnxofedinstall --add-kernel-support

    This failed.
    Then run ./mlnxofedinstall --dpdk

    This succeeded.
    Then run /etc/init.d/openibd restart
    It failed for the following reason:

    rmmod: ERROR: Module ib_uverbs is in use by: irdma

    This means the kernel module ib_uverbs is in use by irdma, so see
    https://blog.csdn.net/qq_32949893/article/details/108402550

    Run sudo modprobe -r irdma ib_uverbs, then run /etc/init.d/openibd restart again, and the restart succeeds.

    Installation complete.
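Before restarting openibd, it can help to check which modules still hold ib_uverbs, as irdma did in the rmmod error above. A minimal sketch that reads `/proc/modules`, whose fourth field lists the modules using a given module: `module_users` is a hypothetical helper, and its optional file argument exists only for testing.

```shell
#!/bin/sh
# Hypothetical helper: list the modules that hold a reference on module $1.
# The optional second argument overrides /proc/modules (handy for testing).
module_users() {
    awk -v m="$1" '
        $1 == m {
            u = $4
            sub(/,$/, "", u)        # drop the trailing comma
            if (u != "-") print u   # "-" means no dependent modules
        }' "${2:-/proc/modules}"
}

if [ -r /proc/modules ]; then
    users=$(module_users ib_uverbs)
    if [ -n "$users" ]; then
        echo "ib_uverbs is in use by: $users"
        echo "unload first, e.g.: sudo modprobe -r irdma ib_uverbs"
    else
        echo "ib_uverbs is not held by another module (or is not loaded)"
    fi
fi
```

The script only reports. The unload itself stays a manual step, because removing irdma takes down RDMA on the Intel ice ports seen in the DPDK log.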

  • Original post: https://blog.csdn.net/weixin_45485072/article/details/132892799