• Installing NCCL on Ubuntu [PaddlePaddle multi-GPU training depends on NCCL]


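    Baidu's PaddlePaddle needs NCCL for multi-GPU training, so NCCL has to be installed first. This article covers installing NCCL from NVIDIA's repository packages; the steps were tested and work.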

    Network Installer for Ubuntu 20.04:

    $ wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64/cuda-ubuntu2004.pin
    $ sudo mv cuda-ubuntu2004.pin /etc/apt/preferences.d/cuda-repository-pin-600
    $ sudo apt-key adv --fetch-keys https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64/7fa2af80.pub
    $ sudo add-apt-repository "deb https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64/ /"
    $ sudo apt-get update

    Network Installer for Ubuntu 18.04:

    $ wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64/cuda-ubuntu1804.pin
    $ sudo mv cuda-ubuntu1804.pin /etc/apt/preferences.d/cuda-repository-pin-600
    $ sudo apt-key adv --fetch-keys https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64/7fa2af80.pub
    $ sudo add-apt-repository "deb https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64/ /"
    $ sudo apt-get update

    Network Installer for RedHat/CentOS 8:

    $ sudo dnf config-manager --add-repo https://developer.download.nvidia.com/compute/cuda/repos/rhel8/x86_64/cuda-rhel8.repo

    Network Installer for RedHat/CentOS 7:

    $ sudo yum-config-manager --add-repo https://developer.download.nvidia.com/compute/cuda/repos/rhel7/x86_64/cuda-rhel7.repo
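
The two Ubuntu command sets above differ only in the release token (`ubuntu2004` versus `ubuntu1804`). As a rough sketch, the commands can be generated from that token so they can be reviewed before running anything. The helper name `print_cuda_repo_setup` is made up for this illustration and is not part of any NVIDIA tooling; it only prints the commands, it does not execute them.

```shell
#!/bin/sh
# Print (do not run) the repository setup commands for one Ubuntu release.
# print_cuda_repo_setup is a hypothetical helper name used only for illustration.
print_cuda_repo_setup() {
    distro=$1   # release token as used in the URLs above, e.g. ubuntu2004
    base="https://developer.download.nvidia.com/compute/cuda/repos/${distro}/x86_64"
    echo "wget ${base}/cuda-${distro}.pin"
    echo "sudo mv cuda-${distro}.pin /etc/apt/preferences.d/cuda-repository-pin-600"
    echo "sudo apt-key adv --fetch-keys ${base}/7fa2af80.pub"
    echo "sudo add-apt-repository \"deb ${base}/ /\""
    echo "sudo apt-get update"
}

# Show the five commands for Ubuntu 20.04.
print_cuda_repo_setup ubuntu2004
```

Printing first makes it easy to check the URLs by eye, and the output can be copied into a terminal one line at a time.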
    

    Then run the following command to install NCCL:

    For Ubuntu: sudo apt install libnccl2=2.8.4-1+cuda11.1 libnccl-dev=2.8.4-1+cuda11.1
    For RHEL/CentOS: sudo yum install libnccl-2.8.4-1+cuda11.1 libnccl-devel-2.8.4-1+cuda11.1 libnccl-static-2.8.4-1+cuda11.1
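
Both install commands pin the same version string, built from the NCCL version and the CUDA version (`2.8.4-1+cuda11.1`). A minimal sketch of assembling the package arguments from those two values follows; `nccl_packages` is a hypothetical helper name, and the versions passed to it must match what the configured repository actually offers.

```shell
#!/bin/sh
# Build the version-pinned package arguments for apt or yum.
# nccl_packages is a hypothetical helper name, not part of any NVIDIA tool.
nccl_packages() {
    family=$1        # "deb" for Ubuntu, "rpm" for RHEL/CentOS
    nccl_version=$2  # e.g. 2.8.4-1
    cuda_version=$3  # e.g. 11.1
    suffix="${nccl_version}+cuda${cuda_version}"
    case "$family" in
        deb) echo "libnccl2=${suffix} libnccl-dev=${suffix}" ;;
        rpm) echo "libnccl-${suffix} libnccl-devel-${suffix} libnccl-static-${suffix}" ;;
        *)   echo "unknown package family: $family" >&2; return 1 ;;
    esac
}

# Print the install command instead of running it.
echo "sudo apt install $(nccl_packages deb 2.8.4-1 11.1)"
```

Keeping the runtime and development packages on the same suffix avoids ending up with a header that does not match the installed shared library.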


    Install the repository.

    • For a local NCCL repository:

      sudo dpkg -i nccl-repo-<version>.deb
      

      Note:
      The local repository installation will prompt you to install the local key it embeds and with which packages are signed. Make sure to follow the instructions to install the local key, or the install phase will fail later.

    • For the network repository:

      wget https://developer.download.nvidia.com/compute/cuda/repos/<distro>/<architecture>/cuda-keyring_1.0-1_all.deb
      sudo dpkg -i cuda-keyring_1.0-1_all.deb
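
The `<distro>` and `<architecture>` placeholders have to be filled in by hand. Going by the repository URLs that appear earlier in this article (`ubuntu2004`, `ubuntu1804`, `rhel8`, `rhel7`, all under `x86_64`), one way to derive them is sketched below. `cuda_distro` and `keyring_url` are hypothetical helper names, the mapping only covers the distributions named in this article, and the keyring `.deb` itself only applies to Debian-based systems.

```shell
#!/bin/sh
# Map os-release style fields to the <distro> token used in NVIDIA's repo URLs.
# cuda_distro and keyring_url are hypothetical helper names for illustration.
cuda_distro() {
    id=$1          # ID field, e.g. ubuntu, centos, rhel
    version_id=$2  # VERSION_ID field, e.g. 20.04 or 8.4
    case "$id" in
        ubuntu)      echo "ubuntu$(echo "$version_id" | tr -d '.')" ;;  # 20.04 -> ubuntu2004
        rhel|centos) echo "rhel${version_id%%.*}" ;;                    # 8.4 -> rhel8
        *)           echo "unsupported distribution: $id" >&2; return 1 ;;
    esac
}

# Build the keyring download URL; the architecture defaults to x86_64.
keyring_url() {
    distro=$1
    arch=${2:-x86_64}
    echo "https://developer.download.nvidia.com/compute/cuda/repos/${distro}/${arch}/cuda-keyring_1.0-1_all.deb"
}

# Print the URL for Ubuntu 20.04 on x86_64.
keyring_url "$(cuda_distro ubuntu 20.04)"
```

On a real machine the two fields can be read with `. /etc/os-release` and passed in as `cuda_distro "$ID" "$VERSION_ID"`.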
      




  • Original article: https://blog.csdn.net/u013250861/article/details/125559424