• Ubuntu 22.04 LTS + Deep Learning Environment: Complete Installation Walkthrough


    I. Installing the CUDA Toolkit

    1. Choose the version to install (download page)


    2. Select your OS version to get the download URL and install commands

    3. Run the install commands

    wget https://developer.download.nvidia.com/compute/cuda/12.2.2/local_installers/cuda_12.2.2_535.104.05_linux.run
    sudo sh cuda_12.2.2_535.104.05_linux.run
    

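    Choose the install options:
    Installing the GPU driver from this installer tends to cause trouble, so at this step press Space or Enter to deselect the Driver option, then select Install.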

    If the installation succeeds, the installer finishes by printing a summary of what it installed.

    Configure the environment variables:

    vim ~/.bashrc
    
    # Append these lines to the end of the file
    export PATH=/usr/local/cuda-12.2/bin${PATH:+:${PATH}}
    export LD_LIBRARY_PATH=/usr/local/cuda-12.2/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}
    
    # Save the file, then reload it in the current shell
    source ~/.bashrc
    
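The ${VAR:+:${VAR}} expansion in these exports prepends the CUDA directory without leaving a stray ':' when the variable is unset or empty (in PATH, an empty entry means the current directory). As an illustration only, here is a small Python sketch of the same rule; prepend_path is a hypothetical helper, not part of CUDA or bash:

```python
# Sketch of the bash idiom  NEW${VAR:+:${VAR}}  used in ~/.bashrc above.
# prepend_path is a hypothetical helper for illustration only.
import os
from typing import Optional


def prepend_path(new_entry: str, current: Optional[str]) -> str:
    """Return what NEW${VAR:+:${VAR}} expands to for the given VAR value."""
    # ${VAR:+word} expands to word only when VAR is set AND non-empty.
    if current:
        return f"{new_entry}:{current}"
    return new_entry


if __name__ == "__main__":
    cuda_home = "/usr/local/cuda-12.2"  # the toolkit version installed above
    print("PATH=" + prepend_path(f"{cuda_home}/bin", os.environ.get("PATH")))
    print("LD_LIBRARY_PATH=" + prepend_path(f"{cuda_home}/lib64", os.environ.get("LD_LIBRARY_PATH")))
```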

    Verify the installation:

    root@service2:~# nvcc -V
    nvcc: NVIDIA (R) Cuda compiler driver
    Copyright (c) 2005-2023 NVIDIA Corporation
    Built on Tue_Aug_15_22:02:13_PDT_2023
    Cuda compilation tools, release 12.2, V12.2.140
    Build cuda_12.2.r12.2/compiler.33191640_0
    
    
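To check the toolkit version from a script, a minimal Python sketch could parse the "release X.Y" field shown above. parse_nvcc_release and installed_cuda_release are hypothetical names, and the sketch assumes nvcc -V keeps this output format:

```python
# Hypothetical helpers: read the "release X.Y" field printed by `nvcc -V`.
import re
import shutil
import subprocess
from typing import Optional, Tuple

_RELEASE_RE = re.compile(r"release (\d+)\.(\d+)")


def parse_nvcc_release(output: str) -> Optional[Tuple[int, int]]:
    """Return (major, minor) from nvcc -V output, or None if absent."""
    match = _RELEASE_RE.search(output)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def installed_cuda_release() -> Optional[Tuple[int, int]]:
    """Run nvcc -V if nvcc is on PATH; None usually means PATH is not set up yet."""
    nvcc = shutil.which("nvcc")
    if nvcc is None:
        return None
    result = subprocess.run([nvcc, "-V"], capture_output=True, text=True, check=False)
    return parse_nvcc_release(result.stdout)


if __name__ == "__main__":
    print("CUDA release:", installed_cuda_release())
```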

    4. Possible errors

    gcc is missing or the wrong version:

    Error:

    	cat /var/log/cuda-installer.log
    	
    	[INFO]: Driver not installed.
    	[INFO]: Checking compiler version...
    	[INFO]: gcc location: G 
    	[ERROR]: Missing gcc. gcc is required to continue.
    
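The installer logs quoted in this section (/var/log/cuda-installer.log here, /var/log/nvidia-installer.log further down) mark failures with "[ERROR]:" or "ERROR:". A rough Python sketch for pulling out just those lines, assuming the prefixes stay as shown; error_lines is a hypothetical helper:

```python
# Hypothetical helper: list the error messages in the installer logs.
# Matches both "[ERROR]: ..." (cuda-installer.log) and "ERROR: ..." (nvidia-installer.log).
import re
from pathlib import Path
from typing import List

_ERROR_RE = re.compile(r"^\s*\[?ERROR\]?:\s*(.*)$")
LOGS = ("/var/log/cuda-installer.log", "/var/log/nvidia-installer.log")


def error_lines(log_text: str) -> List[str]:
    """Return the message part of every ERROR line, in order."""
    errors = []
    for line in log_text.splitlines():
        match = _ERROR_RE.match(line)
        if match:
            errors.append(match.group(1).strip())
    return errors


if __name__ == "__main__":
    for log in LOGS:
        path = Path(log)
        if not path.is_file():
            continue
        try:
            text = path.read_text(errors="replace")
        except PermissionError:
            print(f"{log}: permission denied (try running as root)")
            continue
        for message in error_lines(text):
            print(f"{log}: {message}")
```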

    Fix:

    	# Make sure GCC is installed:
    	gcc --version
    	
    	# If this prints GCC version information, it is already installed.
    	
    	# If not, install GCC with:
    	sudo apt update
    	sudo apt install gcc
    	
    
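To see which gcc the CUDA installer will find, a sketch like the following could help. It assumes the first line of gcc --version ends with the version number, as Ubuntu's gcc prints it (something like "gcc (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0"); gcc_version and parse_gcc_version are hypothetical helpers:

```python
# Hypothetical helpers: report the gcc version found on PATH.
import re
import shutil
import subprocess
from typing import Optional, Tuple

_VERSION_RE = re.compile(r"(\d+)\.(\d+)\.(\d+)\s*$")


def parse_gcc_version(first_line: str) -> Optional[Tuple[int, int, int]]:
    """Take the X.Y.Z version at the end of gcc --version's first line."""
    match = _VERSION_RE.search(first_line.strip())
    if match is None:
        return None
    major, minor, patch = (int(g) for g in match.groups())
    return major, minor, patch


def gcc_version() -> Optional[Tuple[int, int, int]]:
    """None corresponds to the "Missing gcc" case above."""
    gcc = shutil.which("gcc")
    if gcc is None:
        return None
    out = subprocess.run([gcc, "--version"], capture_output=True, text=True, check=False).stdout
    lines = out.splitlines()
    return parse_gcc_version(lines[0]) if lines else None


if __name__ == "__main__":
    print("gcc:", gcc_version() or "not found")
```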
    Incompatible driver (Nouveau):

    Error:

    cat /var/log/nvidia-installer.log
    ERROR: The Nouveau kernel driver is currently in use by your system.  This driver is incompatible with the NVIDIA driver, and must be disabled before proceeding.  Please consult the NVIDIA driver README and your Linux distribution's documentation for details on how to correctly disable the Nouveau kernel driver.
    -> For some distributions, Nouveau can be disabled by adding a file in the modprobe configuration directory.  Would you like nvidia-installer to attempt to create this modprobe file for you? (Answer: Yes)
    -> One or more modprobe configuration files to disable Nouveau have been written.  For some distributions, this may be sufficient to disable Nouveau; other distributions may require modification of the initial ramdisk.  Please reboot your system and attempt NVIDIA driver installation again.  Note if you later wish to re-enable Nouveau, you will need to delete these files: /usr/lib/modprobe.d/nvidia-installer-disable-nouveau.conf, /etc/modprobe.d/nvidia-installer-disable-nouveau.conf
    ERROR: Installation has failed.  Please see the file '/var/log/nvidia-installer.log' for details.  You may find suggestions on fixing installation problems in the README available on the Linux driver download page at www.nvidia.com.
    

    Fix: blacklist Nouveau, rebuild the initramfs, and reboot:

    
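    # Blacklist Nouveau (run as root, as in the prompts above)
    echo "blacklist nouveau" >> /etc/modprobe.d/blacklist-nouveau.conf
    echo "options nouveau modeset=0" >> /etc/modprobe.d/blacklist-nouveau.conf
    
    # Rebuild the initramfs. Ubuntu uses initramfs-tools for this; dracut and
    # /boot/initramfs-*.img belong to RHEL/CentOS-style systems
    update-initramfs -u
    
    # Boot to text mode so no display server holds the GPU (switch back later with graphical.target)
    systemctl set-default multi-user.target
    reboot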
    
    
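After the reboot, Nouveau should no longer be loaded; lsmod | grep nouveau printing nothing is the usual quick check. The same check in Python, reading /proc/modules (the file that lsmod formats), as a hypothetical sketch:

```python
# Hypothetical check: is a kernel module (e.g. nouveau) currently loaded?
# /proc/modules has one line per loaded module; the first field is its name.
from pathlib import Path
from typing import Set


def loaded_modules(proc_modules_text: str) -> Set[str]:
    """Return the module names listed in /proc/modules-style text."""
    return {line.split()[0] for line in proc_modules_text.splitlines() if line.strip()}


def is_module_loaded(name: str, proc_path: str = "/proc/modules") -> bool:
    path = Path(proc_path)
    if not path.exists():  # e.g. not running on Linux
        return False
    return name in loaded_modules(path.read_text())


if __name__ == "__main__":
    state = "still loaded" if is_module_loaded("nouveau") else "not loaded"
    print("nouveau is", state)
```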

    1.2 Installing the GPU driver (requires a reboot)

    Install the driver automatically:

    sudo ubuntu-drivers autoinstall
    
    

    Check that the driver installed successfully (after rebooting): nvidia-smi
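For scripts, nvidia-smi can report the driver version and GPU name as CSV through its --query-gpu and --format options. A hedged sketch; query_gpus and parse_gpu_query are hypothetical helpers, and a failing nvidia-smi is simply treated as "no working driver":

```python
# Hypothetical helpers: read driver version and GPU names via nvidia-smi.
import shutil
import subprocess
from typing import List, Tuple


def parse_gpu_query(csv_text: str) -> List[Tuple[str, str]]:
    """Parse 'driver_version, name' lines into (driver_version, name) tuples."""
    rows = []
    for line in csv_text.strip().splitlines():
        parts = [field.strip() for field in line.split(",", 1)]
        if len(parts) == 2:
            rows.append((parts[0], parts[1]))
    return rows


def query_gpus() -> List[Tuple[str, str]]:
    """Return [] if nvidia-smi is missing or exits with an error."""
    smi = shutil.which("nvidia-smi")
    if smi is None:
        return []
    result = subprocess.run(
        [smi, "--query-gpu=driver_version,name", "--format=csv,noheader"],
        capture_output=True, text=True, check=False,
    )
    if result.returncode != 0:
        return []
    return parse_gpu_query(result.stdout)


if __name__ == "__main__":
    gpus = query_gpus()
    if not gpus:
        print("nvidia-smi not available or failed; the driver is not working yet")
    for driver, name in gpus:
        print(f"driver {driver}: {name}")
```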
    If autoinstall fails with the following error:

    root@service5:/home/voke# sudo ubuntu-drivers autoinstall
    Traceback (most recent call last):
      File "/usr/bin/ubuntu-drivers", line 513, in <module>
        greet()
      File "/usr/lib/python3/dist-packages/click/core.py", line 1128, in __call__
        return self.main(*args, **kwargs)
      File "/usr/lib/python3/dist-packages/click/core.py", line 1053, in main
        rv = self.invoke(ctx)
      File "/usr/lib/python3/dist-packages/click/core.py", line 1659, in invoke
        return _process_result(sub_ctx.command.invoke(sub_ctx))
      File "/usr/lib/python3/dist-packages/click/core.py", line 1395, in invoke
        return ctx.invoke(self.callback, **ctx.params)
      File "/usr/lib/python3/dist-packages/click/core.py", line 754, in invoke
        return __callback(*args, **kwargs)
      File "/usr/lib/python3/dist-packages/click/decorators.py", line 84, in new_func
        return ctx.invoke(f, obj, *args, **kwargs)
      File "/usr/lib/python3/dist-packages/click/core.py", line 754, in invoke
        return __callback(*args, **kwargs)
      File "/usr/bin/ubuntu-drivers", line 432, in autoinstall
        command_install(config)
      File "/usr/bin/ubuntu-drivers", line 187, in command_install
        UbuntuDrivers.detect.nvidia_desktop_pre_installation_hook(to_install)
      File "/usr/lib/python3/dist-packages/UbuntuDrivers/detect.py", line 839, in nvidia_desktop_pre_installation_hook
        with_nvidia_kms = version >= 470
    UnboundLocalError: local variable 'version' referenced before assignment
    

    Fix: update the tooling, then list the installable drivers and install one manually:

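    # Update the package index and installed packages
    sudo apt update
    sudo apt upgrade
    
    # Reinstall the ubuntu-drivers tool
    sudo apt install --reinstall ubuntu-drivers-common
    
    # List the available drivers, then install one manually
    ubuntu-drivers list
    sudo apt install <driver-package-name-from-the-list-above>
    
    # *-server: server build
    # *-open: build with the open-source kernel modules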
    
    
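To make the -server / -open naming concrete, here is a sketch that parses ubuntu-drivers list output and picks the newest plain (non-server, non-open) driver. It assumes each line begins with a package name of the form nvidia-driver-<branch>[-server][-open], optionally followed by a comma and a note about kernel modules; all helper names are hypothetical, and this does not replace ubuntu-drivers' own recommendation:

```python
# Hypothetical helpers for `ubuntu-drivers list` output. Assumption: each line
# starts with a package name like nvidia-driver-535[-server][-open], optionally
# followed by ", (kernel modules provided by ...)".
import re
from typing import Dict, List, Optional

_NAME_RE = re.compile(r"^nvidia-driver-(\d+)(-server)?(-open)?$")


def classify(package: str) -> Optional[Dict[str, object]]:
    """Split a driver package name into branch number and variant flags."""
    match = _NAME_RE.match(package)
    if match is None:
        return None
    return {
        "package": package,
        "branch": int(match.group(1)),
        "server": match.group(2) is not None,  # *-server: server build
        "open": match.group(3) is not None,    # *-open: open-source kernel modules
    }


def parse_list(output: str) -> List[Dict[str, object]]:
    """Classify every recognized driver line in the command's output."""
    drivers = []
    for line in output.splitlines():
        info = classify(line.split(",", 1)[0].strip())
        if info is not None:
            drivers.append(info)
    return drivers


def newest_desktop_driver(output: str) -> Optional[str]:
    """Highest branch that is neither -server nor -open, or None."""
    candidates = [d for d in parse_list(output) if not d["server"] and not d["open"]]
    if not candidates:
        return None
    return str(max(candidates, key=lambda d: d["branch"])["package"])


if __name__ == "__main__":
    import shutil
    import subprocess
    if shutil.which("ubuntu-drivers"):
        out = subprocess.run(["ubuntu-drivers", "list"], capture_output=True, text=True, check=False).stdout
        print("newest non-server, non-open driver:", newest_desktop_driver(out))
```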

    II. Installing cuDNN

    Visit the official cuDNN site to get the download link (an NVIDIA Developer account is required).
    Download and install:

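    # After logging in and downloading the file, upload it to the server, then install the local repository package
    apt install ./cudnn-local-repo-ubuntu2204-8.9.3.28_1.0-1_amd64.deb
    # The .deb only adds a local apt repository: import its signing key, then install the cuDNN libraries from it
    cp /var/cudnn-local-repo-*/cudnn-local-*-keyring.gpg /usr/share/keyrings/ && apt update && apt install libcudnn8 libcudnn8-dev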
    
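To confirm which cuDNN version ended up on the system, one option is to read the CUDNN_MAJOR, CUDNN_MINOR and CUDNN_PATCHLEVEL defines from the cuDNN version header. The header locations below are assumptions for the Debian packages (adjust them if dpkg -L libcudnn8-dev shows otherwise), and the functions are hypothetical:

```python
# Hypothetical check: read the cuDNN version from its version header.
# Assumption: the header lives at one of CANDIDATES (Debian-package layout).
import re
from pathlib import Path
from typing import Iterable, Optional, Tuple

CANDIDATES = (
    "/usr/include/cudnn_version.h",
    "/usr/include/x86_64-linux-gnu/cudnn_version_v8.h",
)


def parse_cudnn_version(header_text: str) -> Optional[Tuple[int, int, int]]:
    """Extract (major, minor, patch) from the CUDNN_* #define lines."""
    values = []
    for key in ("CUDNN_MAJOR", "CUDNN_MINOR", "CUDNN_PATCHLEVEL"):
        match = re.search(rf"#define\s+{key}\s+(\d+)", header_text)
        if match is None:
            return None
        values.append(int(match.group(1)))
    return values[0], values[1], values[2]


def find_cudnn_version(paths: Iterable[str] = CANDIDATES) -> Optional[Tuple[int, int, int]]:
    """Return the version from the first header found, or None."""
    for p in paths:
        path = Path(p)
        if path.is_file():
            return parse_cudnn_version(path.read_text(errors="replace"))
    return None


if __name__ == "__main__":
    print("cuDNN:", find_cudnn_version() or "version header not found")
```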

    III. Installing Miniconda

    mkdir -p ~/miniconda3
    wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh -O ~/miniconda3/miniconda.sh
    bash ~/miniconda3/miniconda.sh -b -u -p ~/miniconda3
    rm -rf ~/miniconda3/miniconda.sh
    
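Because the installer runs in batch mode (-b), it does not modify your shell startup files, so conda is usually not on PATH yet; that is why the next step calls ~/miniconda3/bin/conda by its full path. A hypothetical helper that looks for conda on PATH first and then under the install prefix (-p) used above:

```python
# Hypothetical helper: locate the conda executable after a batch install.
import os
import shutil
from pathlib import Path
from typing import Optional


def find_conda(prefix: Optional[Path] = None, path_env: Optional[str] = None) -> Optional[str]:
    """Return conda's path: first from PATH, then from <prefix>/bin/conda."""
    on_path = shutil.which("conda", path=path_env)  # path_env=None means use $PATH
    if on_path:
        return on_path
    # Default to the -p prefix used in the install commands above.
    prefix = prefix if prefix is not None else Path.home() / "miniconda3"
    candidate = prefix / "bin" / "conda"
    if candidate.is_file() and os.access(candidate, os.X_OK):
        return str(candidate)
    return None


if __name__ == "__main__":
    print(find_conda() or "conda not found; check the install prefix")
```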

    Initialize conda in your shell's environment:

    ~/miniconda3/bin/conda init bash
    ~/miniconda3/bin/conda init zsh    # only needed if you use zsh
    source ~/.bashrc                   # zsh users: source ~/.zshrc instead
    
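conda init writes its setup code into the shell rc file between "# >>> conda initialize >>>" and "# <<< conda initialize <<<" marker comments. A small sketch that checks for that block in ~/.bashrc and ~/.zshrc; the function names are hypothetical:

```python
# Hypothetical check: did `conda init` write its block into the shell rc files?
from pathlib import Path
from typing import Dict, Optional

START = "# >>> conda initialize >>>"
END = "# <<< conda initialize <<<"


def has_conda_init_block(rc_text: str) -> bool:
    """True if the start marker appears, followed later by the end marker."""
    start = rc_text.find(START)
    return start != -1 and rc_text.find(END, start) != -1


def check_rc_files(home: Optional[Path] = None) -> Dict[str, bool]:
    home = home if home is not None else Path.home()
    results = {}
    for name in (".bashrc", ".zshrc"):
        rc = home / name
        results[name] = rc.is_file() and has_conda_init_block(rc.read_text(errors="replace"))
    return results


if __name__ == "__main__":
    for name, ok in check_rc_files().items():
        print(f"{name}: {'conda init block found' if ok else 'no conda init block'}")
```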
    3.1 Switch to the Tsinghua (TUNA) mirror

    On any OS, you can use the TUNA mirror by editing the .condarc file in your home directory:

    conda config --set show_channel_urls yes
    vim ~/.condarc
    

    Edit it to contain the following:

    channels:
      - defaults
    show_channel_urls: true
    default_channels:
      - https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
      - https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/r
      - https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/msys2
    custom_channels:
      conda-forge: https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud
      msys2: https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud
      bioconda: https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud
      menpo: https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud
      pytorch: https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud
      pytorch-lts: https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud
      simpleitk: https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud
      deepmodeling: https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/
    

    Run conda clean -i to clear the index cache, so that conda uses the index served by the mirror.
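As a simplified model of what this .condarc does (an assumption about the behavior, not conda's actual code): "defaults" expands to the default_channels list, and a name listed under custom_channels resolves to "<base URL>/<name>". The sketch keeps only a few entries from the config above; anything not listed falls back to conda's default channel host, https://conda.anaconda.org:

```python
# Simplified, hypothetical model of channel resolution under the .condarc above.
from typing import Dict, List

CONDARC = {
    "channels": ["defaults"],
    "default_channels": [
        "https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main",
        "https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/r",
        "https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/msys2",
    ],
    "custom_channels": {
        "conda-forge": "https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud",
        "pytorch": "https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud",
        "deepmodeling": "https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/",
    },
}


def resolve_channel(name: str, condarc: Dict) -> List[str]:
    """Return the URL(s) a channel name maps to under this config."""
    if name == "defaults":
        return list(condarc["default_channels"])
    base = condarc.get("custom_channels", {}).get(name)
    if base is not None:
        # rstrip handles entries written with a trailing slash, like deepmodeling.
        return [base.rstrip("/") + "/" + name]
    # Not mirrored: falls back to the default channel host.
    return ["https://conda.anaconda.org/" + name]
```

Under this model, the -c nvidia channel used in the PyTorch step is not redirected to the mirror, because nvidia is not listed under custom_channels.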

    3.2 Point pip at the Tsinghua mirror

    Create or edit the pip configuration file.

    If the ~/.pip directory does not exist, create it first:

    mkdir -p ~/.pip
    

    Next, create or edit ~/.pip/pip.conf with any text editor, such as nano or vim:

    vim ~/.pip/pip.conf
    

    Add the following to pip.conf to set pip's package index to the Tsinghua University mirror:

    [global]
    index-url = https://pypi.tuna.tsinghua.edu.cn/simple
    trusted-host = pypi.tuna.tsinghua.edu.cn
    

    Verify the change:

    pip config get global.index-url
    

    This prints the global pip index URL, which should be the Tsinghua mirror's URL.
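pip.conf uses INI syntax, so Python's configparser can read it. A hypothetical sketch that reports the configured index-url, as a script-friendly counterpart to pip config get (interpolation is disabled so that any '%' in a URL is read literally):

```python
# Hypothetical check: read index-url from a pip.conf-style INI file.
import configparser
from pathlib import Path
from typing import Optional


def read_index_url(conf_text: str) -> Optional[str]:
    """Return [global] index-url, or None if the section or key is missing."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(conf_text)
    return parser.get("global", "index-url", fallback=None)


if __name__ == "__main__":
    conf = Path.home() / ".pip" / "pip.conf"  # the file edited above
    if conf.is_file():
        print("index-url:", read_index_url(conf.read_text()))
    else:
        print(conf, "not found")
```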

    IV. Installing PyTorch

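    On the PyTorch website, select your system configuration to get the install command.
    Installing the latest version is fine. The conda package ships its own CUDA runtime, so pytorch-cuda=11.8 does not have to match the CUDA 12.2 toolkit installed above; it only needs a driver that supports CUDA 11.8.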

    conda install pytorch torchvision torchaudio pytorch-cuda=11.8 -c pytorch -c nvidia
    
    

    Output like the following indicates success:

    Preparing transaction: done                                              
    Verifying transaction: done                                              
    Executing transaction: done   
    

    Verify the installation:

    Start python and run:

    # Check the torch installation
    import torch
    print(torch.__version__)
    
    # Check that CUDA is usable
    print(torch.cuda.is_available())
    
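A slightly fuller check, as a sketch (the report fields are our own choice): besides the two checks above, it reports the CUDA version PyTorch was built with, the cuDNN version, and the GPU name, and runs one small matrix multiply on the GPU. Note that torch.backends.cudnn.version() reports the cuDNN bundled with PyTorch, which may differ from the system cuDNN installed in section II:

```python
# Sketch of a fuller PyTorch/GPU sanity check; degrades gracefully without torch or a GPU.
from typing import Dict


def torch_report() -> Dict[str, object]:
    try:
        import torch
    except ImportError:
        return {"installed": False}
    report: Dict[str, object] = {
        "installed": True,
        "torch": torch.__version__,
        "built_with_cuda": torch.version.cuda,  # None for CPU-only builds
        "cuda_available": torch.cuda.is_available(),
    }
    if report["cuda_available"]:
        report["cudnn"] = torch.backends.cudnn.version()  # bundled cuDNN
        report["device"] = torch.cuda.get_device_name(0)
        # ones(2,2) @ ones(2,2) is a matrix of 2s, whose sum is 8.
        x = torch.ones(2, 2, device="cuda")
        report["matmul_ok"] = bool((x @ x).sum().item() == 8.0)
    return report


if __name__ == "__main__":
    for key, value in torch_report().items():
        print(f"{key}: {value}")
```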
  • Original article: https://blog.csdn.net/voke_/article/details/132816290