• Installing CUDA and cuDNN on Ubuntu and Verifying the Installation

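    This tutorial explains how to install CUDA (NVIDIA's parallel computing platform) and cuDNN (NVIDIA's deep neural network library) on Ubuntu, and how to verify that the installation succeeded. After following these steps, your system will be set up to use the GPU to accelerate deep learning and other compute-intensive workloads. The tutorial also covers setting environment variables and building and running sample code to confirm that CUDA and cuDNN work correctly.

    Installing CUDA

    Before installing CUDA, a few pre-installation steps are required. First, install the headers and development packages for the currently running kernel. Open a terminal and run: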

    sudo apt-get install linux-headers-$(uname -r)
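
To see which package the $(uname -r) substitution selects, the following sketch prints the package name and, if dpkg is available, reports whether it is already installed. headers_pkg is a hypothetical helper name used only for illustration.

```shell
# Print the kernel-headers package name for the running kernel.
# headers_pkg is a hypothetical helper, not a standard command.
headers_pkg() {
    printf 'linux-headers-%s\n' "$(uname -r)"
}

pkg="$(headers_pkg)"
echo "Package for this kernel: $pkg"

# dpkg -s exits non-zero when the package is not installed.
if command -v dpkg >/dev/null 2>&1 && dpkg -s "$pkg" >/dev/null 2>&1; then
    echo "$pkg is already installed"
else
    echo "$pkg is not installed yet (or dpkg is unavailable)"
fi
```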
    
    Next, remove the outdated signing key:

    sudo apt-key del 7fa2af80
    
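    Installing CUDA from the network repository (Ubuntu)

    The new CUDA repository is signed with the GPG public key 3bf863cc. You can add it to your system with the cuda-keyring package or manually; using the apt-key command for this is not recommended. Follow these steps:

    1. Install the new cuda-keyring package, replacing $distro/$arch to match your system: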
    wget https://developer.download.nvidia.com/compute/cuda/repos/$distro/$arch/cuda-keyring_1.1-1_all.deb
    sudo dpkg -i cuda-keyring_1.1-1_all.deb
    
    Replace $distro/$arch with one of the following:

    • ubuntu1604/x86_64: Ubuntu 16.04, 64-bit.
    • ubuntu1804/cross-linux-sbsa: Ubuntu 18.04, cross-compilation for the SBSA architecture.
    • ubuntu1804/ppc64el: Ubuntu 18.04, 64-bit PowerPC.
    • ubuntu1804/sbsa: Ubuntu 18.04, SBSA architecture.
    • ubuntu1804/x86_64: Ubuntu 18.04, 64-bit.
    • ubuntu2004/cross-linux-aarch64: Ubuntu 20.04, cross-compilation for the AArch64 architecture.
    • ubuntu2004/arm64: Ubuntu 20.04, 64-bit ARM.
    • ubuntu2004/cross-linux-sbsa: Ubuntu 20.04, cross-compilation for the SBSA architecture.
    • ubuntu2004/sbsa: Ubuntu 20.04, SBSA architecture.
    • ubuntu2004/x86_64: Ubuntu 20.04, 64-bit.
    • ubuntu2204/sbsa: Ubuntu 22.04, SBSA architecture.
    • ubuntu2204/x86_64: Ubuntu 22.04, 64-bit.
      Pick the entry that matches your Ubuntu version and architecture, and use it in the commands above.
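
As a rough aid for choosing the $distro/$arch value, the sketch below builds it from an Ubuntu version number and a machine type. cuda_repo_path is a hypothetical helper, and mapping aarch64 to sbsa is an assumption that fits ARM servers only; check the result against the list above before using it.

```shell
# Build the "$distro/$arch" path segment from an Ubuntu version and machine type.
# cuda_repo_path is a hypothetical helper, not part of any NVIDIA tooling.
cuda_repo_path() {
    version_id="$1"   # e.g. 22.04, as in VERSION_ID of /etc/os-release
    machine="$2"      # e.g. x86_64, as printed by `uname -m`
    distro="ubuntu$(printf '%s' "$version_id" | tr -d '.')"
    case "$machine" in
        x86_64)  arch="x86_64" ;;
        aarch64) arch="sbsa" ;;     # assumption: ARM server; 20.04 also lists arm64
        ppc64le) arch="ppc64el" ;;
        *)       arch="$machine" ;; # unknown machine type: pass through unchanged
    esac
    printf '%s/%s\n' "$distro" "$arch"
}

cuda_repo_path 22.04 x86_64   # prints ubuntu2204/x86_64
```

On a live system the two arguments could come from `. /etc/os-release; echo "$VERSION_ID"` and `uname -m`.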
    2. Update the apt repository cache:
    sudo apt-get update
    
    3. Install the CUDA SDK:
      You can list the available CUDA packages with:
    cat /var/lib/apt/lists/*cuda*Packages | grep "Package:"
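
The raw listing repeats each package once per version. A small filter such as the sketch below reduces it to unique names; list_cuda_packages is a hypothetical helper that reads the same index text on standard input.

```shell
# Read apt "Packages" index text on stdin and print the unique names of
# packages whose name starts with "cuda".
# list_cuda_packages is a hypothetical helper, not an apt command.
list_cuda_packages() {
    awk '$1 == "Package:" && $2 ~ /^cuda/ { print $2 }' | sort -u
}
```

It could be used as `cat /var/lib/apt/lists/*cuda*Packages | list_cuda_packages`.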
    
    Or refer to the list below:

    Meta Package              Purpose
    cuda                      Installs all CUDA Toolkit and Driver packages. Handles upgrading to the next version of the cuda package when it's released.
    cuda-12-2                 Installs all CUDA Toolkit and Driver packages. Remains at version 12.2 until an additional version of CUDA is installed.
    cuda-toolkit-12-2         Installs all CUDA Toolkit packages required to develop CUDA applications. Does not include the driver.
    cuda-toolkit-12           Installs all CUDA Toolkit packages required to develop applications. Will not upgrade beyond the 12.x series toolkits. Does not include the driver.
    cuda-toolkit              Installs all CUDA Toolkit packages required to develop applications. Handles upgrading to the next 12.x version of CUDA when it's released. Does not include the driver.
    cuda-tools-12-2           Installs all CUDA command line and visual tools.
    cuda-runtime-12-2         Installs all CUDA Toolkit packages required to run CUDA applications, as well as the Driver packages.
    cuda-compiler-12-2        Installs all CUDA compiler packages.
    cuda-libraries-12-2       Installs all runtime CUDA Library packages.
    cuda-libraries-dev-12-2   Installs all development CUDA Library packages.
    cuda-drivers              Installs all Driver packages. Handles upgrading to the next version of the Driver packages when they're released.

    Choose the package you need. The table shows the 12.2 names; this tutorial installs CUDA 11.8, whose meta package is cuda-11-8:

    sudo apt-get install cuda-11-8
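
The versioned meta package names replace the dot in the CUDA version with a hyphen. The sketch below illustrates that pattern; cuda_meta_pkg is a hypothetical helper, and the resulting name should still be checked against the package list for your repository.

```shell
# Convert a CUDA version such as 11.8 into the versioned meta package name.
# cuda_meta_pkg is a hypothetical helper used only to illustrate the naming pattern.
cuda_meta_pkg() {
    printf 'cuda-%s\n' "$(printf '%s' "$1" | tr '.' '-')"
}

cuda_meta_pkg 11.8    # prints cuda-11-8
cuda_meta_pkg 12.2    # prints cuda-12-2
```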
    
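    This package includes the GPU driver. During installation you will be asked to set a password. Remember it: you will need it after the reboot, on the "Perform MOK management" screen.

    4. After the installation completes, reboot the system: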
    sudo reboot
    
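    Perform MOK management
    After the reboot, the "Perform MOK management" screen appears. Choose Enroll MOK -> Continue -> Enroll the key -> Yes, type the password you set in step 3, then choose Reboot. Once the machine restarts, the NVIDIA driver installation is complete.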
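
MOK enrollment is normally only required when UEFI Secure Boot is enabled. As a sketch under that assumption, the snippet below interprets the text printed by `mokutil --sb-state`; needs_mok_enrollment is a hypothetical helper, and mokutil may not be installed on every system.

```shell
# Decide from the text printed by `mokutil --sb-state` whether a MOK
# enrollment step should be expected.
# needs_mok_enrollment is a hypothetical helper.
needs_mok_enrollment() {
    case "$1" in
        *"SecureBoot enabled"*) echo "yes" ;;
        *)                      echo "no" ;;
    esac
}

if command -v mokutil >/dev/null 2>&1; then
    state="$(mokutil --sb-state 2>/dev/null || true)"
    echo "MOK enrollment expected: $(needs_mok_enrollment "$state")"
else
    echo "mokutil not found; cannot query the Secure Boot state"
fi
```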

    Configuring environment variables

    1. Edit ~/.bashrc with vim (sudo is not needed for a file in your own home directory):
    vim ~/.bashrc
    
    2. Append the following lines to the end of the file:
    export PATH=/usr/local/cuda-11.8/bin${PATH:+:${PATH}}
    export LD_LIBRARY_PATH=/usr/local/cuda-11.8/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}
    
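    ${PATH:+:${PATH}} is Bash parameter expansion of the form ${VAR:+word}: it expands to word when VAR is set and non-empty, and to nothing otherwise. Here word is a colon followed by the current value of $PATH, so the CUDA directory is placed in front of the existing search path, separated from it by a colon.
    Specifically:
    If $PATH is set and non-empty, the result is the new path, then a colon, then the existing $PATH.
    If $PATH is unset or empty, the result is the new path alone, with no trailing colon.
    This keeps the variable well-formed. A stray leading or trailing colon would create an empty entry, which in $PATH is interpreted as the current directory. The same idiom is used for LD_LIBRARY_PATH.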
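
The behaviour of the ${VAR:+word} expansion can be checked without touching the real $PATH. In this sketch, with_cuda_bin is a hypothetical helper that applies the same expansion to a value passed as its argument.

```shell
# Apply the same expansion as the export line, but to an argument
# instead of the real PATH. with_cuda_bin is a hypothetical helper.
with_cuda_bin() {
    old="$1"
    printf '%s\n' "/usr/local/cuda-11.8/bin${old:+:${old}}"
}

with_cuda_bin "/usr/bin:/bin"   # prints /usr/local/cuda-11.8/bin:/usr/bin:/bin
with_cuda_bin ""                # prints /usr/local/cuda-11.8/bin
```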

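    LD_LIBRARY_PATH is an environment variable that lists the directories the dynamic linker searches for shared libraries (shared object files) when it starts an executable. On Linux and other Unix-like systems, shared libraries let many programs use the same library code, which reduces memory usage and improves system efficiency.

    3. Reload the configuration
      Run the following command in the terminal so that the new environment variables take effect: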
    source ~/.bashrc
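
One way to confirm that the new setting took effect is to test whether the CUDA directory is now an entry in $PATH. The sketch below does this with a hypothetical helper, path_contains, and assumes the /usr/local/cuda-11.8 location used above.

```shell
# Succeed if directory $2 is one of the colon-separated entries in list $1.
# path_contains is a hypothetical helper.
path_contains() {
    case ":$1:" in
        *":$2:"*) return 0 ;;
        *)        return 1 ;;
    esac
}

if path_contains "${PATH:-}" "/usr/local/cuda-11.8/bin"; then
    echo "CUDA bin directory is on PATH"
else
    echo "CUDA bin directory is missing from PATH"
fi
```

The same helper works for LD_LIBRARY_PATH with /usr/local/cuda-11.8/lib64 as the second argument.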
    
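    Verifying the installation

    First, install the third-party libraries that some CUDA samples depend on. The samples normally detect the required libraries during the build, but any that are not found must be installed manually. Open a terminal and run: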

    sudo apt-get install g++ freeglut3-dev build-essential libx11-dev \
        libxmu-dev libxi-dev libglu1-mesa libglu1-mesa-dev libfreeimage-dev
    
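    After installing the dependencies, download the sample source code from GitHub: https://github.com/nvidia/cuda-samples

    Make sure you switch to the version of the samples that matches your installed CUDA release, v11.8 in this case. Then build the samples (sudo is not needed when building inside your own directory):

    cd cuda-samples
    make

    If the whole build completes without errors, the installation is working.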
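
For fetching the samples at the matching version in one step, something like the sketch below could be used. It only prints the git command rather than running it; samples_clone_cmd is a hypothetical helper, and the "v<version>" name for the branch or tag is assumed from the v11.8 mentioned above.

```shell
# Print the git command that fetches cuda-samples at the version matching a CUDA release.
# samples_clone_cmd is a hypothetical helper; the v<version> naming is an assumption.
samples_clone_cmd() {
    printf 'git clone --depth 1 --branch v%s https://github.com/nvidia/cuda-samples.git\n' "$1"
}

samples_clone_cmd 11.8
```

Run the printed command by hand once you have confirmed that the version name exists in the repository.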

    From the source directory, run ./bin/x86_64/linux/release/deviceQuery. The output should look like this:

    cheungxiongwei@root:~/Source/cuda-samples$ ./bin/x86_64/linux/release/deviceQuery
    ./bin/x86_64/linux/release/deviceQuery Starting...
    
     CUDA Device Query (Runtime API) version (CUDART static linking)
    
    Detected 1 CUDA Capable device(s)
    
    Device 0: "NVIDIA GeForce RTX 4060 Laptop GPU"
      CUDA Driver Version / Runtime Version          12.2 / 11.8
      CUDA Capability Major/Minor version number:    8.9
      Total amount of global memory:                 7940 MBytes (8325824512 bytes)
    MapSMtoCores for SM 8.9 is undefined.  Default to use 128 Cores/SM
    MapSMtoCores for SM 8.9 is undefined.  Default to use 128 Cores/SM
      (024) Multiprocessors, (128) CUDA Cores/MP:    3072 CUDA Cores
      GPU Max Clock rate:                            2250 MHz (2.25 GHz)
      Memory Clock rate:                             8001 Mhz
      Memory Bus Width:                              128-bit
      L2 Cache Size:                                 33554432 bytes
      Maximum Texture Dimension Size (x,y,z)         1D=(131072), 2D=(131072, 65536), 3D=(16384, 16384, 16384)
      Maximum Layered 1D Texture Size, (num) layers  1D=(32768), 2048 layers
      Maximum Layered 2D Texture Size, (num) layers  2D=(32768, 32768), 2048 layers
      Total amount of constant memory:               65536 bytes
      Total amount of shared memory per block:       49152 bytes
      Total shared memory per multiprocessor:        102400 bytes
      Total number of registers available per block: 65536
      Warp size:                                     32
      Maximum number of threads per multiprocessor:  1536
      Maximum number of threads per block:           1024
      Max dimension size of a thread block (x,y,z): (1024, 1024, 64)
      Max dimension size of a grid size    (x,y,z): (2147483647, 65535, 65535)
      Maximum memory pitch:                          2147483647 bytes
      Texture alignment:                             512 bytes
      Concurrent copy and kernel execution:          Yes with 2 copy engine(s)
      Run time limit on kernels:                     Yes
      Integrated GPU sharing Host Memory:            No
      Support host page-locked memory mapping:       Yes
      Alignment requirement for Surfaces:            Yes
      Device has ECC support:                        Disabled
      Device supports Unified Addressing (UVA):      Yes
      Device supports Managed Memory:                Yes
      Device supports Compute Preemption:            Yes
      Supports Cooperative Kernel Launch:            Yes
      Supports MultiDevice Co-op Kernel Launch:      Yes
      Device PCI Domain ID / Bus ID / location ID:   0 / 1 / 0
      Compute Mode:
         < Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >
    
    deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 12.2, CUDA Runtime Version = 11.8, NumDevs = 1
    Result = PASS
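
The line to look for is the final "Result = PASS". As a sketch, the check can be scripted by piping the program's output through a small filter; device_query_passed is a hypothetical helper.

```shell
# Succeed if the text on stdin contains the line "Result = PASS".
# device_query_passed is a hypothetical helper.
device_query_passed() {
    grep -q '^Result = PASS$'
}
```

For example: `./bin/x86_64/linux/release/deviceQuery | device_query_passed && echo "CUDA OK"`.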
    

    Installing cuDNN

    Install the cuDNN runtime library, the development package, and the samples:

    sudo apt-get install libcudnn8=${cudnn_version}-1+${cuda_version}
    sudo apt-get install libcudnn8-dev=${cudnn_version}-1+${cuda_version}
    sudo apt-get install libcudnn8-samples=${cudnn_version}-1+${cuda_version}
    
    Substitute the placeholders as follows:
    ${cudnn_version} is 8.9.4.*
    ${cuda_version} is cuda12.2 or cuda11.8

    To list the libcudnn8 packages available in the repository, run:

    cat /var/lib/apt/lists/*cuda*Packages | grep "./libcudnn8"
    
    The output looks like this:

    cheungxiongwei@root:~/cudnn_samples_v8/mnistCUDNN$ cat /var/lib/apt/lists/*cuda*Packages | grep "./libcudnn8"
    Filename: ./libcudnn8_8.5.0.96-1+cuda11.7_amd64.deb
    Filename: ./libcudnn8-dev_8.5.0.96-1+cuda11.7_amd64.deb
    Filename: ./libcudnn8_8.6.0.163-1+cuda11.8_amd64.deb
    Filename: ./libcudnn8-dev_8.6.0.163-1+cuda11.8_amd64.deb
    Filename: ./libcudnn8_8.7.0.84-1+cuda11.8_amd64.deb
    Filename: ./libcudnn8-dev_8.7.0.84-1+cuda11.8_amd64.deb
    Filename: ./libcudnn8_8.8.0.121-1+cuda11.8_amd64.deb
    Filename: ./libcudnn8_8.8.0.121-1+cuda12.0_amd64.deb
    Filename: ./libcudnn8-dev_8.8.0.121-1+cuda11.8_amd64.deb
    Filename: ./libcudnn8-dev_8.8.0.121-1+cuda12.0_amd64.deb
    Filename: ./libcudnn8_8.8.1.3-1+cuda11.8_amd64.deb
    Filename: ./libcudnn8_8.8.1.3-1+cuda12.0_amd64.deb
    Filename: ./libcudnn8-dev_8.8.1.3-1+cuda11.8_amd64.deb
    Filename: ./libcudnn8-dev_8.8.1.3-1+cuda12.0_amd64.deb
    Filename: ./libcudnn8_8.9.0.131-1+cuda11.8_amd64.deb
    Filename: ./libcudnn8_8.9.0.131-1+cuda12.1_amd64.deb
    Filename: ./libcudnn8-dev_8.9.0.131-1+cuda11.8_amd64.deb
    Filename: ./libcudnn8-dev_8.9.0.131-1+cuda12.1_amd64.deb
    Filename: ./libcudnn8_8.9.1.23-1+cuda11.8_amd64.deb
    Filename: ./libcudnn8_8.9.1.23-1+cuda12.1_amd64.deb
    Filename: ./libcudnn8-dev_8.9.1.23-1+cuda11.8_amd64.deb
    Filename: ./libcudnn8-dev_8.9.1.23-1+cuda12.1_amd64.deb
    Filename: ./libcudnn8-samples_8.9.1.23-1+cuda11.8_amd64.deb
    Filename: ./libcudnn8-samples_8.9.1.23-1+cuda12.1_amd64.deb
    Filename: ./libcudnn8_8.9.2.26-1+cuda11.8_amd64.deb
    Filename: ./libcudnn8_8.9.2.26-1+cuda12.1_amd64.deb
    Filename: ./libcudnn8-dev_8.9.2.26-1+cuda11.8_amd64.deb
    Filename: ./libcudnn8-dev_8.9.2.26-1+cuda12.1_amd64.deb
    Filename: ./libcudnn8-samples_8.9.2.26-1+cuda11.8_amd64.deb
    Filename: ./libcudnn8-samples_8.9.2.26-1+cuda12.1_amd64.deb
    Filename: ./libcudnn8_8.9.3.28-1+cuda11.8_amd64.deb
    Filename: ./libcudnn8_8.9.3.28-1+cuda12.1_amd64.deb
    Filename: ./libcudnn8-dev_8.9.3.28-1+cuda11.8_amd64.deb
    Filename: ./libcudnn8-dev_8.9.3.28-1+cuda12.1_amd64.deb
    Filename: ./libcudnn8-samples_8.9.3.28-1+cuda11.8_amd64.deb
    Filename: ./libcudnn8-samples_8.9.3.28-1+cuda12.1_amd64.deb
    Filename: ./libcudnn8_8.9.4.25-1+cuda11.8_amd64.deb
    Filename: ./libcudnn8_8.9.4.25-1+cuda12.2_amd64.deb
    Filename: ./libcudnn8-dev_8.9.4.25-1+cuda11.8_amd64.deb
    Filename: ./libcudnn8-dev_8.9.4.25-1+cuda12.2_amd64.deb
    Filename: ./libcudnn8-samples_8.9.4.25-1+cuda11.8_amd64.deb
    Filename: ./libcudnn8-samples_8.9.4.25-1+cuda12.2_amd64.deb
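
Rather than scanning the listing by eye, the newest cuDNN version for a given CUDA release can be extracted from it. The sketch below reads "Filename:" lines like those above on standard input; latest_cudnn_for is a hypothetical helper, and it relies on GNU `sort -V` for version ordering.

```shell
# Read "Filename: ./libcudnn8_<version>-1+<cuda_tag>_<arch>.deb" lines on stdin
# and print the highest cuDNN version available for the given CUDA tag.
# latest_cudnn_for is a hypothetical helper.
latest_cudnn_for() {
    cuda_tag="$1"   # e.g. cuda11.8
    sed -n "s|^Filename: \./libcudnn8_\(.*\)-1+${cuda_tag}_.*\.deb\$|\1|p" \
        | sort -V | tail -n 1
}
```

For example: `cat /var/lib/apt/lists/*cuda*Packages | latest_cudnn_for cuda11.8`.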
    

    Here we choose the latest cuDNN, 8.9.4.25, for CUDA 11.8. After substitution, the complete commands are:

    sudo apt-get install libcudnn8=8.9.4.25-1+cuda11.8
    sudo apt-get install libcudnn8-dev=8.9.4.25-1+cuda11.8
    sudo apt-get install libcudnn8-samples=8.9.4.25-1+cuda11.8
    
    Verifying cuDNN

    To verify that cuDNN is installed and working, build the mnistCUDNN sample from the /usr/src/cudnn_samples_v8 directory.

    1. Copy the cuDNN samples to your home directory:
    cp -r /usr/src/cudnn_samples_v8/ $HOME
    
    2. Change to the sample directory:
    cd  $HOME/cudnn_samples_v8/mnistCUDNN
    
    3. Build the mnistCUDNN sample:
    make clean && make
    
    If the build fails because FreeImage.h cannot be found, install the missing dependency with sudo apt-get install libfreeimage-dev.

    4. Run the mnistCUDNN sample:
    ./mnistCUDNN
    
    If cuDNN is installed correctly and the sample builds and runs on your Linux system, you will see output similar to the following:

    cheungxiongwei@root:~/cudnn_samples_v8/mnistCUDNN$ ./mnistCUDNN
    Executing: mnistCUDNN
    cudnnGetVersion() : 8904 , CUDNN_VERSION from cudnn.h : 8904 (8.9.4)
    Host compiler version : GCC 11.4.0
    
    There are 1 CUDA capable devices on your machine :
    device 0 : sms 24  Capabilities 8.9, SmClock 2250.0 Mhz, MemSize (Mb) 7940, MemClock 8001.0 Mhz, Ecc=0, boardGroupID=0
    Using device 0
    
    Testing single precision
    Loading binary file data/conv1.bin
    Loading binary file data/conv1.bias.bin
    Loading binary file data/conv2.bin
    Loading binary file data/conv2.bias.bin
    Loading binary file data/ip1.bin
    Loading binary file data/ip1.bias.bin
    Loading binary file data/ip2.bin
    Loading binary file data/ip2.bias.bin
    Loading image data/one_28x28.pgm
    Performing forward propagation ...
    Testing cudnnGetConvolutionForwardAlgorithm_v7 ...
    ^^^^ CUDNN_STATUS_SUCCESS for Algo 1: -1.000000 time requiring 0 memory
    ^^^^ CUDNN_STATUS_SUCCESS for Algo 0: -1.000000 time requiring 0 memory
    ^^^^ CUDNN_STATUS_SUCCESS for Algo 2: -1.000000 time requiring 0 memory
    ^^^^ CUDNN_STATUS_SUCCESS for Algo 5: -1.000000 time requiring 178432 memory
    ^^^^ CUDNN_STATUS_SUCCESS for Algo 4: -1.000000 time requiring 184784 memory
    ^^^^ CUDNN_STATUS_SUCCESS for Algo 7: -1.000000 time requiring 2057744 memory
    ^^^^ CUDNN_STATUS_NOT_SUPPORTED for Algo 6: -1.000000 time requiring 0 memory
    ^^^^ CUDNN_STATUS_NOT_SUPPORTED for Algo 3: -1.000000 time requiring 0 memory
    Testing cudnnFindConvolutionForwardAlgorithm ...
    ^^^^ CUDNN_STATUS_SUCCESS for Algo 0: 0.010240 time requiring 0 memory
    ^^^^ CUDNN_STATUS_SUCCESS for Algo 1: 0.010240 time requiring 0 memory
    ^^^^ CUDNN_STATUS_SUCCESS for Algo 2: 0.018432 time requiring 0 memory
    ^^^^ CUDNN_STATUS_SUCCESS for Algo 5: 0.032992 time requiring 178432 memory
    ^^^^ CUDNN_STATUS_SUCCESS for Algo 7: 0.047104 time requiring 2057744 memory
    ^^^^ CUDNN_STATUS_SUCCESS for Algo 4: 0.051200 time requiring 184784 memory
    ^^^^ CUDNN_STATUS_NOT_SUPPORTED for Algo 6: -1.000000 time requiring 0 memory
    ^^^^ CUDNN_STATUS_NOT_SUPPORTED for Algo 3: -1.000000 time requiring 0 memory
    Testing cudnnGetConvolutionForwardAlgorithm_v7 ...
    ^^^^ CUDNN_STATUS_SUCCESS for Algo 1: -1.000000 time requiring 128848 memory
    ^^^^ CUDNN_STATUS_SUCCESS for Algo 0: -1.000000 time requiring 0 memory
    ^^^^ CUDNN_STATUS_SUCCESS for Algo 2: -1.000000 time requiring 128000 memory
    ^^^^ CUDNN_STATUS_SUCCESS for Algo 5: -1.000000 time requiring 4656640 memory
    ^^^^ CUDNN_STATUS_SUCCESS for Algo 4: -1.000000 time requiring 2450080 memory
    ^^^^ CUDNN_STATUS_SUCCESS for Algo 7: -1.000000 time requiring 1433120 memory
    ^^^^ CUDNN_STATUS_NOT_SUPPORTED for Algo 6: -1.000000 time requiring 0 memory
    ^^^^ CUDNN_STATUS_NOT_SUPPORTED for Algo 3: -1.000000 time requiring 0 memory
    Testing cudnnFindConvolutionForwardAlgorithm ...
    ^^^^ CUDNN_STATUS_SUCCESS for Algo 5: 0.049152 time requiring 4656640 memory
    ^^^^ CUDNN_STATUS_SUCCESS for Algo 0: 0.051200 time requiring 0 memory
    ^^^^ CUDNN_STATUS_SUCCESS for Algo 4: 0.058368 time requiring 2450080 memory
    ^^^^ CUDNN_STATUS_SUCCESS for Algo 7: 0.063648 time requiring 1433120 memory
    ^^^^ CUDNN_STATUS_SUCCESS for Algo 2: 0.065536 time requiring 128000 memory
    ^^^^ CUDNN_STATUS_SUCCESS for Algo 1: 0.130112 time requiring 128848 memory
    ^^^^ CUDNN_STATUS_NOT_SUPPORTED for Algo 6: -1.000000 time requiring 0 memory
    ^^^^ CUDNN_STATUS_NOT_SUPPORTED for Algo 3: -1.000000 time requiring 0 memory
    Resulting weights from Softmax:
    0.0000000 0.9999399 0.0000000 0.0000000 0.0000561 0.0000000 0.0000012 0.0000017 0.0000010 0.0000000 
    Loading image data/three_28x28.pgm
    Performing forward propagation ...
    Testing cudnnGetConvolutionForwardAlgorithm_v7 ...
    ^^^^ CUDNN_STATUS_SUCCESS for Algo 1: -1.000000 time requiring 0 memory
    ^^^^ CUDNN_STATUS_SUCCESS for Algo 0: -1.000000 time requiring 0 memory
    ^^^^ CUDNN_STATUS_SUCCESS for Algo 2: -1.000000 time requiring 0 memory
    ^^^^ CUDNN_STATUS_SUCCESS for Algo 5: -1.000000 time requiring 178432 memory
    ^^^^ CUDNN_STATUS_SUCCESS for Algo 4: -1.000000 time requiring 184784 memory
    ^^^^ CUDNN_STATUS_SUCCESS for Algo 7: -1.000000 time requiring 2057744 memory
    ^^^^ CUDNN_STATUS_NOT_SUPPORTED for Algo 6: -1.000000 time requiring 0 memory
    ^^^^ CUDNN_STATUS_NOT_SUPPORTED for Algo 3: -1.000000 time requiring 0 memory
    Testing cudnnFindConvolutionForwardAlgorithm ...
    ^^^^ CUDNN_STATUS_SUCCESS for Algo 2: 0.007328 time requiring 0 memory
    ^^^^ CUDNN_STATUS_SUCCESS for Algo 0: 0.010240 time requiring 0 memory
    ^^^^ CUDNN_STATUS_SUCCESS for Algo 1: 0.011264 time requiring 0 memory
    ^^^^ CUDNN_STATUS_SUCCESS for Algo 7: 0.024576 time requiring 2057744 memory
    ^^^^ CUDNN_STATUS_SUCCESS for Algo 4: 0.025600 time requiring 184784 memory
    ^^^^ CUDNN_STATUS_SUCCESS for Algo 5: 0.026624 time requiring 178432 memory
    ^^^^ CUDNN_STATUS_NOT_SUPPORTED for Algo 6: -1.000000 time requiring 0 memory
    ^^^^ CUDNN_STATUS_NOT_SUPPORTED for Algo 3: -1.000000 time requiring 0 memory
    Testing cudnnGetConvolutionForwardAlgorithm_v7 ...
    ^^^^ CUDNN_STATUS_SUCCESS for Algo 1: -1.000000 time requiring 128848 memory
    ^^^^ CUDNN_STATUS_SUCCESS for Algo 0: -1.000000 time requiring 0 memory
    ^^^^ CUDNN_STATUS_SUCCESS for Algo 2: -1.000000 time requiring 128000 memory
    ^^^^ CUDNN_STATUS_SUCCESS for Algo 5: -1.000000 time requiring 4656640 memory
    ^^^^ CUDNN_STATUS_SUCCESS for Algo 4: -1.000000 time requiring 2450080 memory
    ^^^^ CUDNN_STATUS_SUCCESS for Algo 7: -1.000000 time requiring 1433120 memory
    ^^^^ CUDNN_STATUS_NOT_SUPPORTED for Algo 6: -1.000000 time requiring 0 memory
    ^^^^ CUDNN_STATUS_NOT_SUPPORTED for Algo 3: -1.000000 time requiring 0 memory
    Testing cudnnFindConvolutionForwardAlgorithm ...
    ^^^^ CUDNN_STATUS_SUCCESS for Algo 4: 0.025376 time requiring 2450080 memory
    ^^^^ CUDNN_STATUS_SUCCESS for Algo 1: 0.030720 time requiring 128848 memory
    ^^^^ CUDNN_STATUS_SUCCESS for Algo 5: 0.036864 time requiring 4656640 memory
    ^^^^ CUDNN_STATUS_SUCCESS for Algo 0: 0.051200 time requiring 0 memory
    ^^^^ CUDNN_STATUS_SUCCESS for Algo 7: 0.063488 time requiring 1433120 memory
    ^^^^ CUDNN_STATUS_SUCCESS for Algo 2: 0.065536 time requiring 128000 memory
    ^^^^ CUDNN_STATUS_NOT_SUPPORTED for Algo 6: -1.000000 time requiring 0 memory
    ^^^^ CUDNN_STATUS_NOT_SUPPORTED for Algo 3: -1.000000 time requiring 0 memory
    Resulting weights from Softmax:
    0.0000000 0.0000000 0.0000000 0.9999288 0.0000000 0.0000711 0.0000000 0.0000000 0.0000000 0.0000000 
    Loading image data/five_28x28.pgm
    Performing forward propagation ...
    Resulting weights from Softmax:
    0.0000000 0.0000008 0.0000000 0.0000002 0.0000000 0.9999820 0.0000154 0.0000000 0.0000012 0.0000006 
    
    Result of classification: 1 3 5
    
    Test passed!
    
    Testing half precision (math in single precision)
    Loading binary file data/conv1.bin
    Loading binary file data/conv1.bias.bin
    Loading binary file data/conv2.bin
    Loading binary file data/conv2.bias.bin
    Loading binary file data/ip1.bin
    Loading binary file data/ip1.bias.bin
    Loading binary file data/ip2.bin
    Loading binary file data/ip2.bias.bin
    Loading image data/one_28x28.pgm
    Performing forward propagation ...
    Testing cudnnGetConvolutionForwardAlgorithm_v7 ...
    ^^^^ CUDNN_STATUS_SUCCESS for Algo 1: -1.000000 time requiring 4608 memory
    ^^^^ CUDNN_STATUS_SUCCESS for Algo 0: -1.000000 time requiring 0 memory
    ^^^^ CUDNN_STATUS_SUCCESS for Algo 2: -1.000000 time requiring 28800 memory
    ^^^^ CUDNN_STATUS_SUCCESS for Algo 5: -1.000000 time requiring 178432 memory
    ^^^^ CUDNN_STATUS_SUCCESS for Algo 4: -1.000000 time requiring 184784 memory
    ^^^^ CUDNN_STATUS_SUCCESS for Algo 7: -1.000000 time requiring 2057744 memory
    ^^^^ CUDNN_STATUS_NOT_SUPPORTED for Algo 6: -1.000000 time requiring 0 memory
    ^^^^ CUDNN_STATUS_NOT_SUPPORTED for Algo 3: -1.000000 time requiring 0 memory
    Testing cudnnFindConvolutionForwardAlgorithm ...
    ^^^^ CUDNN_STATUS_SUCCESS for Algo 0: 0.011264 time requiring 0 memory
    ^^^^ CUDNN_STATUS_SUCCESS for Algo 2: 0.021504 time requiring 28800 memory
    ^^^^ CUDNN_STATUS_SUCCESS for Algo 4: 0.022592 time requiring 184784 memory
    ^^^^ CUDNN_STATUS_SUCCESS for Algo 5: 0.025600 time requiring 178432 memory
    ^^^^ CUDNN_STATUS_SUCCESS for Algo 7: 0.033792 time requiring 2057744 memory
    ^^^^ CUDNN_STATUS_SUCCESS for Algo 1: 0.074752 time requiring 4608 memory
    ^^^^ CUDNN_STATUS_NOT_SUPPORTED for Algo 6: -1.000000 time requiring 0 memory
    ^^^^ CUDNN_STATUS_NOT_SUPPORTED for Algo 3: -1.000000 time requiring 0 memory
    Testing cudnnGetConvolutionForwardAlgorithm_v7 ...
    ^^^^ CUDNN_STATUS_SUCCESS for Algo 1: -1.000000 time requiring 1536 memory
    ^^^^ CUDNN_STATUS_SUCCESS for Algo 0: -1.000000 time requiring 0 memory
    ^^^^ CUDNN_STATUS_SUCCESS for Algo 2: -1.000000 time requiring 64000 memory
    ^^^^ CUDNN_STATUS_SUCCESS for Algo 5: -1.000000 time requiring 4656640 memory
    ^^^^ CUDNN_STATUS_SUCCESS for Algo 4: -1.000000 time requiring 2450080 memory
    ^^^^ CUDNN_STATUS_SUCCESS for Algo 7: -1.000000 time requiring 1433120 memory
    ^^^^ CUDNN_STATUS_NOT_SUPPORTED for Algo 6: -1.000000 time requiring 0 memory
    ^^^^ CUDNN_STATUS_NOT_SUPPORTED for Algo 3: -1.000000 time requiring 0 memory
    Testing cudnnFindConvolutionForwardAlgorithm ...
    ^^^^ CUDNN_STATUS_SUCCESS for Algo 4: 0.031744 time requiring 2450080 memory
    ^^^^ CUDNN_STATUS_SUCCESS for Algo 5: 0.040960 time requiring 4656640 memory
    ^^^^ CUDNN_STATUS_SUCCESS for Algo 0: 0.051168 time requiring 0 memory
    ^^^^ CUDNN_STATUS_SUCCESS for Algo 7: 0.060416 time requiring 1433120 memory
    ^^^^ CUDNN_STATUS_SUCCESS for Algo 2: 0.064512 time requiring 64000 memory
    ^^^^ CUDNN_STATUS_SUCCESS for Algo 1: 0.069632 time requiring 1536 memory
    ^^^^ CUDNN_STATUS_NOT_SUPPORTED for Algo 6: -1.000000 time requiring 0 memory
    ^^^^ CUDNN_STATUS_NOT_SUPPORTED for Algo 3: -1.000000 time requiring 0 memory
    Resulting weights from Softmax:
    0.0000001 1.0000000 0.0000001 0.0000000 0.0000563 0.0000001 0.0000012 0.0000017 0.0000010 0.0000001 
    Loading image data/three_28x28.pgm
    Performing forward propagation ...
    Testing cudnnGetConvolutionForwardAlgorithm_v7 ...
    ^^^^ CUDNN_STATUS_SUCCESS for Algo 1: -1.000000 time requiring 4608 memory
    ^^^^ CUDNN_STATUS_SUCCESS for Algo 0: -1.000000 time requiring 0 memory
    ^^^^ CUDNN_STATUS_SUCCESS for Algo 2: -1.000000 time requiring 28800 memory
    ^^^^ CUDNN_STATUS_SUCCESS for Algo 5: -1.000000 time requiring 178432 memory
    ^^^^ CUDNN_STATUS_SUCCESS for Algo 4: -1.000000 time requiring 184784 memory
    ^^^^ CUDNN_STATUS_SUCCESS for Algo 7: -1.000000 time requiring 2057744 memory
    ^^^^ CUDNN_STATUS_NOT_SUPPORTED for Algo 6: -1.000000 time requiring 0 memory
    ^^^^ CUDNN_STATUS_NOT_SUPPORTED for Algo 3: -1.000000 time requiring 0 memory
    Testing cudnnFindConvolutionForwardAlgorithm ...
    ^^^^ CUDNN_STATUS_SUCCESS for Algo 0: 0.009216 time requiring 0 memory
    ^^^^ CUDNN_STATUS_SUCCESS for Algo 2: 0.012288 time requiring 28800 memory
    ^^^^ CUDNN_STATUS_SUCCESS for Algo 4: 0.021312 time requiring 184784 memory
    ^^^^ CUDNN_STATUS_SUCCESS for Algo 1: 0.023552 time requiring 4608 memory
    ^^^^ CUDNN_STATUS_SUCCESS for Algo 5: 0.024352 time requiring 178432 memory
    ^^^^ CUDNN_STATUS_SUCCESS for Algo 7: 0.029696 time requiring 2057744 memory
    ^^^^ CUDNN_STATUS_NOT_SUPPORTED for Algo 6: -1.000000 time requiring 0 memory
    ^^^^ CUDNN_STATUS_NOT_SUPPORTED for Algo 3: -1.000000 time requiring 0 memory
    Testing cudnnGetConvolutionForwardAlgorithm_v7 ...
    ^^^^ CUDNN_STATUS_SUCCESS for Algo 1: -1.000000 time requiring 1536 memory
    ^^^^ CUDNN_STATUS_SUCCESS for Algo 0: -1.000000 time requiring 0 memory
    ^^^^ CUDNN_STATUS_SUCCESS for Algo 2: -1.000000 time requiring 64000 memory
    ^^^^ CUDNN_STATUS_SUCCESS for Algo 5: -1.000000 time requiring 4656640 memory
    ^^^^ CUDNN_STATUS_SUCCESS for Algo 4: -1.000000 time requiring 2450080 memory
    ^^^^ CUDNN_STATUS_SUCCESS for Algo 7: -1.000000 time requiring 1433120 memory
    ^^^^ CUDNN_STATUS_NOT_SUPPORTED for Algo 6: -1.000000 time requiring 0 memory
    ^^^^ CUDNN_STATUS_NOT_SUPPORTED for Algo 3: -1.000000 time requiring 0 memory
    Testing cudnnFindConvolutionForwardAlgorithm ...
    ^^^^ CUDNN_STATUS_SUCCESS for Algo 4: 0.025600 time requiring 2450080 memory
    ^^^^ CUDNN_STATUS_SUCCESS for Algo 5: 0.035840 time requiring 4656640 memory
    ^^^^ CUDNN_STATUS_SUCCESS for Algo 0: 0.051200 time requiring 0 memory
    ^^^^ CUDNN_STATUS_SUCCESS for Algo 7: 0.060416 time requiring 1433120 memory
    ^^^^ CUDNN_STATUS_SUCCESS for Algo 2: 0.064512 time requiring 64000 memory
    ^^^^ CUDNN_STATUS_SUCCESS for Algo 1: 0.065536 time requiring 1536 memory
    ^^^^ CUDNN_STATUS_NOT_SUPPORTED for Algo 6: -1.000000 time requiring 0 memory
    ^^^^ CUDNN_STATUS_NOT_SUPPORTED for Algo 3: -1.000000 time requiring 0 memory
    Resulting weights from Softmax:
    0.0000000 0.0000000 0.0000000 1.0000000 0.0000000 0.0000714 0.0000000 0.0000000 0.0000000 0.0000000 
    Loading image data/five_28x28.pgm
    Performing forward propagation ...
    Resulting weights from Softmax:
    0.0000000 0.0000008 0.0000000 0.0000002 0.0000000 1.0000000 0.0000154 0.0000000 0.0000012 0.0000006 
    
    Result of classification: 1 3 5
    
    Test passed!
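
Two things are worth checking in this output: "Test passed!" should appear once for single precision and once for half precision, and the integer printed by cudnnGetVersion() should match the installed version. The sketch below automates both checks. count_passed and decode_cudnn_version are hypothetical helpers, and the decoding assumes the cuDNN 8 scheme (major*1000 + minor*100 + patch), which is consistent with "8904 (8.9.4)" in the log.

```shell
# Count "Test passed!" lines on stdin; mnistCUDNN prints one per precision mode.
# count_passed is a hypothetical helper.
count_passed() {
    grep -c '^Test passed!$' || true
}

# Decode a cuDNN 8 version integer, assumed to be major*1000 + minor*100 + patch.
# decode_cudnn_version is a hypothetical helper.
decode_cudnn_version() {
    v="$1"
    printf '%d.%d.%d\n' "$(( v / 1000 ))" "$(( v % 1000 / 100 ))" "$(( v % 100 ))"
}

decode_cudnn_version 8904   # prints 8.9.4
```

For example, `[ "$(./mnistCUDNN | count_passed)" -eq 2 ] && echo "cuDNN OK"` reports success only when both precision modes pass.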
    
    
  • Original article: https://blog.csdn.net/cheungxiongwei/article/details/132655076