• nvidia-smi: Summary of Common Options


    Preface

    The nvidia-smi command (also known as NVSMI) stands for NVIDIA System Management Interface and is used to monitor and manage GPU devices.

    Running nvidia-smi directly in the terminal shows all GPU devices and their related information:

    root@container-14dc11ad52-9e0fd82d:~# nvidia-smi
    Sun Sep 18 10:21:55 2022
    +-----------------------------------------------------------------------------+
    | NVIDIA-SMI 495.29.05    Driver Version: 495.29.05    CUDA Version: 11.5     |
    |-------------------------------+----------------------+----------------------+
    | GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
    | Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
    |                               |                      |               MIG M. |
    |===============================+======================+======================|
    |   0  Tesla V100-DGXS...  Off  | 00000000:07:00.0 Off |                    0 |
    | N/A   48C    P0   175W / 300W |   5955MiB / 32508MiB |      6%      Default |
    |                               |                      |                  N/A |
    +-------------------------------+----------------------+----------------------+
    |   1  Tesla V100-DGXS...  Off  | 00000000:08:00.0 Off |                    0 |
    | N/A   58C    P0   257W / 300W |  27128MiB / 32508MiB |     93%      Default |
    |                               |                      |                  N/A |
    +-------------------------------+----------------------+----------------------+
    |   2  Tesla V100-DGXS...  Off  | 00000000:0E:00.0 Off |                    0 |
    | N/A   48C    P0    52W / 300W |   2768MiB / 32508MiB |     32%      Default |
    |                               |                      |                  N/A |
    +-------------------------------+----------------------+----------------------+
    |   3  Tesla V100-DGXS...  Off  | 00000000:0F:00.0 Off |                    0 |
    | N/A   46C    P0    40W / 300W |     13MiB / 32508MiB |      0%      Default |
    |                               |                      |                  N/A |
    +-------------------------------+----------------------+----------------------+
    
    +-----------------------------------------------------------------------------+
    | Processes:                                                                  |
    |  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
    |        ID   ID                                                   Usage      |
    |=============================================================================|
    |    0   N/A  N/A      2151      G   /usr/lib/xorg/Xorg                 58MiB |
    |    0   N/A  N/A      2255      G   /usr/bin/gnome-shell               83MiB |
    |    0   N/A  N/A      7145      C   python                           2839MiB |
    |    0   N/A  N/A      7364      C   python                           2755MiB |
    |    0   N/A  N/A     20935      G   /usr/lib/xorg/Xorg                 24MiB |
    |    0   N/A  N/A     21079      G   /usr/bin/gnome-shell              189MiB |
    |    1   N/A  N/A      2151      G   /usr/lib/xorg/Xorg                  4MiB |
    |    1   N/A  N/A     20935      G   /usr/lib/xorg/Xorg                  4MiB |
    |    1   N/A  N/A     34676      C   python                          27115MiB |
    |    2   N/A  N/A      2151      G   /usr/lib/xorg/Xorg                  4MiB |
    |    2   N/A  N/A     20565      C   python                           2755MiB |
    |    2   N/A  N/A     20935      G   /usr/lib/xorg/Xorg                  4MiB |
    |    3   N/A  N/A      2151      G   /usr/lib/xorg/Xorg                  4MiB |
    |    3   N/A  N/A     20935      G   /usr/lib/xorg/Xorg                  4MiB |
    +-----------------------------------------------------------------------------+
    

    For an explanation of each field in this panel, refer to this article.

    GENERAL OPTIONS

    Run nvidia-smi -h to view the command's help manual.
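
    The help output also points to more specific help pages. As one example (flag name taken from recent driver versions; confirm against your local -h output), the following lists every property that can be queried with --query-gpu:

    root@container-14dc11ad52-9e0fd82d:~# nvidia-smi --help-query-gpu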

    SUMMARY OPTIONS

    Run nvidia-smi -L to list all GPU devices along with their UUIDs:

    root@container-14dc11ad52-9e0fd82d:~# nvidia-smi -L
    GPU 0: Tesla V100-DGXS-32GB (UUID: GPU-8e82d306-7c7b-b020-2847-afe95fd09f33)
    GPU 1: Tesla V100-DGXS-32GB (UUID: GPU-8c4978ad-c5d1-e4d0-19ac-c659644fdb02)
    GPU 2: Tesla V100-DGXS-32GB (UUID: GPU-8aec1981-46ca-fd72-376d-51d9eeaf166b)
    GPU 3: Tesla V100-DGXS-32GB (UUID: GPU-b0a24c4f-6928-3ac2-7fba-a2969bbad8ba)
    
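    A GPU's UUID can be used wherever an index is accepted, for instance by the -i option introduced below; unlike indices, UUIDs stay stable across reboots. A minimal sketch using the first UUID from the listing above:

    root@container-14dc11ad52-9e0fd82d:~# nvidia-smi -i GPU-8e82d306-7c7b-b020-2847-afe95fd09f33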

    QUERY OPTIONS

    Run nvidia-smi -q to list detailed information for all GPU devices. To show the details of a single GPU only, specify it with the -i option.
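
    The full -q report is long; the -d option filters it down to selected sections. A minimal sketch (section names such as MEMORY and UTILIZATION come from the -q help text, so verify them with nvidia-smi -h on your driver version):

    root@container-14dc11ad52-9e0fd82d:~# nvidia-smi -q -d MEMORY,UTILIZATION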


    Run nvidia-smi -i [GPU index] to show the information of a single GPU device. Since this host has only 4 GPUs, [GPU index] can take values in {0, 1, 2, 3}:

    root@container-14dc11ad52-9e0fd82d:~# nvidia-smi -i 1
    Sun Sep 18 10:18:52 2022
    +-----------------------------------------------------------------------------+
    | NVIDIA-SMI 495.29.05    Driver Version: 495.29.05    CUDA Version: 11.5     |
    |-------------------------------+----------------------+----------------------+
    | GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
    | Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
    |                               |                      |               MIG M. |
    |===============================+======================+======================|
    |   1  Tesla V100-DGXS...  Off  | 00000000:08:00.0 Off |                    0 |
    | N/A   57C    P0   229W / 300W |  27128MiB / 32508MiB |     99%      Default |
    |                               |                      |                  N/A |
    +-------------------------------+----------------------+----------------------+
    
    +-----------------------------------------------------------------------------+
    | Processes:                                                                  |
    |  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
    |        ID   ID                                                   Usage      |
    |=============================================================================|
    |    1   N/A  N/A      2151      G   /usr/lib/xorg/Xorg                  4MiB |
    |    1   N/A  N/A     20935      G   /usr/lib/xorg/Xorg                  4MiB |
    |    1   N/A  N/A     34676      C   python                          27115MiB |
    +-----------------------------------------------------------------------------+
    

    The -i option can also be combined with other options. For example,

    root@container-14dc11ad52-9e0fd82d:~# nvidia-smi -q -i 0
    

    lists the detailed information of GPU 0.
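
    For scripting, the same information can be requested in machine-readable form via --query-gpu and --format. A sketch (the field names below assume a reasonably recent driver; nvidia-smi --help-query-gpu lists the full set):

    root@container-14dc11ad52-9e0fd82d:~# nvidia-smi --query-gpu=index,name,utilization.gpu,memory.used,memory.total --format=csv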


    Run nvidia-smi -l [second] to refresh the panel every [second] seconds. When monitoring GPU utilization, a 1-second refresh interval is a common choice:

    root@container-14dc11ad52-9e0fd82d:~# nvidia-smi -l 1
    

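    On Linux, an alternative is to wrap the command in watch, which redraws the panel in place instead of appending a new copy on every refresh (assuming watch is installed):

    root@container-14dc11ad52-9e0fd82d:~# watch -n 1 nvidia-smi
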
    📄 For more options, refer to the official documentation.

  • Original article: https://blog.csdn.net/raelum/article/details/126914188