[NVIDIA] Getting GPU Utilization in C++

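In deep learning inference, making efficient use of GPUs across multiple inference task instances means knowing how much capacity each GPU has left before creating a new instance and assigning it to a device, so that instances can be placed sensibly.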

Code layout

.
├── CMakeLists.txt
├── src
│   └── main.cpp
├── ubuntu_build.sh
└── win10_vs2019_build.bat

Windows

Prerequisites

Make sure the NVIDIA driver and the CUDA Toolkit are installed.

nvidia-smi.exe 

The command should run successfully.

nvcc --version

This command should also run successfully.

CUDA installation directory

C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.8

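Set CUDA_ROOT in CMakeLists.txt to your own installation directory.

NOTE: Mind the difference between forward slashes (/) and backslashes (\). The CMakeLists.txt in the appendix writes the path with forward slashes.

Generating the project

Open a console and run win10_vs2019_build.bat, or simply double-click win10_vs2019_build.bat.

NOTE: This assumes VS2019 ("Visual Studio 16 2019") is installed. For other Visual Studio versions, replace the generator name accordingly.

Building

Enter the win10_build directory, double-click get_gpu_info.sln, switch the build configuration to Release, and build. This produces get_gpu_info.exe, which you can double-click to run.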

Linux

Configure, build, and run

bash ubuntu_build.sh

Code appendix

src/main.cpp

/***************************************************************************\
|*                                                                           *|
|*      Copyright 2010-2016 NVIDIA Corporation.  All rights reserved.        *|
|*                                                                           *|
|*   NOTICE TO USER:                                                         *|
|*                                                                           *|
|*   This source code is subject to NVIDIA ownership rights under U.S.       *|
|*   and international Copyright laws.  Users and possessors of this         *|
|*   source code are hereby granted a nonexclusive, royalty-free             *|
|*   license to use this code in individual and commercial software.         *|
|*                                                                           *|
|*   NVIDIA MAKES NO REPRESENTATION ABOUT THE SUITABILITY OF THIS SOURCE     *|
|*   CODE FOR ANY PURPOSE. IT IS PROVIDED "AS IS" WITHOUT EXPRESS OR         *|
|*   IMPLIED WARRANTY OF ANY KIND. NVIDIA DISCLAIMS ALL WARRANTIES WITH      *|
|*   REGARD TO THIS SOURCE CODE, INCLUDING ALL IMPLIED WARRANTIES OF         *|
|*   MERCHANTABILITY, NONINFRINGEMENT, AND FITNESS FOR A PARTICULAR          *|
|*   PURPOSE. IN NO EVENT SHALL NVIDIA BE LIABLE FOR ANY SPECIAL,            *|
|*   INDIRECT, INCIDENTAL, OR CONSEQUENTIAL DAMAGES, OR ANY DAMAGES          *|
|*   WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN      *|
|*   AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING     *|
|*   OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOURCE      *|
|*   CODE.                                                                   *|
|*                                                                           *|
|*   U.S. Government End Users. This source code is a "commercial item"      *|
|*   as that term is defined at 48 C.F.R. 2.101 (OCT 1995), consisting       *|
|*   of "commercial computer  software" and "commercial computer software    *|
|*   documentation" as such terms are used in 48 C.F.R. 12.212 (SEPT 1995)   *|
|*   and is provided to the U.S. Government only as a commercial end item.   *|
|*   Consistent with 48 C.F.R.12.212 and 48 C.F.R. 227.7202-1 through        *|
|*   227.7202-4 (JUNE 1995), all U.S. Government End Users acquire the       *|
|*   source code with only those rights set forth herein.                    *|
|*                                                                           *|
|*   Any use of this source code in individual and commercial software must  *|
|*   include, in the user documentation and internal comments to the code,   *|
|*   the above Disclaimer and U.S. Government End Users Notice.              *|
|*                                                                           *|
|*                                                                           *|
\***************************************************************************/

#include <stdio.h>
#include <nvml.h>

static const char *convertToComputeModeString(nvmlComputeMode_t mode)
{
    switch (mode)
    {
    case NVML_COMPUTEMODE_DEFAULT:
        return "Default";
    case NVML_COMPUTEMODE_EXCLUSIVE_THREAD:
        return "Exclusive_Thread";
    case NVML_COMPUTEMODE_PROHIBITED:
        return "Prohibited";
    case NVML_COMPUTEMODE_EXCLUSIVE_PROCESS:
        return "Exclusive Process";
    default:
        return "Unknown";
    }
}


int main(int argc, char* argv[])
{
    nvmlReturn_t result;
    unsigned int device_count;

    // Initialize NVML
    result = nvmlInit();
    if (result != NVML_SUCCESS)
    {
        printf("Failed to initialize NVML: %s\n", nvmlErrorString(result));
        printf("Press ENTER to continue...\n");
        getchar();
        return (int)(result);
    }

    // Get the number of devices
    result = nvmlDeviceGetCount(&device_count);
    if (result != NVML_SUCCESS)
    {
        printf("Failed to get device count: %s\n", nvmlErrorString(result));
        printf("Press ENTER to continue...\n");
        getchar();
        return (int)(result);
    }

    // Iterate over the devices
    printf("\nFound %u device%s, Listing devices:\n", device_count,
        device_count != 1 ? "s" : "");
    printf("--------------------------------------------------------------\n");
    printf("| Device ID | Device Name\t\t|      pci.busId     | GPU Util | Mem Util |\n");
    for (unsigned int i = 0; i < device_count; ++i)
    {
        nvmlDevice_t device;
        char device_name[NVML_DEVICE_NAME_BUFFER_SIZE];
        nvmlPciInfo_t pci;
        nvmlComputeMode_t compute_mode;

        // Get the device handle. Other lookups are also available:
        // nvmlDeviceGetHandleBySerial
        // nvmlDeviceGetHandleByPciBusId
        result = nvmlDeviceGetHandleByIndex(i, &device);
        if (result != NVML_SUCCESS)
        {
            printf("Failed to get handle for device %u: %s\n", i,
                nvmlErrorString(result));
            continue;
        }        

        // Get the GPU device name
        result = nvmlDeviceGetName(device, device_name, NVML_DEVICE_NAME_BUFFER_SIZE);
        if (result != NVML_SUCCESS)
        {
            printf("Failed to get name for device %u: %s\n", i,
                nvmlErrorString(result));
            continue;
        }

        // pci.busId is very useful to know which device physically you're talking to
        // Using PCI identifier you can also match nvmlDevice handle to CUDA device.
        result = nvmlDeviceGetPciInfo(device, &pci);
        if (result != NVML_SUCCESS)
        {
            printf("Failed to get pci info for device %u: %s\n", i, 
                nvmlErrorString(result));
            continue;
        }

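        // Query the device utilization rates. `gpu` is the percentage of time a
        // kernel was executing over the last sample period; `memory` is the
        // percentage of time device memory was being read or written, not the
        // fraction of memory in use.
        nvmlUtilization_t device_utilization;
        result = nvmlDeviceGetUtilizationRates(device, &device_utilization);
        if (result != NVML_SUCCESS)
        {
            printf("Failed to get utilization for device %u: %s\n", i,
                nvmlErrorString(result));
            continue;
        }

        printf("|     %u     | %s\t| [%s] |    %u %%  |    %u %%   | \n",
               i, device_name, pci.busId, device_utilization.gpu, device_utilization.memory);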
        printf("--------------------------------------------------------------\n");

        // A simple example of changing GPU state
        result = nvmlDeviceGetComputeMode(device, &compute_mode);
        if (NVML_ERROR_NOT_SUPPORTED == result)
        {
            printf("\t This is not CUDA capable device\n");
        }       
        else if (NVML_SUCCESS != result)
        {
            printf("Failed to get compute mode for device %u: %s\n", i, nvmlErrorString(result));
            continue;
        }
        else
        {
            // try to change compute mode
            printf("\t Changing device's compute mode from '%s' to '%s'\n",
                convertToComputeModeString(compute_mode),
                convertToComputeModeString(NVML_COMPUTEMODE_PROHIBITED));

            result = nvmlDeviceSetComputeMode(device, NVML_COMPUTEMODE_PROHIBITED);
            if (NVML_ERROR_NO_PERMISSION == result)
            {
                printf("\t\t Need root privileges to do that: %s\n", nvmlErrorString(result));
            }        
            else if (NVML_ERROR_NOT_SUPPORTED == result)
            {
                printf("\t\t Compute mode prohibited not supported. You might be running on\n"
                    "\t\t windows in WDDM driver model or on non-CUDA capable GPU\n");
            }
            else if (NVML_SUCCESS != result)
            {
                printf("\t\t Failed to set compute mode for device %u: %s\n", i, nvmlErrorString(result));
                continue;
            }
            else
            {
                printf("\t Restoring device's compute mode back to '%s'\n",
                    convertToComputeModeString(compute_mode));
                result = nvmlDeviceSetComputeMode(device, compute_mode);
                if (NVML_SUCCESS != result)
                {
                    printf("\t\t Failed to restore compute mode for device %u: %s\n", i, nvmlErrorString(result));
                    continue;
                }
            }
        }
    }

    // Shut down NVML and keep the return code so the check below is meaningful
    result = nvmlShutdown();
    if (NVML_SUCCESS != result)
    {
        printf("Failed to shutdown NVML: %s\n", nvmlErrorString(result));
        printf("Press ENTER to continue...\n");
        getchar();
        return (int)(result);
    }

    printf("All done.\n");
    printf("Press ENTER to continue...\n");
    getchar();
    return 0;
}
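
The utilization values printed by main.cpp can feed the placement decision described in the introduction. The following is a minimal sketch of one possible policy, not part of the original project: `GpuLoad`, `pickLeastLoaded`, and the `max_gpu_util` threshold are hypothetical names introduced here, and the readings are assumed to have been collected per device with nvmlDeviceGetUtilizationRates as shown above. It picks the device with the lowest GPU utilization and breaks ties by the lower memory utilization.

```cpp
#include <vector>

// Hypothetical record for one device's readings; the fields mirror the
// `gpu` and `memory` percentages printed by main.cpp.
struct GpuLoad {
    unsigned int index;     // device index passed to nvmlDeviceGetHandleByIndex
    unsigned int gpu_util;  // percent, 0..100
    unsigned int mem_util;  // percent, 0..100
};

// Returns the index of the device with the lowest GPU utilization, breaking
// ties by the lower memory utilization. Devices above max_gpu_util are
// skipped. Returns -1 when the list is empty or no device qualifies.
int pickLeastLoaded(const std::vector<GpuLoad>& loads,
                    unsigned int max_gpu_util = 100)
{
    const GpuLoad* best = nullptr;
    for (const GpuLoad& load : loads) {
        if (load.gpu_util > max_gpu_util) {
            continue;  // too busy to accept another instance
        }
        if (best == nullptr ||
            load.gpu_util < best->gpu_util ||
            (load.gpu_util == best->gpu_util && load.mem_util < best->mem_util)) {
            best = &load;
        }
    }
    return best == nullptr ? -1 : static_cast<int>(best->index);
}
```

A caller would fill one `GpuLoad` per device inside the enumeration loop and then call `pickLeastLoaded(loads, 90)` to skip devices that are already more than 90% busy. A single reading only reflects the last sample period, so a real scheduler would probably want to average several readings before deciding.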

CMakeLists.txt

cmake_minimum_required(VERSION 3.0.0)
set(PROJECT_NAME get_gpu_info)
project(${PROJECT_NAME})

SET(CMAKE_CONFIGURATION_TYPES ${CMAKE_BUILD_TYPE} CACHE STRING "Release" FORCE)

add_executable(${PROJECT_NAME} src/main.cpp)

if (WIN32)
    set(CUDA_ROOT "C:/Program Files/NVIDIA GPU Computing Toolkit/CUDA/v11.8")
    include_directories("${CUDA_ROOT}/include/")
    target_link_libraries(${PROJECT_NAME} "${CUDA_ROOT}/lib/x64/nvml.lib")
endif()

if (UNIX)
    set(CUDA_ROOT "/usr/local/cuda")
    include_directories("${CUDA_ROOT}/include/")
    link_directories("${CUDA_ROOT}/lib64/stubs")
    target_link_libraries(${PROJECT_NAME}  libnvidia-ml.so)
endif()

win10_vs2019_build.bat

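::Set CUDA_ROOT in the top-level CMakeLists.txt before running this script
set build_dir=win10_build

::Delete the build directory if it exists
if exist %build_dir% rmdir /s /q %build_dir%

::Recreate the build directory
mkdir %build_dir%

::Enter the build directory
cd %build_dir%

::Configure; extra build options can be passed here with -D
cmake -G "Visual Studio 16 2019" -A x64 -DCMAKE_BUILD_TYPE=Release ..

::Leave the build directory
cd ..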

ubuntu_build.sh

#!/bin/bash

build_dir=ubuntu2204_build

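# Delete the build directory
rm -rf ${build_dir}

# Recreate the build directory
mkdir ${build_dir}

# Enter the build directory
cd ${build_dir}

# Configure the project
cmake -DCMAKE_BUILD_TYPE=RELEASE ..

# Compile
make -j8

# Copy the binary out of the build directory
cp get_gpu_info ../
cd ..

# Run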
./get_gpu_info


Original article: https://blog.csdn.net/zhoujinwang/article/details/133907425