• pynvml.nvml.NVMLError_FunctionNotFound: Function Not Found


    The following error occurs when running in Docker:

    Traceback (most recent call last):
      File "/opt/conda/envs/rapids/lib/python3.8/site-packages/pynvml/nvml.py", line 782, in _nvmlGetFunctionPointer
        _nvmlGetFunctionPointer_cache[name] = getattr(nvmlLib, name)
      File "/opt/conda/envs/rapids/lib/python3.8/ctypes/__init__.py", line 386, in __getattr__
        func = self.__getitem__(name)
      File "/opt/conda/envs/rapids/lib/python3.8/ctypes/__init__.py", line 391, in __getitem__
        func = self._FuncPtr((name_or_ordinal, self))
    AttributeError: /usr/lib/x86_64-linux-gnu/libnvidia-ml.so.1: undefined symbol: nvmlDeviceGetComputeRunningProcesses_v2
    
    During handling of the above exception, another exception occurred:
    
    Traceback (most recent call last):
      File "/opt/conda/envs/rapids/lib/python3.8/site-packages/dask_cuda/initialize.py", line 32, in _create_cuda_context
        distributed.comm.ucx.init_once()
      File "/opt/conda/envs/rapids/lib/python3.8/site-packages/distributed/comm/ucx.py", line 86, in init_once
        pre_existing_cuda_context = has_cuda_context()
      File "/opt/conda/envs/rapids/lib/python3.8/site-packages/distributed/diagnostics/nvml.py", line 91, in has_cuda_context
        running_processes = pynvml.nvmlDeviceGetComputeRunningProcesses_v2(handle)
      File "/opt/conda/envs/rapids/lib/python3.8/site-packages/pynvml/nvml.py", line 2191, in nvmlDeviceGetComputeRunningProcesses_v2
        fn = _nvmlGetFunctionPointer("nvmlDeviceGetComputeRunningProcesses_v2")
      File "/opt/conda/envs/rapids/lib/python3.8/site-packages/pynvml/nvml.py", line 785, in _nvmlGetFunctionPointer
        raise NVMLError(NVML_ERROR_FUNCTION_NOT_FOUND)
    pynvml.nvml.NVMLError_FunctionNotFound: Function Not Found
    2022-05-16 15:19:14,517 - distributed.preloading - INFO - Run preload setup click command: dask_cuda.initialize
    2022-05-16 15:19:14,517 - distributed.worker - INFO -       Start worker at:    ws://10.233.68.22:39537/
    2022-05-16 15:19:14,517 - distributed.worker - INFO -          Listening to:    ws://10.233.68.22:39537/
    2022-05-16 15:19:14,517 - distributed.worker - INFO -          dashboard at:         10.233.68.22:35313
    2022-05-16 15:19:14,517 - distributed.worker - INFO - Waiting to connect to: ws://launcher-svc-1245231:8786/
    2022-05-16 15:19:14,517 - distributed.worker - INFO - -------------------------------------------------
    2022-05-16 15:19:14,517 - distributed.worker - INFO -               Threads:                          1
    2022-05-16 15:19:14,517 - distributed.worker - INFO -                Memory:                 400.00 GiB
    2022-05-16 15:19:14,517 - distributed.worker - INFO -       Local Directory: /rapids/notebooks/dask-worker-space/worker-ave_m7tw
    2022-05-16 15:19:14,517 - distributed.worker - INFO - Starting Worker plugin PreImport-0b003d61-7c5f-4530-bf6f-c95b93c83338
    2022-05-16 15:19:14,517 - distributed.worker - INFO - Starting Worker plugin CPUAffinity-a1d437c7-bb5d-408e-a3e0-3120dd6c6a5f
    2022-05-16 15:19:14,518 - distributed.worker - INFO - Starting Worker plugin RMMSetup-03e12d8b-4b23-4e0e-9b3c-a79b6b12e7ab
    2022-05-16 15:19:14,974 - distributed.worker - INFO - -------------------------------------------------
    2022-05-16 15:19:15,025 - distributed.worker - INFO -         Registered to: ws://launcher-svc-1245231:8786/
    2022-05-16 15:19:15,025 - distributed.worker - INFO - -------------------------------------------------
    2022-05-16 15:19:15,026 - distributed.core - INFO - Starting established connection
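
    The failure originates in pynvml itself rather than in Dask: ctypes cannot resolve the nvmlDeviceGetComputeRunningProcesses_v2 symbol in the container's libnvidia-ml.so.1. A minimal sketch to reproduce the lookup outside the worker (assuming the same pynvml 11.5.x install and driver library):

    # Reproduce the failing NVML lookup directly, without Dask.
    import pynvml

    pynvml.nvmlInit()
    try:
        handle = pynvml.nvmlDeviceGetHandleByIndex(0)
        # On the affected setup this raises NVMLError_FunctionNotFound, because the
        # driver's libnvidia-ml.so.1 does not export the *_v2 entry point.
        pynvml.nvmlDeviceGetComputeRunningProcesses_v2(handle)
    finally:
        pynvml.nvmlShutdown()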
    

    Check the current CUDA version with nvidia-smi:

    +-----------------------------------------------------------------------------+
    | NVIDIA-SMI 470.199.02   Driver Version: 470.199.02   CUDA Version: 11.4     |
    |-------------------------------+----------------------+----------------------+
    | GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
    | Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
    |                               |                      |               MIG M. |
    |===============================+======================+======================|
    |   0  NVIDIA GeForce ...  Off  | 00000000:01:00.0  On |                  N/A |
    | 35%   33C    P8    18W / 220W |    552MiB /  7959MiB |     13%      Default |
    |                               |                      |                  N/A |
    +-------------------------------+----------------------+----------------------+
    

    Following the hint in the error, here is the solution.
    There is a version correspondence between CUDA and the pynvml library, so either upgrade CUDA or downgrade pynvml.
    Start python3 and check the installed pynvml version:

    >>> import pynvml
    >>> print(pynvml.__version__)
    11.5.1
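
    Before changing anything, it helps to confirm that the driver-side NVML library is older than what this pynvml wrapper expects. A minimal check (nvmlSystemGetNVMLVersion and nvmlSystemGetDriverVersion are standard pynvml calls; the output depends on the installed driver):

    # Compare the Python wrapper version with what the driver's NVML library reports.
    # A newer pynvml wrapper looking up *_v2 symbols in an older libnvidia-ml.so.1
    # is what produces the NVMLError_FunctionNotFound above.
    import pynvml

    pynvml.nvmlInit()
    try:
        print("pynvml wrapper version:", pynvml.__version__)
        print("NVML library version:  ", pynvml.nvmlSystemGetNVMLVersion())
        print("driver version:        ", pynvml.nvmlSystemGetDriverVersion())
    finally:
        pynvml.nvmlShutdown()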
    
    

    The likely cause is that this pynvml version is too new for the installed CUDA driver, so downgrade pynvml directly with pip:

    pip install pynvml==11.4.1
    
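    After reinstalling, a quick sanity check inside the container (a minimal sketch assuming at least one visible GPU; it initializes NVML and lists the compute processes on GPU 0, the kind of query that failed before the downgrade):

    # Sanity check after downgrading pynvml: query compute processes on GPU 0.
    import pynvml

    pynvml.nvmlInit()
    try:
        handle = pynvml.nvmlDeviceGetHandleByIndex(0)
        procs = pynvml.nvmlDeviceGetComputeRunningProcesses(handle)
        print("compute processes on GPU 0:", len(procs))
    finally:
        pynvml.nvmlShutdown()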

    Problem solved.

  • Original article: https://blog.csdn.net/qq_43650421/article/details/134042639