• pynvml.nvml.NVMLError_FunctionNotFound: Function Not Found


    The following error occurs when running in Docker:

    Traceback (most recent call last):
      File "/opt/conda/envs/rapids/lib/python3.8/site-packages/pynvml/nvml.py", line 782, in _nvmlGetFunctionPointer
        _nvmlGetFunctionPointer_cache[name] = getattr(nvmlLib, name)
      File "/opt/conda/envs/rapids/lib/python3.8/ctypes/__init__.py", line 386, in __getattr__
        func = self.__getitem__(name)
      File "/opt/conda/envs/rapids/lib/python3.8/ctypes/__init__.py", line 391, in __getitem__
        func = self._FuncPtr((name_or_ordinal, self))
    AttributeError: /usr/lib/x86_64-linux-gnu/libnvidia-ml.so.1: undefined symbol: nvmlDeviceGetComputeRunningProcesses_v2
    
    During handling of the above exception, another exception occurred:
    
    Traceback (most recent call last):
      File "/opt/conda/envs/rapids/lib/python3.8/site-packages/dask_cuda/initialize.py", line 32, in _create_cuda_context
        distributed.comm.ucx.init_once()
      File "/opt/conda/envs/rapids/lib/python3.8/site-packages/distributed/comm/ucx.py", line 86, in init_once
        pre_existing_cuda_context = has_cuda_context()
      File "/opt/conda/envs/rapids/lib/python3.8/site-packages/distributed/diagnostics/nvml.py", line 91, in has_cuda_context
        running_processes = pynvml.nvmlDeviceGetComputeRunningProcesses_v2(handle)
      File "/opt/conda/envs/rapids/lib/python3.8/site-packages/pynvml/nvml.py", line 2191, in nvmlDeviceGetComputeRunningProcesses_v2
        fn = _nvmlGetFunctionPointer("nvmlDeviceGetComputeRunningProcesses_v2")
      File "/opt/conda/envs/rapids/lib/python3.8/site-packages/pynvml/nvml.py", line 785, in _nvmlGetFunctionPointer
        raise NVMLError(NVML_ERROR_FUNCTION_NOT_FOUND)
    pynvml.nvml.NVMLError_FunctionNotFound: Function Not Found
    2022-05-16 15:19:14,517 - distributed.preloading - INFO - Run preload setup click command: dask_cuda.initialize
    2022-05-16 15:19:14,517 - distributed.worker - INFO -       Start worker at:    ws://10.233.68.22:39537/
    2022-05-16 15:19:14,517 - distributed.worker - INFO -          Listening to:    ws://10.233.68.22:39537/
    2022-05-16 15:19:14,517 - distributed.worker - INFO -          dashboard at:         10.233.68.22:35313
    2022-05-16 15:19:14,517 - distributed.worker - INFO - Waiting to connect to: ws://launcher-svc-1245231:8786/
    2022-05-16 15:19:14,517 - distributed.worker - INFO - -------------------------------------------------
    2022-05-16 15:19:14,517 - distributed.worker - INFO -               Threads:                          1
    2022-05-16 15:19:14,517 - distributed.worker - INFO -                Memory:                 400.00 GiB
    2022-05-16 15:19:14,517 - distributed.worker - INFO -       Local Directory: /rapids/notebooks/dask-worker-space/worker-ave_m7tw
    2022-05-16 15:19:14,517 - distributed.worker - INFO - Starting Worker plugin PreImport-0b003d61-7c5f-4530-bf6f-c95b93c83338
    2022-05-16 15:19:14,517 - distributed.worker - INFO - Starting Worker plugin CPUAffinity-a1d437c7-bb5d-408e-a3e0-3120dd6c6a5f
    2022-05-16 15:19:14,518 - distributed.worker - INFO - Starting Worker plugin RMMSetup-03e12d8b-4b23-4e0e-9b3c-a79b6b12e7ab
    2022-05-16 15:19:14,974 - distributed.worker - INFO - -------------------------------------------------
    2022-05-16 15:19:15,025 - distributed.worker - INFO -         Registered to: ws://launcher-svc-1245231:8786/
    2022-05-16 15:19:15,025 - distributed.worker - INFO - -------------------------------------------------
    2022-05-16 15:19:15,026 - distributed.core - INFO - Starting established connection
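
    The failure originates in pynvml itself rather than in Dask: ctypes cannot resolve the nvmlDeviceGetComputeRunningProcesses_v2 symbol in the container's libnvidia-ml.so.1. A minimal sketch to reproduce the lookup outside the worker (assuming the same pynvml 11.5.x install and driver library):

    # Reproduce the failing NVML lookup directly, without Dask.
    import pynvml

    pynvml.nvmlInit()
    try:
        handle = pynvml.nvmlDeviceGetHandleByIndex(0)
        # On the affected setup this raises NVMLError_FunctionNotFound, because the
        # driver's libnvidia-ml.so.1 does not export the *_v2 entry point.
        pynvml.nvmlDeviceGetComputeRunningProcesses_v2(handle)
    finally:
        pynvml.nvmlShutdown()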
    

    Check the current CUDA version with nvidia-smi:

    +-----------------------------------------------------------------------------+
    | NVIDIA-SMI 470.199.02   Driver Version: 470.199.02   CUDA Version: 11.4     |
    |-------------------------------+----------------------+----------------------+
    | GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
    | Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
    |                               |                      |               MIG M. |
    |===============================+======================+======================|
    |   0  NVIDIA GeForce ...  Off  | 00000000:01:00.0  On |                  N/A |
    | 35%   33C    P8    18W / 220W |    552MiB /  7959MiB |     13%      Default |
    |                               |                      |                  N/A |
    +-------------------------------+----------------------+----------------------+
    

    Following the hint in the error, here is the solution.
    There is a version correspondence between CUDA and the pynvml library, so either upgrade CUDA or downgrade pynvml.
    Start python3 and check the installed pynvml version:

    >>> import pynvml
    >>> print(pynvml.__version__)
    11.5.1
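
    Before changing anything, it helps to confirm that the driver-side NVML library is older than what this pynvml wrapper expects. A minimal check (nvmlSystemGetNVMLVersion and nvmlSystemGetDriverVersion are standard pynvml calls; the output depends on the installed driver):

    # Compare the Python wrapper version with what the driver's NVML library reports.
    # A newer pynvml wrapper looking up *_v2 symbols in an older libnvidia-ml.so.1
    # is what produces the NVMLError_FunctionNotFound above.
    import pynvml

    pynvml.nvmlInit()
    try:
        print("pynvml wrapper version:", pynvml.__version__)
        print("NVML library version:  ", pynvml.nvmlSystemGetNVMLVersion())
        print("driver version:        ", pynvml.nvmlSystemGetDriverVersion())
    finally:
        pynvml.nvmlShutdown()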
    
    

    The likely cause is that this pynvml version is too new for the installed CUDA driver, so downgrade pynvml directly with pip:

    pip install pynvml==11.4.1
    
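    After reinstalling, a quick sanity check inside the container (a minimal sketch assuming at least one visible GPU; it initializes NVML and lists the compute processes on GPU 0, the kind of query that failed before the downgrade):

    # Sanity check after downgrading pynvml: query compute processes on GPU 0.
    import pynvml

    pynvml.nvmlInit()
    try:
        handle = pynvml.nvmlDeviceGetHandleByIndex(0)
        procs = pynvml.nvmlDeviceGetComputeRunningProcesses(handle)
        print("compute processes on GPU 0:", len(procs))
    finally:
        pynvml.nvmlShutdown()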

    Problem solved.

  • Original article: https://blog.csdn.net/qq_43650421/article/details/134042639