• Installing and Using dlib Without CUDA Support


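    Preface

    Dlib is a C++ toolkit widely used in both industry and academia. Its open-source license allows it to be used free of charge in any application. Dlib also provides bindings for other programming languages such as Python.

    In a Python environment, installing dlib is usually straightforward: a plain pip install is enough. In some cases, however, dlib's CUDA support causes certain models to fail when they run inference through dlib.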


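    This article briefly describes how to install and use a dlib build without CUDA support.

    Installation

    The steps below use Ubuntu as the example:

    Installing with pip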

    pip install dlib
    

    Note: dlib installed this way is built with CUDA acceleration by default, provided CUDA and cuDNN are already installed.

    >>> import dlib
    >>> dlib.DLIB_USE_CUDA
    True
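The same check can be wrapped in a small script, for example to log the build type at service start-up. This is a minimal sketch: the function name `dlib_cuda_report` is made up for illustration, and it relies only on the `DLIB_USE_CUDA` flag shown above plus the module's `__version__` attribute.

```python
import importlib


def dlib_cuda_report():
    """Return a dict describing whether the installed dlib was built with CUDA."""
    try:
        dlib = importlib.import_module("dlib")
    except ImportError:
        # dlib is not installed in this environment.
        return {"installed": False, "use_cuda": None, "version": None}
    return {
        "installed": True,
        # DLIB_USE_CUDA is the build-time flag exposed by the Python binding.
        "use_cuda": bool(getattr(dlib, "DLIB_USE_CUDA", False)),
        "version": getattr(dlib, "__version__", None),
    }


if __name__ == "__main__":
    print(dlib_cuda_report())
```

The report only describes how dlib was compiled. It does not say whether a GPU is actually usable at run time.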
    

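    However, when running inference with the face_recognition library on a multi-GPU machine, problems appeared: abnormal GPU memory allocation (the unexpected 253MiB entries in the process listing below) and errors such as "illegal memory access was encountered".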

    |    3   N/A  N/A     46077      C   ...envs/test/bin/python     4409MiB |
    |    3   N/A  N/A     46115      C   ...envs/test/bin/python     8569MiB |
    |    3   N/A  N/A     46165      C   ...envs/test/bin/python     5822MiB |
    |    3   N/A  N/A     46176      C   ...envs/test/bin/python      253MiB |
    |    3   N/A  N/A     46186      C   ...envs/test/bin/python      253MiB |
    |    3   N/A  N/A     46192      C   ...envs/test/bin/python      253MiB |
    |    4   N/A  N/A     46089      C   ...envs/test/bin/python     4409MiB |
    |    4   N/A  N/A     46118      C   ...envs/test/bin/python     8569MiB |
    |    4   N/A  N/A     46176      C   ...envs/test/bin/python     5569MiB |
    |    6   N/A  N/A     46100      C   ...envs/test/bin/python     4409MiB |
    |    6   N/A  N/A     46125      C   ...envs/test/bin/python     8569MiB |
    |    6   N/A  N/A     46186      C   ...envs/test/bin/python     5569MiB |
    |    7   N/A  N/A     46111      C   ...envs/test/bin/python     4409MiB |
    |    7   N/A  N/A     46150      C   ...envs/test/bin/python     8569MiB |
    
    while calling cudnnFindConvolutionForwardAlgorithm( context(), 
    descriptor(data), (const cudnnFilterDescriptor_t)filter_handle, 
    (const cudnnConvolutionDescriptor_t)conv_handle, 
    descriptor(dest_desc), num_possible_algorithms, &num_algorithms, 
    perf_results.data()) in file /tmp/pip-install-fz7s/dlib_537e10d/dlib/cuda/cudnn_dlibapi.cpp:819. 
    code: 2, reason: CUDA Resources could not be allocated.
    
    6846:CUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect.
    6847:For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
    6889:2022-11-04 18:08:52,712 CUDA error: an illegal memory access was encountered
    6890:CUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect.
    6891:For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
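The log suggests passing CUDA_LAUNCH_BLOCKING=1 while debugging. One way to do that from Python is sketched below. It assumes the variable has to be in place before the CUDA runtime initializes, so it is set before any CUDA-backed library is imported. The helper name `enable_blocking_launches` is made up for illustration.

```python
import os


def enable_blocking_launches():
    """Ask CUDA to run kernels synchronously so errors surface at the failing call."""
    os.environ["CUDA_LAUNCH_BLOCKING"] = "1"
    return os.environ["CUDA_LAUNCH_BLOCKING"]


# Call this at the very top of the entry script, before importing dlib,
# face_recognition, or any other library that may initialize CUDA.
enable_blocking_launches()

# import face_recognition  # import CUDA-backed libraries only after this point
```

Synchronous launches slow inference down, so this setting is intended for debugging runs only.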
    

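    Further analysis showed that the GPU, the graphics driver, and cuDNN on this machine were too new, and that some of the models in face_recognition trigger the errors above once they are loaded through dlib.

    On an older machine the problem did not occur. It turned out that the same version of dlib had been built without CUDA support on that machine, so the abnormal GPU memory usage never arose.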

    >>> import dlib
    >>> dlib.DLIB_USE_CUDA
    False
    

    One option, therefore, is to build and install dlib without CUDA support so that face_recognition works normally.

    Building and installing from source


    git clone https://github.com/davisking/dlib.git
    cd dlib
    python setup.py install --no DLIB_USE_CUDA
    

    Verify:

    >>> import dlib
    >>> dlib.DLIB_USE_CUDA
    False
    

    Usage

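    import numpy as np
    import face_recognition

    # Dummy 100x100 three-channel image with uint8 pixels.
    img = np.zeros((100, 100, 3), dtype=np.uint8)

    # Each known face location is given as (top, right, bottom, left).
    face_encodings = face_recognition.face_encodings(img, known_face_locations=[[10, 50, 50, 10]], model="small")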
    
    

    This runs without error.

    face_encodings = face_recognition.face_encodings(img[:, : , ::-1], known_face_locations=[[10, 50, 50, 10]], model="small")
    

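    This call raises an error, because the reversed slice img[:, :, ::-1] is a non-contiguous view rather than the contiguous array that dlib's binding accepts: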

    Traceback (most recent call last):
      File "", line 1, in <module>
      File "/data1/miniconda3/envs/torch1.12/lib/python3.7/site-packages/face_recognition/api.py", line 214, in face_encodings
        return [np.array(face_encoder.compute_face_descriptor(face_image, raw_landmark_set, num_jitters)) for raw_landmark_set in raw_landmarks]
      File "/data1/miniconda3/envs/torch1.12/lib/python3.7/site-packages/face_recognition/api.py", line 214, in <listcomp>
        return [np.array(face_encoder.compute_face_descriptor(face_image, raw_landmark_set, num_jitters)) for raw_landmark_set in raw_landmarks]
    TypeError: compute_face_descriptor(): incompatible function arguments. The following argument types are supported:
        1. (self: _dlib_pybind11.face_recognition_model_v1, img: numpy.ndarray[(rows,cols,3),numpy.uint8], face: _dlib_pybind11.full_object_detection, num_jitters: int = 0, padding: float = 0.25) -> _dlib_pybind11.vector
        2. (self: _dlib_pybind11.face_recognition_model_v1, img: numpy.ndarray[(rows,cols,3),numpy.uint8], num_jitters: int = 0) -> _dlib_pybind11.vector
        3. (self: _dlib_pybind11.face_recognition_model_v1, img: numpy.ndarray[(rows,cols,3),numpy.uint8], faces: _dlib_pybind11.full_object_detections, num_jitters: int = 0, padding: float = 0.25) -> _dlib_pybind11.vectors
        4. (self: _dlib_pybind11.face_recognition_model_v1, batch_img: List[numpy.ndarray[(rows,cols,3),numpy.uint8]], batch_faces: List[_dlib_pybind11.full_object_detections], num_jitters: int = 0, padding: float = 0.25) -> _dlib_pybind11.vectorss
        5. (self: _dlib_pybind11.face_recognition_model_v1, batch_img: List[numpy.ndarray[(rows,cols,3),numpy.uint8]], num_jitters: int = 0) -> _dlib_pybind11.vectors
    
    Invoked with: <_dlib_pybind11.face_recognition_model_v1 object at 0x7f746912a370>, array([[[0, 0, 0],
            [0, 0, 0],
            [0, 0, 0],
    
    # Copy the channel-reversed view into a C-contiguous array before passing it to dlib.
    img = np.ascontiguousarray(img[:, :, ::-1])
    face_encodings = face_recognition.face_encodings(img, known_face_locations=[[10, 50, 50, 10]], model="small")
    

    This runs without error.
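The difference between the failing and the working call can be seen in NumPy alone, without dlib. The following sketch inspects the memory layout of the reversed view and of its contiguous copy. The helper name `as_dlib_image` is hypothetical, and its checks simply mirror the `numpy.ndarray[(rows,cols,3),numpy.uint8]` signature printed in the traceback.

```python
import numpy as np


def as_dlib_image(img):
    """Return a C-contiguous uint8 array of shape (rows, cols, 3)."""
    if img.ndim != 3 or img.shape[2] != 3:
        raise ValueError("expected an array of shape (rows, cols, 3)")
    if img.dtype != np.uint8:
        raise TypeError("expected uint8 pixel data")
    # ascontiguousarray copies only when the input is not already C-contiguous.
    return np.ascontiguousarray(img)


img = np.zeros((100, 100, 3), dtype=np.uint8)
flipped = img[:, :, ::-1]  # channel-reversed view, no data is copied
fixed = as_dlib_image(flipped)

print(flipped.flags["C_CONTIGUOUS"], flipped.strides)  # False (300, 3, -1)
print(fixed.flags["C_CONTIGUOUS"], fixed.strides)      # True (300, 3, 1)
```

The negative stride on the last axis is what makes the view non-contiguous. The copy holds the same pixel values in a plain row-major layout.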

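    Additional notes

    If, on a recent machine, the dlib-based face_recognition library is used for inference alongside TensorFlow inference for other models, TensorFlow may also print CUDA-related errors. Their root cause, however, is still dlib's malfunctioning CUDA support: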

    2022-11-04 15:32:20.964078: E tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:723] failed to record completion event; therefore, failed to create inter-stream dependency
    2022-11-04 15:32:20.964146: E tensorflow/stream_executor/cuda/cuda_driver.cc:1183] failed to enqueue async memcpy from host to device: CUDA_ERROR_ILLEGAL_ADDRESS: an illegal memory access was encountered; GPU dst: 0x7f6b56e84a00; host src: 0x55bf1eaa4cc0; size: 602112=0x93000
    2022-11-04 15:32:20.964171: E tensorflow/stream_executor/stream.cc:334] Error recording event in stream: Error recording CUDA event: CUDA_ERROR_ILLEGAL_ADDRESS: an illegal memory access was encountered; not marking stream as bad, as the Event object may be at fault. Monitor for further errors.
    2022-11-04 15:32:20.964341: E tensorflow/stream_executor/cuda/cuda_event.cc:29] Error polling for event status: failed to query event: CUDA_ERROR_ILLEGAL_ADDRESS: an illegal memory access was encountered
    2022-11-04 15:32:20.964352: F tensorflow/core/common_runtime/device/device_event_mgr.cc:221] Unexpected Event status: 1
    

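    Copyright notice

    This is an original article, published exclusively at blog.csdn.net/TracelessLe. Do not repost it without the author's permission. For help, email tracelessle@163.com or scan the QR code in the author's profile.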

    References

    [1] dlib C++ Library - Frequently Asked Questions
    [2] davisking/dlib: A toolkit for making real world machine learning and data analysis applications in C++
    [3] How to force install Dlib with only CPU support on a GPU machine with Cuda enabled · Issue #1885 · davisking/dlib
    [4] ageitgey/face_recognition: The world’s simplest facial recognition api for Python and the command line

  • Original article: https://blog.csdn.net/TracelessLe/article/details/127750183