• Building PyTorch in the Sophgo RISC-V General-Purpose Cloud Development Space @openKylin (notes for the record)


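    Finally a chance to try RISC-V! The operating system is openKylin, running in Sophgo's cloud space.

    This is an attempt to build and install PyTorch from source.

    First, install git: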

    apt install git

    Then download PyTorch and Sophgo's cpuinfo library:

    git clone https://github.com/sophgo/cpuinfo.git

    git clone https://github.com/pytorch/pytorch

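    Notes: sync and fetch the submodules before going further.

    cd pytorch
    # Make sure the submodule remote URLs match the configuration in the parent repository
    git submodule sync
    # Fetch and update all submodules, initializing any that are not yet initialized and recursing into nested submodules
    git submodule update --init --recursive

    Delete cpuinfo from the pytorch/third_party directory and replace it with Sophgo's cpuinfo library:

    # from the pytorch directory
    cd third_party

    rm -rf cpuinfo

    cp -rf ../../cpuinfo .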

    Install the required libraries

    apt install libopenblas-dev reports an error on this system and can be skipped.

    apt install libblas-dev m4 cmake cython3 ccache

    Build and install OpenBLAS by hand:

    git clone https://github.com/xianyi/OpenBLAS.git
    cd OpenBLAS
    make -j8
    make PREFIX=/usr/local/OpenBLAS install

    The build prints a large number of warnings.

    Add the following as the last line of /etc/profile:

    export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/OpenBLAS/lib/

    Then run: source /etc/profile

    Modify the source

    In the pytorch directory, run: vi aten/src/ATen/CMakeLists.txt

    Replace the statement: if(NOT MSVC AND NOT EMSCRIPTEN AND NOT INTERN_BUILD_MOBILE)
    with: if(FALSE)

       vi caffe2/CMakeLists.txt

    Replace the statement: target_link_libraries(${test_name}_${CPU_CAPABILITY} c10 sleef gtest_main)
    with: target_link_libraries(${test_name}_${CPU_CAPABILITY} c10 gtest_main)

       vi test/cpp/api/CMakeLists.txt

    Below the statement: add_executable(test_api ${TORCH_API_TEST_SOURCES})
    add: target_compile_options(test_api PUBLIC -Wno-nonnull)
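The three manual edits above can also be applied by a small script, which makes them repeatable after a fresh clone. The following is a minimal sketch, not part of the original notes: the helper names `replace_statement`, `add_line_after`, and `patch_tree` are hypothetical, and it assumes each target statement appears exactly once in its file, which may not hold for every PyTorch revision.

```python
from pathlib import Path


def replace_statement(text, old, new):
    """Replace `old` with `new`, insisting on exactly one occurrence."""
    count = text.count(old)
    if count != 1:
        raise ValueError(f"expected exactly one {old!r}, found {count}")
    return text.replace(old, new)


def add_line_after(text, marker, new_line):
    """Insert `new_line` after the first line containing `marker` (idempotent)."""
    lines = text.splitlines()
    for i, line in enumerate(lines):
        if marker in line:
            if i + 1 < len(lines) and lines[i + 1].strip() == new_line.strip():
                return text  # already patched, leave the file content alone
            lines.insert(i + 1, new_line)
            return "\n".join(lines) + "\n"
    raise ValueError(f"marker not found: {marker!r}")


def patch_tree(root):
    """Apply the three CMakeLists.txt edits described above under `root`."""
    root = Path(root)
    edits = [
        ("aten/src/ATen/CMakeLists.txt", replace_statement,
         "if(NOT MSVC AND NOT EMSCRIPTEN AND NOT INTERN_BUILD_MOBILE)",
         "if(FALSE)"),
        ("caffe2/CMakeLists.txt", replace_statement,
         "target_link_libraries(${test_name}_${CPU_CAPABILITY} c10 sleef gtest_main)",
         "target_link_libraries(${test_name}_${CPU_CAPABILITY} c10 gtest_main)"),
        ("test/cpp/api/CMakeLists.txt", add_line_after,
         "add_executable(test_api ${TORCH_API_TEST_SOURCES})",
         "target_compile_options(test_api PUBLIC -Wno-nonnull)"),
    ]
    for rel_path, edit, first, second in edits:
        path = root / rel_path
        path.write_text(edit(path.read_text(), first, second))
```

Calling `patch_tree("/root/pytorch")` would apply all three edits. The exactly-one check is deliberate: if upstream changes the statement, the script stops instead of silently patching nothing.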

    Environment variables

    # Type these directly into the terminal; they must be entered again after a restart
    export USE_CUDA=0
    export USE_DISTRIBUTED=0
    export USE_MKLDNN=0
    export MAX_JOBS=16

    Source of this configuration: https://blog.csdn.net/m0_49267873/article/details/135670989
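Because the exports above are lost when the shell session ends, one option is to keep them in a small launcher script. This is a minimal sketch, assuming the build is started from the pytorch checkout with `python3 setup.py develop --cmake`; `BUILD_FLAGS`, `build_env`, and `run_build` are hypothetical names, and only the four variables listed above are set.

```python
import os
import subprocess

# The four settings from the notes above.
BUILD_FLAGS = {
    "USE_CUDA": "0",
    "USE_DISTRIBUTED": "0",
    "USE_MKLDNN": "0",
    "MAX_JOBS": "16",
}


def build_env(base=None):
    """Return a copy of `base` (default: the current environment) with the build flags applied."""
    env = dict(os.environ if base is None else base)
    env.update(BUILD_FLAGS)
    return env


def run_build(pytorch_dir="."):
    """Run the develop-mode build with the flags set; raises CalledProcessError on failure."""
    subprocess.run(
        ["python3", "setup.py", "develop", "--cmake"],
        cwd=pytorch_dir,
        env=build_env(),
        check=True,
    )
```

Passing the environment explicitly leaves the calling shell untouched, so the flags apply to this build only.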

    Build and install

    Run:

    python3 setup.py develop --cmake

    or: python3.10 setup.py install

    gcc 13 or later is reportedly required; the gcc that ships with the system is:

    gcc version 9.3.0 (Openkylin 9.3.0-ok12)

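    A patch is needed. Apply it when the link step fails with an undefined reference to `__atomic_exchange_1` (around 78% of the build, see the debugging notes), then restart the build:

    # If the patchelf command is not found, install it first
    apt install patchelf

    # /path is the directory containing libtorch_cpu.so
    patchelf --add-needed libatomic.so.1 /path/libtorch_cpu.so

    On the Sophgo cloud system the command is: patchelf --add-needed libatomic.so.1 /root/pytorch/build/lib/libtorch_cpu.so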

    Preparation before building

    These two packages must be installed before building:

    pip3 install pyyaml typing_extensions

    setuptools also needs to be upgraded:

    pip3 install setuptools -U
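Before starting a multi-hour build it can help to confirm that these prerequisites are importable. The following is a minimal sketch using only the standard library; `missing_modules` and `REQUIRED` are hypothetical names, and `yaml` is the import name of the pyyaml package.

```python
import importlib.util


def missing_modules(names):
    """Return the subset of top-level module `names` that cannot be found on the import path."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# Import names for the two build dependencies: pyyaml installs the module `yaml`.
REQUIRED = ["yaml", "typing_extensions"]

missing = missing_modules(REQUIRED)
if missing:
    print("missing build dependencies:", ", ".join(missing))
else:
    print("all build dependencies found")
```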

    Final build

    In the pytorch directory, run:

    python3 setup.py develop --cmake

    The whole build takes roughly 3 to 4 hours.

    The build finishes with:

    Installed /usr/lib/python3.8/site-packages/mpmath-1.3.0-py3.8.egg
    Searching for typing-extensions==4.9.0
    Best match: typing-extensions 4.9.0
    Adding typing-extensions 4.9.0 to easy-install.pth file
    detected new path './mpmath-1.3.0-py3.8.egg'

    Using /usr/local/lib/python3.8/dist-packages
    Finished processing dependencies for torch==2.3.0a0+git5c5b71b

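    Testing

    Start python3 and run import pytorch: it fails because there is no module named pytorch. Run import torch instead.

    That raised no error, so I assumed the test had passed. I then suspected it only worked because I was inside the pytorch directory, which has a torch subdirectory, and that I had been fooled into thinking it passed.

    I was too hasty: the build used develop mode, and this is simply how develop mode is used.

    In other words, python3 has to be started from the pytorch directory for the develop-mode torch to be found. In ~/pytorch, start python3 and paste the code below into the interactive prompt; the test passes.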

    import torch
    import torch.nn as nn
    import torch.optim as optim
    import os
    os.environ["KMP_DUPLICATE_LIB_OK"] = "TRUE"
    # N: batch size, D_in: input size, H: hidden size, D_out: output size
    N, D_in, H, D_out = 64, 1000, 100, 10
    x = torch.randn(N, D_in)       # NumPy: np.random.randn(N, D_in)
    y = torch.randn(N, D_out)      # NumPy: np.random.randn(N, D_out)
    w1 = torch.randn(D_in, H)      # NumPy: np.random.randn(D_in, H)
    w2 = torch.randn(H, D_out)     # NumPy: np.random.randn(H, D_out)
    learning_rate = 1e-6
    for it in range(200):
        # forward pass
        h = x.mm(w1)                        # N x H; NumPy: x.dot(w1)
        h_relu = h.clamp(min=0)             # N x H; NumPy: np.maximum(h, 0)
        y_pred = h_relu.mm(w2)              # N x D_out; NumPy: h_relu.dot(w2)
        # compute loss
        loss = (y_pred - y).pow(2).sum()    # NumPy: np.square(y_pred - y).sum()
        print(it, loss.item())
        # backward pass: compute the gradients by hand
        grad_y_pred = 2.0 * (y_pred - y)
        grad_w2 = h_relu.t().mm(grad_y_pred)    # NumPy: h_relu.T.dot(grad_y_pred)
        grad_h_relu = grad_y_pred.mm(w2.t())    # NumPy: grad_y_pred.dot(w2.T)
        grad_h = grad_h_relu.clone()            # NumPy: grad_h_relu.copy()
        grad_h[h < 0] = 0
        grad_w1 = x.t().mm(grad_h)              # NumPy: x.T.dot(grad_h)
        # update the weights w1 and w2
        w1 -= learning_rate * grad_w1
        w2 -= learning_rate * grad_w2

    Output of the first 20 iterations:

    0 29870438.0
    1 26166322.0
    2 25949932.0
    3 25343224.0
    4 22287072.0
    5 16840522.0
    6 11024538.0
    7 6543464.5
    8 3774165.25
    9 2248810.5
    10 1440020.25
    11 1001724.5
    12 749632.625
    13 592216.6875
    14 485451.34375
    15 407586.65625
    16 347618.4375
    17 299686.625
    18 260381.9375
    19 227590.734375
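For reference, the manual backward pass in the script is the chain rule applied to a two-layer ReLU network with a sum-of-squares loss. The notation below is introduced here for illustration only: uppercase letters stand for the tensors x, y, w1, and w2, and \eta stands for learning_rate. The indicator uses \ge because the script zeroes the gradient only where h < 0.

```latex
\begin{aligned}
H &= X W_1, \qquad R = \max(H, 0), \qquad \hat{Y} = R W_2, \qquad L = \lVert \hat{Y} - Y \rVert_F^2 \\
\frac{\partial L}{\partial \hat{Y}} &= 2\,(\hat{Y} - Y) \\
\frac{\partial L}{\partial W_2} &= R^{\top} \frac{\partial L}{\partial \hat{Y}} \\
\frac{\partial L}{\partial R} &= \frac{\partial L}{\partial \hat{Y}} \, W_2^{\top} \\
\frac{\partial L}{\partial H} &= \frac{\partial L}{\partial R} \odot \mathbf{1}[H \ge 0] \\
\frac{\partial L}{\partial W_1} &= X^{\top} \frac{\partial L}{\partial H} \\
W_k &\leftarrow W_k - \eta \, \frac{\partial L}{\partial W_k}, \qquad k \in \{1, 2\}
\end{aligned}
```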

    How can torch be made usable from anywhere in the environment?

    It feels like an environment-variable issue. Stay tuned.
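One way to investigate is to ask Python where, if anywhere, it would load torch from, and to compare the answer inside and outside ~/pytorch. The following is a minimal diagnostic sketch using only the standard library; `module_origin` is a hypothetical helper, and it only locates the module without importing it.

```python
import importlib.util
import os


def module_origin(name):
    """Return the file a top-level module would be loaded from, or None if it is not found."""
    spec = importlib.util.find_spec(name)
    return None if spec is None else spec.origin


print("cwd:  ", os.getcwd())
print("torch:", module_origin("torch"))
```

If the second line prints None from another directory but a path under ~/pytorch when run from there, that would suggest the interpreter finds torch only through the current directory, pointing at the install path entries rather than an environment variable.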

    Debugging

    Installing libopenblas-dev fails

    root@863c89a419ec:~/pytorch/third_party# apt install libopenblas-dev
    Reading package lists... Done
    Building dependency tree... Done
    Reading state information... Done
    Package libopenblas-dev is not available, but is referred to by another package.
    This may mean that the package is missing, has been obsoleted, or
    is only available from another source

    It turns out someone had already been through this pitfall: skip the package and build OpenBLAS from source instead.

    Error while building PyTorch

    python3 setup.py develop --cmake

    Building wheel torch-2.3.0a0+git5c5b71b
    -- Building version 2.3.0a0+git5c5b71b
    Could not find any of CMakeLists.txt, Makefile, setup.py, LICENSE, LICENSE.md, LICENSE.txt in /root/pytorch/third_party/pybind11
    Did you run 'git submodule update --init --recursive'?

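    Go into the third_party directory and run the following to fix it:

    rm -rf pthreadpool
    # Go back up to the pytorch directory before running the next command
    git submodule update --init --recursive

    After that, the build still fails: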

    root@863c89a419ec:~/pytorch# python3 setup.py develop --cmake
    Building wheel torch-2.3.0a0+git5c5b71b
    -- Building version 2.3.0a0+git5c5b71b
    Could not find any of CMakeLists.txt, Makefile, setup.py, LICENSE, LICENSE.md, LICENSE.txt in /root/pytorch/third_party/QNNPACK
    Did you run 'git submodule update --init --recursive'?

    Running git submodule update --init --recursive again changed nothing.

    Deleting the QNNPACK directory and running git submodule update --init --recursive once more got past it.
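The error message names the files that the build looks for inside each submodule directory, so the remaining empty checkouts can be listed up front instead of being discovered one build attempt at a time. The following is a minimal sketch that mirrors the check described in the message; it is not PyTorch's own code, `unpopulated_dirs` is a hypothetical helper, and because not every directory under third_party is necessarily a submodule, its output is only a hint.

```python
from pathlib import Path

# File names taken from the error message above.
MARKER_FILES = (
    "CMakeLists.txt", "Makefile", "setup.py",
    "LICENSE", "LICENSE.md", "LICENSE.txt",
)


def unpopulated_dirs(third_party):
    """Return sorted names of subdirectories that contain none of the marker files."""
    root = Path(third_party)
    return sorted(
        d.name
        for d in root.iterdir()
        if d.is_dir() and not any((d / f).exists() for f in MARKER_FILES)
    )
```

For example, `print(unpopulated_dirs("/root/pytorch/third_party"))` lists candidate directories to delete before rerunning `git submodule update --init --recursive`.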

    Error: RuntimeError: Missing build dependency: Unable to `import yaml`.

    pip3 install pyyaml

    Error: ModuleNotFoundError: No module named 'typing_extensions'

    pip3 install typing_extensions fixed it.

    Build fails at 78%

    /usr/bin/ld: /root/pytorch/build/lib/libtorch_cpu.so: undefined reference to `__atomic_exchange_1'
    collect2: error: ld returned 1 exit status
    make[2]: *** [caffe2/CMakeFiles/NamedTensor_test.dir/build.make:101: bin/NamedTensor_test] Error 1
    make[1]: *** [CMakeFiles/Makefile2:3288: caffe2/CMakeFiles/NamedTensor_test.dir/all] Error 2
    /usr/bin/ld: /root/pytorch/build/lib/libtorch_cpu.so: undefined reference to `__atomic_exchange_1'
    collect2: error: ld returned 1 exit status
    make[2]: *** [caffe2/CMakeFiles/cpu_profiling_allocator_test.dir/build.make:101: bin/cpu_profiling_allocator_test] Error 1
    make[1]: *** [CMakeFiles/Makefile2:3505: caffe2/CMakeFiles/cpu_profiling_allocator_test.dir/all] Error 2
    [ 78%] Linking CXX executable ../bin/cpu_rng_test
    /usr/bin/ld: /root/pytorch/build/lib/libtorch_cpu.so: undefined reference to `__atomic_exchange_1'
    collect2: error: ld returned 1 exit status
    make[2]: *** [caffe2/CMakeFiles/cpu_rng_test.dir/build.make:101: bin/cpu_rng_test] Error 1
    make[1]: *** [CMakeFiles/Makefile2:3536: caffe2/CMakeFiles/cpu_rng_test.dir/all] Error 2
    make: *** [Makefile:146: all] Error 2

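    My first suspicion was a problem with the cpuinfo library, but on inspection it was fine.

    Trying this approach instead:

    Diagnosis: undefined reference to __atomic_exchange_1

    Fix: use patchelf to add the required shared library

    # If the patchelf command is not found, install it first
    apt install patchelf

    # /path is the directory containing libtorch_cpu.so
    patchelf --add-needed libatomic.so.1 /path/libtorch_cpu.so

    Location of libtorch_cpu.so: /root/pytorch/build/lib/libtorch_cpu.so

    So the command is: patchelf --add-needed libatomic.so.1 /root/pytorch/build/lib/libtorch_cpu.so

    Sure enough, after running this command the build was able to continue.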
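To confirm that the patch took effect, the dynamic section of the library can be inspected for a NEEDED entry naming libatomic. The following is a minimal sketch, assuming `readelf` from binutils is installed; `needed_libraries` and `has_libatomic` are hypothetical helpers, and the parser relies on the usual `(NEEDED) Shared library: [name]` layout of `readelf -d` output.

```python
import re
import subprocess

# Matches lines such as: 0x...01 (NEEDED)   Shared library: [libatomic.so.1]
NEEDED_RE = re.compile(r"\(NEEDED\)\s+Shared library:\s+\[(.+?)\]")


def needed_libraries(readelf_output):
    """Extract library names from the NEEDED entries of `readelf -d` output."""
    return NEEDED_RE.findall(readelf_output)


def has_libatomic(so_path):
    """Run `readelf -d` on `so_path` and report whether libatomic.so.1 is listed."""
    out = subprocess.run(
        ["readelf", "-d", so_path],
        capture_output=True, text=True, check=True,
    ).stdout
    return "libatomic.so.1" in needed_libraries(out)
```

On the system described here the call would be `has_libatomic("/root/pytorch/build/lib/libtorch_cpu.so")`.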

    Build fails at 100%

    running develop
    /usr/lib/python3/dist-packages/setuptools/command/easy_install.py:146: EasyInstallDeprecationWarning: easy_install command is deprecated. Use build and pip and other standards-based tools.
      warnings.warn(
    Traceback (most recent call last):
      File "setup.py", line 1401, in <module>
        main()
      File "setup.py", line 1346, in main
        setup(
      File "/usr/lib/python3/dist-packages/setuptools/__init__.py", line 87, in setup
        return distutils.core.setup(**attrs)
      File "/usr/lib/python3/dist-packages/setuptools/_distutils/core.py", line 185, in setup
        return run_commands(dist)
      File "/usr/lib/python3/dist-packages/setuptools/_distutils/core.py", line 201, in run_commands
        dist.run_commands()
      File "/usr/lib/python3/dist-packages/setuptools/_distutils/dist.py", line 973, in run_commands
        self.run_command(cmd)
      File "/usr/lib/python3/dist-packages/setuptools/dist.py", line 1217, in run_command
        super().run_command(command)
      File "/usr/lib/python3/dist-packages/setuptools/_distutils/dist.py", line 991, in run_command
        cmd_obj.ensure_finalized()
      File "/usr/lib/python3/dist-packages/setuptools/_distutils/cmd.py", line 109, in ensure_finalized
        self.finalize_options()
      File "/usr/lib/python3/dist-packages/setuptools/command/develop.py", line 52, in finalize_options
        easy_install.finalize_options(self)
      File "/usr/lib/python3/dist-packages/setuptools/command/easy_install.py", line 231, in finalize_options
        self.config_vars = dict(sysconfig.get_config_vars())
    UnboundLocalError: local variable 'sysconfig' referenced before assignment

    Try upgrading setuptools:

    root@863c89a419ec:~# pip3 install  setuptools -U
    Collecting setuptools
      Using cached setuptools-69.1.0-py3-none-any.whl (819 kB)
    Installing collected packages: setuptools
      Attempting uninstall: setuptools
        Found existing installation: setuptools 65.3.0
        Not uninstalling setuptools at /usr/lib/python3/dist-packages, outside environment /usr
        Can't uninstall 'setuptools'. No files were found to uninstall.
    Successfully installed setuptools-69.1.0

    Then building again succeeded!

    Checking the gcc version

    gcc 13 or later is reportedly required; the gcc that ships with the system is:

    gcc version 9.3.0 (Openkylin 9.3.0-ok12)
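The comparison against the reported minimum can be made explicit by parsing the version line. The following is a minimal sketch; `parse_gcc_version` is a hypothetical helper, and the threshold of 13 is only the figure quoted above, not something these notes verify. The build described here did complete with gcc 9.3.0 once the libatomic patch was applied.

```python
import re


def parse_gcc_version(line):
    """Return (major, minor, patch) from a 'gcc version X.Y.Z ...' line, or None if absent."""
    match = re.search(r"gcc version (\d+)\.(\d+)\.(\d+)", line)
    return tuple(int(part) for part in match.groups()) if match else None


line = "gcc version 9.3.0 (Openkylin 9.3.0-ok12)"
version = parse_gcc_version(line)
print(version, "meets the reported minimum:", version >= (13, 0, 0))
```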


  • Original article: https://blog.csdn.net/skywalk8163/article/details/136240118