• Using ViennaCL, an OpenCL linear algebra library


    ViennaCL

    Introduction

    http://viennacl.sourceforge.net/
    ViennaCL is a free open-source linear algebra library for computations on many-core architectures (GPUs, MIC) and multi-core CPUs. The library is written in C++ and supports CUDA, OpenCL, and OpenMP (including switches at runtime).


    Highlights of the latest 1.7.x release series:

    • Fast sparse matrix-matrix multiplications, outperforming CUBLAS and MKL.
    • Fine-grained parallel algebraic multigrid preconditioners for CPUs, Xeon Phis, and GPUs.
    • Fine-grained parallel incomplete LU factorization preconditioners for CPUs, Xeon Phis, and GPUs.

    Download

    http://viennacl.sourceforge.net/viennacl-download.html
    The download link is above; pick the release that matches your needs.
    ViennaCL-1.7.1.tar.gz

    Build

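    • Extract the downloaded source archive (for the .tar.gz listed above, use tar -xzf ViennaCL-1.7.1.tar.gz instead):
    unzip ViennaCL-1.7.1.zip

    The extracted directory looks like this; change into build: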

    zacha@Superman:ViennaCL-1.7.1$ ls
    build      CL     CMakeLists.txt  examples  libviennacl  README  viennacl
    changelog  cmake  doc             external  LICENSE      tests
    

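    Run cmake-gui ../ and click Configure.
    CMake then searches under /usr for the headers and the library (libOpenCL.so) that the OpenCL build needs. If they are missing, install the SDK that ships with your GPU driver. For an embedded board, set up a cross-compile instead and point CMake at the OpenCL headers and library for the target.
    Once the header and library paths are set, click Generate to produce the Makefile, then run make.
    Alternatively, follow build/README.txt.
    When the build finishes, copy the binaries to the target platform.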
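
Before running Configure, it can help to check whether an OpenCL library and header are visible on the build machine at all. The following is a minimal sketch of such a check, not part of ViennaCL or CMake. The include directories are assumptions for a typical Linux layout and should be replaced with your SDK or cross-compile sysroot paths.

```python
import ctypes.util
import os

# Hypothetical search locations; adjust for your SDK or cross-compile sysroot.
DEFAULT_INCLUDE_DIRS = ("/usr/include", "/usr/local/include")


def find_opencl(include_dirs=DEFAULT_INCLUDE_DIRS):
    """Report whether an OpenCL library and the CL/cl.h header can be located."""
    # find_library returns a library name or path, or None when nothing is found.
    library = ctypes.util.find_library("OpenCL")
    header = None
    for directory in include_dirs:
        candidate = os.path.join(directory, "CL", "cl.h")
        if os.path.isfile(candidate):
            header = candidate
            break
    return {"library": library, "header": header}


if __name__ == "__main__":
    for kind, location in find_opencl().items():
        print(f"{kind}: {location or 'not found'}")
```

If either entry reports "not found", the header and library paths most likely need to be entered by hand in cmake-gui.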

    Usage

    After the build, the examples directory contains many ready-to-run example executables.

    benchmarks

    #./dense_blas-bench-opencl
    ----------------------------------------------
                   Device Info
    ----------------------------------------------
    
    Vendor:              Vivante Corporation
    Type:                GPU 
    Available:           1
    Max Compute Units:   1
    Max Work Group Size: 1024
    Global Mem Size:     268435456
    Local Mem Size:      32768
    Local Mem Type:      2
    Host Unified Memory: 1
    
    Benchmark : BLAS
    ----------------
    sCOPY : 2 GB/s
    sAXPY : 2 GB/s
    sDOT : 4 GB/s
    sGEMV-N : 1.97 GB/s
    sGEMV-T : 1.48 GB/s
    sGEMM-NN : 0.246 GFLOPs/s
    sGEMM-NT : 0.246 GFLOPs/s
    sGEMM-TN : 0.246 GFLOPs/s
    sGEMM-TT : 0.246 GFLOPs/s
    ----
    
    
    Benchmark : BLAS
    ----------------
    sCOPY : 0.195 GB/s
    sAXPY : 0.196 GB/s
    sDOT : 0.171 GB/s
    sGEMV-N : 0.0303 GB/s
    sGEMV-T : 0.0517 GB/s
    sGEMM-NN : 0.00863 GFLOPs/s
    sGEMM-NT : 0.234 GFLOPs/s
    sGEMM-TN : 0.00868 GFLOPs/s
    sGEMM-TT : 0.00863 GFLOPs/s
    ----
    dCOPY : 0.381 GB/s
    dAXPY : 0.377 GB/s
    dDOT : 0.327 GB/s
    dGEMV-N : 0.0596 GB/s
    dGEMV-T : 0.101 GB/s
    dGEMM-NN : 0.00797 GFLOPs/s
    dGEMM-NT : 0.00774 GFLOPs/s
    dGEMM-TN : 0.00821 GFLOPs/s
    dGEMM-TT : 0.00797 GFLOPs/s
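
For readers unfamiliar with the units, GB/s and GFLOPs/s figures like these are usually derived from a problem size and a measured time. The sketch below uses the conventional counts (a copy moves each element twice, and an n x n matrix product costs 2·n³ floating-point operations). It is an assumption that the benchmark counts the same way, and the sizes and timings in the example are hypothetical.

```python
def copy_gb_per_s(n, bytes_per_element, seconds):
    """Bandwidth of a vector copy: n elements read plus n elements written."""
    return 2 * n * bytes_per_element / seconds / 1e9


def gemm_gflops_per_s(n, seconds):
    """Rate of an n x n by n x n matrix product, counted as 2*n^3 operations."""
    return 2 * n ** 3 / seconds / 1e9


if __name__ == "__main__":
    # Hypothetical sizes and timings, chosen only to illustrate the arithmetic.
    print(copy_gb_per_s(n=1_000_000, bytes_per_element=4, seconds=0.004))
    print(gemm_gflops_per_s(n=512, seconds=1.09))
```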
    
    #./opencl-bench-opencl
    ----------------------------------------------
                   Device Info
    ----------------------------------------------
    Name:                Vivante OpenCL Device VIP8000-OI.8102.0000
    Vendor:              Vivante Corporation
    Type:                GPU 
    Available:           1
    [ 5243.331300] VIP8000 SetPower 0
    Max Compute Units:   1
    Max Work Group Size: 1024
    Global Mem Size:     268435456
    Local Mem Size:      32768
    Local Mem Type:      2
    Host Unified Memory: 1
    
    
    ----------------------------------------------
    ----------------------------------------------
    ## Benchmark :: OpenCL performance
    ----------------------------------------------
    
       -------------------------------
       # benchmarking single-precision
       -------------------------------
    Time for building scalar kernels: 4e-06
    Time for building vector kernels: 1.446
    Time for building matrix kernels: 2.98157
    Time for building compressed_matrix kernels: 1.88953
    Time for 100000 entry accesses on host: 0.004118
    Time per entry: 4.118e-08
    Result of operation on host: 104839
    Time for 100000 entry accesses via OpenCL: 35.0961
    Time per entry: 0.000350961
    Result of operation via OpenCL: 104839
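
The per-entry figures in this log are the total time divided by the number of accesses. Dividing the two per-entry times shows how much slower element-by-element access through OpenCL was on this board. The sketch below redoes that arithmetic with the numbers copied from the log. The likely cause, a separate device round trip for each access, is an assumption and was not measured here.

```python
def time_per_entry(total_seconds, accesses):
    """Average time of a single entry access."""
    return total_seconds / accesses


ACCESSES = 100000
HOST_TOTAL = 0.004118    # seconds, from the log above
OPENCL_TOTAL = 35.0961   # seconds, from the log above

host = time_per_entry(HOST_TOTAL, ACCESSES)
device = time_per_entry(OPENCL_TOTAL, ACCESSES)
slowdown = device / host

if __name__ == "__main__":
    print(f"host:   {host:.4g} s per entry")
    print(f"OpenCL: {device:.6g} s per entry")
    print(f"slowdown: about {slowdown:.0f}x")
```

In practice this suggests moving data in bulk between host and device rather than reading or writing single entries in a loop.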
    

    bandwidth-reduction

    #./bandwidth-reduction
    -- Generating matrix --
     * Unknowns: 262144
     * Initial bandwidth: 8192
     * Randomly reordered bandwidth: 262051
    -- Cuthill-McKee algorithm --
     * Reordered bandwidth: 6207
    -- Advanced Cuthill-McKee algorithm --
     * Reordered bandwidth: 6207
    -- Gibbs-Poole-Stockmeyer algorithm --
     * Reordered bandwidth: 6207
    !!!! TUTORIAL COMPLETED SUCCESSFULLY !!!!
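
The bandwidth of a sparse matrix is the largest distance |i - j| of any nonzero entry (i, j) from the diagonal. Reordering rows and columns can shrink it, as the log above shows. The following is a minimal sketch of a basic Cuthill-McKee sweep on a toy graph. It is not ViennaCL's implementation, and the 6-node chain is a hypothetical example.

```python
from collections import deque


def bandwidth(edges, position):
    """Largest |position[i] - position[j]| over all nonzero entries (i, j)."""
    return max((abs(position[i] - position[j]) for i, j in edges), default=0)


def cuthill_mckee(num_nodes, edges):
    """Return a node order from a basic (non-reversed) Cuthill-McKee sweep."""
    neighbors = {node: set() for node in range(num_nodes)}
    for i, j in edges:
        if i != j:
            neighbors[i].add(j)
            neighbors[j].add(i)

    def degree_key(node):
        return (len(neighbors[node]), node)

    order, visited = [], set()
    # Restart from the lowest-degree unvisited node to cover every component.
    for start in sorted(neighbors, key=degree_key):
        if start in visited:
            continue
        visited.add(start)
        queue = deque([start])
        while queue:
            node = queue.popleft()
            order.append(node)
            # Visit unvisited neighbours in order of increasing degree.
            for nxt in sorted(neighbors[node] - visited, key=degree_key):
                visited.add(nxt)
                queue.append(nxt)
    return order


if __name__ == "__main__":
    # A 6-node chain whose labels are scrambled (hypothetical toy matrix).
    edges = [(3, 0), (0, 5), (5, 1), (1, 4), (4, 2)]
    identity = {node: node for node in range(6)}
    order = cuthill_mckee(6, edges)
    reordered = {node: new_index for new_index, node in enumerate(order)}
    print("initial bandwidth:", bandwidth(edges, identity))
    print("reordered bandwidth:", bandwidth(edges, reordered))
```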
    

    fft

    Computing FFT Matrix
    m: [4,8]((0,0,1,1,2,2,3,3),(0,1,1,2,2,3,3,4),(1,1,2,2,3,3,4,4),(1,2,2,3,3,4,4,5))
    o: [4,8]((0,0,0,0,0,0,0,0),(0,0,0,0,0,0,0,0),(0,0,0,0,0,0,0,0),(0,0,0,0,0,0,0,0))
    Done
    m: [4,8]((0,0,1,1,2,2,3,3),(0,1,1,2,2,3,3,4),(1,1,2,2,3,3,4,4),(1,2,2,3,3,4,4,5))
    o: [4,8]((32,40,-16,0,-8,-8,-9.53674e-07,-16),(-8,0,0,0,0,0,0,0),(0,-8,0,0,0,0,0,0),(-4.76837e-07,-8,0,0,0,0,0,0))
    Transpose
    m: [4,8]((0,0,1,1,2,2,3,3),(0,1,1,2,2,3,3,4),(1,1,2,2,3,3,4,4),(1,2,2,3,3,4,4,5))
    o: [4,8]((0,0,0,1,1,1,1,2),(1,1,1,2,2,2,2,3),(2,2,2,3,3,3,3,4),(3,3,3,4,4,4,4,5))
    ---------------------
    Computing FFT bluestein
    input_vec: [16](0,0,1,0,2,0,3,0,4,0,5,0,6,0,7,0)
    Done
    input_vec: [16](0,0,1,0,2,0,3,0,4,0,5,0,6,0,7,0)
    output_vec: [16](28,2.38419e-07,-4,9.65685,-4,4,-4,1.65685,-4,-3.2981e-07,-4,-1.65685,-4,-4,-4,-9.65685)
    ---------------------
    Computing FFT 
    input_vec: [16](0,0,1,0,2,0,3,0,4,0,5,0,6,0,7,0)
    Done
    input_vec: [16](0,0,1,0,2,0,3,0,4,0,5,0,6,0,7,0)
    output_vec: [16](28,0,-4,9.65685,-4,4,-4,1.65685,-4,0,-4,-1.65685,-4,-4,-4,-9.65685)
    ---------------------
    Computing inverse FFT...
    input_vec: [16](0,0,1,0,2,0,3,0,4,0,5,0,6,0,7,0)
    output_vec: [16](0,0,1,4.56956e-08,2,-2.78181e-08,3,-1.64905e-07,4,0,5,-7.35137e-08,6,2.78181e-08,7,1.92723e-07)
    ---------------------
    Computing real to complex...
    input_vec: [16](0,0,1,0,2,0,3,0,4,0,5,0,6,0,7,0)
    output_vec: [16](0,0,0,0,1,0,0,0,2,0,0,0,3,0,0,0)
    ---------------------
    Computing complex to real...
    input_vec: [16](0,0,1,0,2,0,3,0,4,0,5,0,6,0,7,0)
    output_vec: [16](0,1,2,3,4,5,6,7,2,0,0,0,3,0,0,0)
    ---------------------
    Computing multiply complex
    input_vec: [16](0,0,1,0,2,0,3,0,4,0,5,0,6,0,7,0)
    input2_vec: [16](0,0,1,0,2,0,3,0,4,0,5,0,6,0,7,0)
    Done
    output_vec: [16](0,0,1,0,4,0,9,0,16,0,25,0,36,0,49,0)
    ---------------------
    !!!! TUTORIAL COMPLETED SUCCESSFULLY !!!!
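
The "Computing FFT" result above can be cross-checked on the host with a naive discrete Fourier transform. The sketch below assumes that input_vec stores complex numbers as interleaved real and imaginary parts, which is consistent with the printed output, and that the transform uses the exp(-2πi·k·n/N) sign convention. It is a slow reference check, not the algorithm ViennaCL runs.

```python
import cmath


def dft(values):
    """Naive O(n^2) forward DFT with the exp(-2*pi*i*k*m/n) sign convention."""
    n = len(values)
    return [
        sum(x * cmath.exp(-2j * cmath.pi * k * m / n) for m, x in enumerate(values))
        for k in range(n)
    ]


def from_interleaved(flat):
    """Turn [re0, im0, re1, im1, ...] into a list of complex numbers."""
    return [complex(re, im) for re, im in zip(flat[0::2], flat[1::2])]


def to_interleaved(values):
    """Inverse of from_interleaved."""
    return [part for z in values for part in (z.real, z.imag)]


if __name__ == "__main__":
    input_vec = [0, 0, 1, 0, 2, 0, 3, 0, 4, 0, 5, 0, 6, 0, 7, 0]
    spectrum = dft(from_interleaved(input_vec))
    print([round(v, 5) for v in to_interleaved(spectrum)])
```

Comparing the printed values with the output_vec line of the "Computing FFT" step, within single-precision rounding, is a quick way to check that the device returns correct results.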
    

    Many more examples are built as well; feel free to explore them on your own.

  • Original article: https://blog.csdn.net/qq_38505858/article/details/125480883