• 41. cuBLAS Developer Guide (Chinese Edition): the Level-2 Function gemvBatched()


    2.6.24. cublas<t>gemvBatched()

    cublasStatus_t cublasSgemvBatched(cublasHandle_t handle, cublasOperation_t trans,
                                      int m, int n,
                                      const float           *alpha,
                                      const float           *Aarray[], int lda,
                                      const float           *xarray[], int incx,
                                      const float           *beta,
                                      float           *yarray[], int incy,
                                      int batchCount)
    cublasStatus_t cublasDgemvBatched(cublasHandle_t handle, cublasOperation_t trans,
                                      int m, int n,
                                      const double          *alpha,
                                      const double          *Aarray[], int lda,
                                      const double          *xarray[], int incx,
                                      const double          *beta,
                                      double          *yarray[], int incy,
                                      int batchCount)
    cublasStatus_t cublasCgemvBatched(cublasHandle_t handle, cublasOperation_t trans,
                                      int m, int n,
                                      const cuComplex       *alpha,
                                      const cuComplex       *Aarray[], int lda,
                                      const cuComplex       *xarray[], int incx,
                                      const cuComplex       *beta,
                                      cuComplex       *yarray[], int incy,
                                      int batchCount)
    cublasStatus_t cublasZgemvBatched(cublasHandle_t handle, cublasOperation_t trans,
                                      int m, int n,
                                      const cuDoubleComplex *alpha,
                                      const cuDoubleComplex *Aarray[], int lda,
                                      const cuDoubleComplex *xarray[], int incx,
                                      const cuDoubleComplex *beta,
                                      cuDoubleComplex *yarray[], int incy,
                                      int batchCount)
    cublasStatus_t cublasHSHgemvBatched(cublasHandle_t handle, cublasOperation_t trans,
                                        int m, int n,
                                        const float           *alpha,
                                        const __half          *Aarray[], int lda,
                                        const __half          *xarray[], int incx,
                                        const float           *beta,
                                        __half                *yarray[], int incy,
                                        int batchCount)
    cublasStatus_t cublasHSSgemvBatched(cublasHandle_t handle, cublasOperation_t trans,
                                        int m, int n,
                                        const float           *alpha,
                                        const __half          *Aarray[], int lda,
                                        const __half          *xarray[], int incx,
                                        const float           *beta,
                                        float                 *yarray[], int incy,
                                        int batchCount)
    cublasStatus_t cublasTSTgemvBatched(cublasHandle_t handle, cublasOperation_t trans,
                                        int m, int n,
                                        const float           *alpha,
                                        const __nv_bfloat16   *Aarray[], int lda,
                                        const __nv_bfloat16   *xarray[], int incx,
                                        const float           *beta,
                                        __nv_bfloat16         *yarray[], int incy,
                                        int batchCount)
    cublasStatus_t cublasTSSgemvBatched(cublasHandle_t handle, cublasOperation_t trans,
                                        int m, int n,
                                        const float           *alpha,
                                        const __nv_bfloat16   *Aarray[], int lda,
                                        const __nv_bfloat16   *xarray[], int incx,
                                        const float           *beta,
                                        float                 *yarray[], int incy,
                                        int batchCount)
    
    

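    This function performs matrix-vector multiplication for a batch of matrices and vectors. The batch is considered "uniform": every instance uses the same dimensions (m, n), leading dimension (lda), increments (incx, incy), and transposition (trans) for its A matrix and its x and y vectors. The addresses of the input matrix and input vector, and of the output vector, for each instance of the batch are read from arrays of pointers that the caller passes to the function.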
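As a small illustration of the "arrays of pointers" convention, the sketch below builds such an array from one packed buffer in which the batch items are stored back to back. `make_pointer_array` is a hypothetical helper, not a cuBLAS function, and the sketch only shows the address arithmetic on the host; with cuBLAS itself, both the pointer arrays and the data they point to must reside in device memory.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not part of cuBLAS): given one packed buffer that holds
// `batchCount` equally sized items back to back, return the array of pointers
// to the first element of each item.
template <typename T>
std::vector<T*> make_pointer_array(T* base, std::size_t itemElems, int batchCount) {
    assert(batchCount >= 0);
    std::vector<T*> ptrs(static_cast<std::size_t>(batchCount));
    for (int i = 0; i < batchCount; ++i) {
        // Item i starts i * itemElems elements after the start of the buffer.
        ptrs[static_cast<std::size_t>(i)] = base + static_cast<std::size_t>(i) * itemElems;
    }
    return ptrs;
}
```

For matrices, `itemElems` would typically be `lda * n`; the same arithmetic applies to a device base pointer, provided the resulting pointer array is then copied to device memory before it is handed to the library.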

    $$y[i] = \alpha \, \mathrm{op}(A[i]) \, x[i] + \beta \, y[i], \quad \text{for } i \in [0, \mathrm{batchCount} - 1]$$

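    where $\alpha$ and $\beta$ are scalars, A is an array of pointers to the matrices A[i], each stored in column-major format with dimensions m x n, and x and y are arrays of pointers to vectors. In addition, for each matrix A[i],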

    $$\mathrm{op}(A[i]) = \begin{cases} A[i] & \text{if } \mathrm{trans} == \mathrm{CUBLAS\_OP\_N}, \\ A[i]^{T} & \text{if } \mathrm{trans} == \mathrm{CUBLAS\_OP\_T}, \\ A[i]^{H} & \text{if } \mathrm{trans} == \mathrm{CUBLAS\_OP\_C} \end{cases}$$

    Note: the y[i] vectors must not overlap, that is, the individual gemv operations must be computable independently; otherwise, the behavior is undefined.
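To make the formula concrete, here is a minimal CPU reference for the single-precision real case. It is a sketch under stated assumptions, not how cuBLAS computes the result: `gemv_batched_ref` is a hypothetical name, the increments are assumed positive, and because the data are real the transpose and conjugate-transpose cases coincide and are both represented by one `transpose` flag.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical CPU reference (not a cuBLAS call) for
//   y[i] = alpha * op(A[i]) * x[i] + beta * y[i],  i in [0, batchCount - 1].
// Assumptions: real float data, positive increments, column-major A[i].
void gemv_batched_ref(bool transpose, int m, int n, float alpha,
                      const float* const A[], int lda,
                      const float* const x[], int incx,
                      float beta, float* const y[], int incy,
                      int batchCount) {
    assert(m >= 0 && n >= 0 && batchCount >= 0);
    assert(lda >= (m > 1 ? m : 1));
    assert(incx > 0 && incy > 0);
    const int ylen = transpose ? n : m;  // length of each y[i]
    const int xlen = transpose ? m : n;  // length of each x[i]
    for (int b = 0; b < batchCount; ++b) {
        for (int r = 0; r < ylen; ++r) {
            float acc = 0.0f;
            for (int c = 0; c < xlen; ++c) {
                // Column-major: element (row, col) of A[b] is A[b][row + col * lda].
                const std::size_t idx = transpose
                    ? static_cast<std::size_t>(c) + static_cast<std::size_t>(r) * lda
                    : static_cast<std::size_t>(r) + static_cast<std::size_t>(c) * lda;
                acc += A[b][idx] * x[b][static_cast<std::size_t>(c) * incx];
            }
            float& out = y[b][static_cast<std::size_t>(r) * incy];
            // When beta == 0 the previous contents of y are not read.
            out = (beta == 0.0f) ? alpha * acc : alpha * acc + beta * out;
        }
    }
}
```

Each batch instance writes only to its own y[b], which is why overlapping output vectors would make the result depend on execution order.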

    For certain problem sizes, making multiple calls to cublas<t>gemv in different CUDA streams may be more advantageous than using this API.

    | Param. | Memory | In/out | Meaning |
    |---|---|---|---|
    | handle | | input | Handle to the cuBLAS library context. |
    | trans | | input | Operation op(A[i]) that is non- or (conj.) transpose. |
    | m | | input | Number of rows of matrix A[i]. |
    | n | | input | Number of columns of matrix A[i]. |
    | alpha | host or device | input | Scalar used for multiplication. |
    | Aarray | device | input | Array of pointers to arrays, each of dimension lda x n with lda >= max(1, m). All pointers must meet certain alignment criteria. Please see below for details. |
    | lda | | input | Leading dimension of the two-dimensional array used to store each matrix A[i]. |
    | xarray | device | input | Array of pointers to arrays, each of dimension n if trans == CUBLAS_OP_N and m otherwise. All pointers must meet certain alignment criteria. Please see below for details. |
    | incx | | input | Stride between consecutive elements of each vector x[i]. |
    | beta | host or device | input | Scalar used for multiplication. If beta == 0, y does not have to be a valid input. |
    | yarray | device | in/out | Array of pointers to arrays, each of dimension m if trans == CUBLAS_OP_N and n otherwise. Vectors y[i] should not overlap; otherwise, undefined behavior is expected. All pointers must meet certain alignment criteria. Please see below for details. |
    | incy | | input | Stride between consecutive elements of each vector y[i]. |
    | batchCount | | input | Number of pointers contained in Aarray, xarray and yarray. |
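The dimension rules in the table translate into minimum buffer sizes for each batch instance. The following sketch computes them; `vector_extent`, `GemvExtents` and `gemv_extents` are hypothetical names introduced here for illustration, and the strided-vector formula assumes positive increments.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helpers (not part of cuBLAS).

// Minimum number of elements needed to hold a vector of logical length `len`
// whose consecutive elements are `inc` apart (assumes inc > 0).
std::size_t vector_extent(int len, int inc) {
    assert(len >= 0 && inc > 0);
    if (len == 0) return 0;
    return 1 + static_cast<std::size_t>(len - 1) * static_cast<std::size_t>(inc);
}

// Element counts required for one A[i], x[i] and y[i].
struct GemvExtents {
    std::size_t a;
    std::size_t x;
    std::size_t y;
};

GemvExtents gemv_extents(bool transpose, int m, int n, int lda, int incx, int incy) {
    assert(m >= 0 && n >= 0);
    assert(lda >= (m > 1 ? m : 1));      // lda >= max(1, m)
    const int xlen = transpose ? m : n;  // n if trans == CUBLAS_OP_N, m otherwise
    const int ylen = transpose ? n : m;  // m if trans == CUBLAS_OP_N, n otherwise
    GemvExtents e;
    e.a = static_cast<std::size_t>(lda) * static_cast<std::size_t>(n);  // lda x n
    e.x = vector_extent(xlen, incx);
    e.y = vector_extent(ylen, incy);
    return e;
}
```

For example, with m = 2, n = 3, lda = 4, incx = 2 and incy = 1 in the non-transposed case, each A[i] needs 12 elements, each x[i] needs 5 and each y[i] needs 2.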

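    If the math mode enables fast math modes when cublasSgemvBatched() is used, the pointers placed in GPU memory (the pointers themselves, not the pointer arrays) must be properly aligned to avoid misaligned memory access errors. Ideally, all pointers are aligned to at least 16 bytes. Otherwise, it is recommended that they satisfy the following rule: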

    • if k%4==0 then ensure intptr_t(ptr) % 16 == 0.
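The 16-byte check itself is a one-line address test. The sketch below applies it to every pointer in an array; `is_aligned` and `all_aligned` are hypothetical helpers, not cuBLAS functions, and the check is meant to run on a host-side copy of the pointer values, since only the numeric addresses are inspected and nothing is dereferenced.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper (not part of cuBLAS): true if the address held in `ptr`
// is a multiple of `alignment` bytes. The pointer is never dereferenced.
inline bool is_aligned(const void* ptr, std::size_t alignment = 16) {
    assert(alignment != 0);
    return reinterpret_cast<std::uintptr_t>(ptr) % alignment == 0;
}

// True if every pointer in a host-side array of `count` pointers is aligned.
template <typename T>
bool all_aligned(const T* const ptrs[], int count, std::size_t alignment = 16) {
    for (int i = 0; i < count; ++i) {
        if (!is_aligned(ptrs[i], alignment)) return false;
    }
    return true;
}
```

A check like this can be run once when the pointer arrays are built, before they are copied to the device.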

    The error values this function may return, and their meanings, are listed in the following table:

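    | Error Value | Meaning |
    |---|---|
    | CUBLAS_STATUS_SUCCESS | The operation completed successfully. |
    | CUBLAS_STATUS_NOT_INITIALIZED | The library was not initialized. |
    | CUBLAS_STATUS_INVALID_VALUE | The parameters m, n, batchCount < 0. |
    | CUBLAS_STATUS_EXECUTION_FAILED | The function failed to launch on the GPU. |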
  • Original article: https://blog.csdn.net/kunhe0512/article/details/128126599