• 42. cuBLAS Developer Guide, Chinese Edition: the Level-3 function gemm() in cuBLAS


    2.7.1. cublas<t>gemm()


    cublasStatus_t cublasSgemm(cublasHandle_t handle,
                               cublasOperation_t transa, cublasOperation_t transb,
                               int m, int n, int k,
                               const float           *alpha,
                               const float           *A, int lda,
                               const float           *B, int ldb,
                               const float           *beta,
                               float           *C, int ldc)
    cublasStatus_t cublasDgemm(cublasHandle_t handle,
                               cublasOperation_t transa, cublasOperation_t transb,
                               int m, int n, int k,
                               const double          *alpha,
                               const double          *A, int lda,
                               const double          *B, int ldb,
                               const double          *beta,
                               double          *C, int ldc)
    cublasStatus_t cublasCgemm(cublasHandle_t handle,
                               cublasOperation_t transa, cublasOperation_t transb,
                               int m, int n, int k,
                               const cuComplex       *alpha,
                               const cuComplex       *A, int lda,
                               const cuComplex       *B, int ldb,
                               const cuComplex       *beta,
                               cuComplex       *C, int ldc)
    cublasStatus_t cublasZgemm(cublasHandle_t handle,
                               cublasOperation_t transa, cublasOperation_t transb,
                               int m, int n, int k,
                               const cuDoubleComplex *alpha,
                               const cuDoubleComplex *A, int lda,
                               const cuDoubleComplex *B, int ldb,
                               const cuDoubleComplex *beta,
                               cuDoubleComplex *C, int ldc)
    cublasStatus_t cublasHgemm(cublasHandle_t handle,
                               cublasOperation_t transa, cublasOperation_t transb,
                               int m, int n, int k,
                               const __half *alpha,
                               const __half *A, int lda,
                               const __half *B, int ldb,
                               const __half *beta,
                               __half *C, int ldc)
    
    

    This function performs the matrix-matrix multiplication

    $$ C = \alpha \, \mathrm{op}(A) \, \mathrm{op}(B) + \beta C $$

    where α and β are scalars, and A, B and C are matrices stored in column-major format with dimensions op(A) m × k, op(B) k × n and C m × n, respectively. Also, for matrix A:

    $$
    \mathrm{op}(A) =
    \begin{cases}
    A   & \text{if } transa == \text{CUBLAS\_OP\_N} \\
    A^T & \text{if } transa == \text{CUBLAS\_OP\_T} \\
    A^H & \text{if } transa == \text{CUBLAS\_OP\_C}
    \end{cases}
    $$

    and op(B) is defined in the same way for matrix B.
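To make the column-major layout and the role of op() concrete, here is a minimal CPU-side sketch of the same computation for the real single-precision case. It is not cuBLAS code: `ref_sgemm`, `op_elem` and the `ref_op_t` enum are hypothetical names introduced only for this illustration, and the scalars are passed by value rather than by pointer. For real data the conjugate transpose equals the plain transpose, so only two operation values are modelled.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for CUBLAS_OP_N / CUBLAS_OP_T; not cuBLAS identifiers. */
typedef enum { REF_OP_N = 0, REF_OP_T = 1 } ref_op_t;

/* Element (row, col) of op(X), where X is stored column-major
 * with leading dimension ld. */
static float op_elem(const float *X, int ld, ref_op_t op, int row, int col)
{
    return (op == REF_OP_N) ? X[(size_t)col * ld + row]   /* X(row, col) */
                            : X[(size_t)row * ld + col];  /* X(col, row) */
}

/* Reference C = alpha * op(A) * op(B) + beta * C on the host.
 * op(A) is m x k, op(B) is k x n, C is m x n, all column-major. */
void ref_sgemm(ref_op_t transa, ref_op_t transb, int m, int n, int k,
               float alpha, const float *A, int lda,
               const float *B, int ldb,
               float beta, float *C, int ldc)
{
    assert(m >= 0 && n >= 0 && k >= 0);
    assert(ldc >= (m > 1 ? m : 1));

    for (int j = 0; j < n; ++j) {          /* column of C */
        for (int i = 0; i < m; ++i) {      /* row of C */
            float acc = 0.0f;
            for (int p = 0; p < k; ++p)
                acc += op_elem(A, lda, transa, i, p) *
                       op_elem(B, ldb, transb, p, j);

            float *c = &C[(size_t)j * ldc + i];
            /* When beta is zero the old value of C is never read. */
            *c = (beta == 0.0f) ? alpha * acc : alpha * acc + beta * *c;
        }
    }
}
```

With A = [[1, 2, 3], [4, 5, 6]] stored column-major as {1, 4, 2, 5, 3, 6} (lda = 2), the same product can be obtained by storing the transpose as {1, 2, 3, 4, 5, 6} with lda = 3 and passing `REF_OP_T`. A naive triple loop like this is only useful for checking results on small inputs.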

    | Param. | Memory         | In/out | Meaning |
    |--------|----------------|--------|---------|
    | handle |                | input  | Handle to the cuBLAS library context. |
    | transa |                | input  | Operation op(A) that is non- or (conj.) transpose. |
    | transb |                | input  | Operation op(B) that is non- or (conj.) transpose. |
    | m      |                | input  | Number of rows of matrix op(A) and C. |
    | n      |                | input  | Number of columns of matrix op(B) and C. |
    | k      |                | input  | Number of columns of op(A) and rows of op(B). |
    | alpha  | host or device | input  | <type> scalar used for multiplication. |
    | A      | device         | input  | <type> array of dimensions lda x k with lda>=max(1,m) if transa == CUBLAS_OP_N and lda x m with lda>=max(1,k) otherwise. |
    | lda    |                | input  | Leading dimension of the two-dimensional array used to store the matrix A. |
    | B      | device         | input  | <type> array of dimensions ldb x n with ldb>=max(1,k) if transb == CUBLAS_OP_N and ldb x k with ldb>=max(1,n) otherwise. |
    | ldb    |                | input  | Leading dimension of the two-dimensional array used to store the matrix B. |
    | beta   | host or device | input  | <type> scalar used for multiplication. If beta==0, C does not have to be a valid input. |
    | C      | device         | in/out | <type> array of dimensions ldc x n with ldc>=max(1,m). |
    | ldc    |                | input  | Leading dimension of the two-dimensional array used to store the matrix C. |
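The array dimensions in the table determine how many elements each device buffer must hold. The helpers below are a small sketch of that arithmetic. The function names and the `dim_op_t` enum are hypothetical, introduced here for illustration, and are not part of cuBLAS.

```c
#include <stddef.h>

/* Hypothetical stand-ins for the cublasOperation_t values; not cuBLAS identifiers. */
typedef enum { DIM_OP_N = 0, DIM_OP_T = 1, DIM_OP_C = 2 } dim_op_t;

/* A is stored as lda x k when op(A) = A, and as lda x m otherwise. */
size_t gemm_a_elems(dim_op_t transa, int m, int k, int lda)
{
    return (size_t)lda * (size_t)(transa == DIM_OP_N ? k : m);
}

/* B is stored as ldb x n when op(B) = B, and as ldb x k otherwise. */
size_t gemm_b_elems(dim_op_t transb, int k, int n, int ldb)
{
    return (size_t)ldb * (size_t)(transb == DIM_OP_N ? n : k);
}

/* C is always stored as ldc x n. */
size_t gemm_c_elems(int n, int ldc)
{
    return (size_t)ldc * (size_t)n;
}
```

Multiplying the result by the element size (for example `sizeof(float)` for the S variant) gives the byte count to allocate. A leading dimension larger than the minimum simply pads each column, which is why the count depends on lda, ldb and ldc rather than on m and k directly.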

    The error values this function may return, and their meanings, are listed in the table below:

    | Error Value | Meaning |
    |-------------|---------|
    | CUBLAS_STATUS_SUCCESS | The operation completed successfully. |
    | CUBLAS_STATUS_NOT_INITIALIZED | The library was not initialized. |
    | CUBLAS_STATUS_INVALID_VALUE | One of the following holds: m, n, or k is negative; transa or transb is not one of CUBLAS_OP_N, CUBLAS_OP_C, CUBLAS_OP_T; lda < max(1, m) if transa == CUBLAS_OP_N, or lda < max(1, k) otherwise; ldb < max(1, k) if transb == CUBLAS_OP_N, or ldb < max(1, n) otherwise; ldc < max(1, m); alpha or beta is NULL; or C is NULL when C needs to be scaled. |
    | CUBLAS_STATUS_EXECUTION_FAILED | The function failed to launch on the GPU. |
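The CUBLAS_STATUS_INVALID_VALUE conditions can be checked on the host before making the call. The following is a rough sketch of such a pre-check. `gemm_args_valid` and the `CHK_OP_*` constants are hypothetical names used only here, and the check on C is left out because the table does not spell out when C "needs to be scaled".

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the cublasOperation_t values; not cuBLAS identifiers. */
enum { CHK_OP_N = 0, CHK_OP_T = 1, CHK_OP_C = 2 };

static int max_int(int a, int b) { return a > b ? a : b; }

static bool op_is_known(int op)
{
    return op == CHK_OP_N || op == CHK_OP_T || op == CHK_OP_C;
}

/* Returns true when none of the listed invalid-value conditions applies
 * (the C == NULL condition is deliberately not modelled). */
bool gemm_args_valid(int transa, int transb, int m, int n, int k,
                     const void *alpha, int lda, int ldb,
                     const void *beta, int ldc)
{
    if (m < 0 || n < 0 || k < 0) return false;
    if (!op_is_known(transa) || !op_is_known(transb)) return false;
    /* A needs at least m rows if not transposed, k rows otherwise. */
    if (lda < max_int(1, transa == CHK_OP_N ? m : k)) return false;
    /* B needs at least k rows if not transposed, n rows otherwise. */
    if (ldb < max_int(1, transb == CHK_OP_N ? k : n)) return false;
    if (ldc < max_int(1, m)) return false;
    if (alpha == NULL || beta == NULL) return false;
    return true;
}
```

A check like this mainly helps turn a returned status code into a more specific diagnostic during debugging. The library performs its own validation regardless.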

    For references, see:

    sgemm, dgemm, cgemm, zgemm

  • Original article: https://blog.csdn.net/kunhe0512/article/details/128145108