Step 1: Install Visual Studio, then install a recent version of CUDA, for example CUDA 11.
Step 2: Install PyCUDA with the following command: pip install pycuda
Step 3: Run the following example:
import pycuda
import pycuda.autoinit
import pycuda.driver as drv
import numpy
from pycuda.compiler import SourceModule
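### Make sure the C compiler (cl.exe) can be found
import os
if os.system("cl.exe") != 0:
    # cl.exe is not on PATH yet; append the MSVC bin directory (adjust to your own installation)
    os.environ['PATH'] += ';'+r"D:\Program Files\Microsoft Visual Studio\2022\Community\VC\Tools\MSVC\14.16.27023\bin\HostX64\x64"
if os.system("cl.exe") != 0:
    raise RuntimeError("cl.exe still not found, path probably incorrect")
##### The element-wise multiplication kernel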
mod = SourceModule("""
__global__ void multiply_them(float *dest, float *a, float *b)
{
    const int i = threadIdx.x;
    dest[i] = a[i] * b[i];
}
""")
multiply_them = mod.get_function("multiply_them")
a = numpy.random.randn(400).astype(numpy.float32)
b = numpy.random.randn(400).astype(numpy.float32)
dest = numpy.zeros_like(a)
multiply_them(drv.Out(dest), drv.In(a), drv.In(b), block=(400,1,1), grid=(1,1))
#### Compare the GPU result with the CPU result (the difference is expected to be all zeros)
print(dest - a * b)
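The launch above fits all 400 elements into a single block, which only works while the element count stays within the per-block thread limit (1024 on most current NVIDIA GPUs). For longer vectors the work has to be spread over many blocks. The following is a minimal sketch of how the block and grid arguments could be derived from the data length. The helper name launch_config and the default of 256 threads per block are assumptions made for illustration; they are not part of PyCUDA.

```python
def launch_config(data_length, threads_per_block=256):
    """Return (block, grid) tuples for a 1-D launch covering data_length elements.

    Hypothetical helper for illustration; 256 threads per block is an assumed default.
    """
    if data_length <= 0:
        raise ValueError("data_length must be positive")
    if not 1 <= threads_per_block <= 1024:
        raise ValueError("threads_per_block must be between 1 and 1024")
    # Ceiling division: enough blocks so that blocks * threads >= data_length
    blocks = (data_length + threads_per_block - 1) // threads_per_block
    return (threads_per_block, 1, 1), (blocks, 1)


block, grid = launch_config(10_000_000)
print(block, grid)  # (256, 1, 1) (39063, 1)
```

Because the last block usually contains more threads than remaining elements, a kernel launched this way has to compute a global index from blockIdx.x, blockDim.x and threadIdx.x and skip any index that is not smaller than the data length.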
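Of course, the multiply_them example above is only a simple demonstration. Real workloads need a more careful choice of grid, block, and thread counts. For example, multiplying two column vectors of length 10 million is better written in the following form: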
mod2 = SourceModule("""
__global__ void multiply_them(float *dest, float *a, float *b, int data_length)