Step 1: Install Visual Studio, then install a recent version of CUDA, for example CUDA 11.
Step 2: Install pycuda with the following command: pip install pycuda
Step 3: Run the following example:
import os

import numpy
import pycuda
import pycuda.autoinit  # initializes the CUDA driver and creates a context
import pycuda.driver as drv
from pycuda.compiler import SourceModule

### Make sure the C compiler (cl.exe) can be found
# os.system returns a non-zero exit code when cl.exe is not on PATH.
if os.system("cl.exe"):
    # Adjust this path to match your own Visual Studio installation.
    os.environ['PATH'] += ';' + r"D:\Program Files\Microsoft Visual Studio\2022\Community\VC\Tools\MSVC\14.16.27023\bin\HostX64\x64"
if os.system("cl.exe"):
    raise RuntimeError("cl.exe still not found, path probably incorrect")

##### The multiplication kernel
mod = SourceModule("""
__global__ void multiply_them(float *dest, float *a, float *b)
{
    const int i = threadIdx.x;
    dest[i] = a[i] * b[i];
}
""")

multiply_them = mod.get_function("multiply_them")

a = numpy.random.randn(400).astype(numpy.float32)
b = numpy.random.randn(400).astype(numpy.float32)
dest = numpy.zeros_like(a)

# One block of 400 threads: each thread multiplies one pair of elements.
multiply_them(drv.Out(dest), drv.In(a), drv.In(b), block=(400,1,1), grid=(1,1))

#### Compare the result against the CPU computation (should print all zeros)
print(dest - a * b)
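The example launches a single block of 400 threads, which only works because 400 is below the per-block limit (1024 threads on current NVIDIA GPUs). As a rough sketch of how the launch configuration could be derived for longer arrays, the helper below computes the `block` and `grid` tuples with a ceiling division. The name `launch_config` and the default of 256 threads per block are assumptions made for illustration; they are not part of pycuda.

```python
def launch_config(data_length, threads_per_block=256):
    """Return (block, grid) tuples covering data_length elements.

    Hypothetical helper, not a pycuda API. The default of 256 threads
    per block is an assumed starting point, not a tuned value.
    """
    if data_length <= 0:
        raise ValueError("data_length must be positive")
    if not 1 <= threads_per_block <= 1024:
        raise ValueError("threads_per_block must be between 1 and 1024")
    # Ceiling division: enough blocks so that every element gets a thread.
    blocks = (data_length + threads_per_block - 1) // threads_per_block
    return (threads_per_block, 1, 1), (blocks, 1)


block, grid = launch_config(10_000_000)
print(block, grid)
```

Because the last block is usually only partly used, a kernel launched this way has to compute a global index (`blockIdx.x * blockDim.x + threadIdx.x`) and skip any index that is not less than the data length.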
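Of course, the 400-element example above is only a simple demonstration. Real workloads need a more carefully designed number of grids, blocks, and threads. For example, multiplying two column vectors of length 10 million element by element is better written in the following form: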
mod2 = SourceModule("""
__global__ void multiply_them(float *dest, float *a, float *b, int data_length)