matrix-multiplication - Matrix multiplication using cuBLAS on Alea GPU

Question

I am trying to do matrix multiplication with Gemm on Alea GPU, but this code gives the wrong result.

Gpu gpu = Gpu.Default;
Blas blas = new Blas(gpu);

int m=2,n=3;    //in dimension and out dimension (output will be mxn matrix)
int k=4;

//column major
float[,] A = new float[4,2] { {100,200},{2,6},{3,7},{4,8} };    //2x4 matrix
float[,] B = new float[3,4] { {1,4,7,10}, {2,5,8,11}, {3,6,9,12} }; //4x3 matrix
float[,] C = new float[3,2] { {-1,-1}, {-1,-1}, {-1,-1}  }; //2x3 matrix

var dA = gpu.AllocateDevice<float>(A);  
var dB = gpu.AllocateDevice<float>(B);  
var dC = gpu.AllocateDevice<float>(C);

blas.Gemm(Operation.N,Operation.N,m,n,k,1f,dA.Ptr,m,dB.Ptr,k,0f,dC.Ptr,m);

var result = Gpu.Copy2DToHost(dC);

This is the result I get. It just copies a few numbers from matrix A, and some entries of matrix C are never changed from their initial values.

100 -1 -1
200 -1 -1

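What is wrong with the code? Please help.

I am using Alea 3.0.3 and CUDA Toolkit 8.0.

UPDATE 1: I found that when I flatten the A, B, and C matrices into 1D arrays, I get the correct result. However, I would still like to know what is wrong with the 2D arrays.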
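For reference, the flattened column-major layout from the update can be checked on the CPU. The following is a minimal sketch, not Alea code: `FlatGemmSketch` and its `Gemm` method are hypothetical names, and the loop only mirrors what a column-major GEMM with alpha = 1 and beta = 0 computes, so the expected product is known before comparing against the GPU.

```csharp
using System;

static class FlatGemmSketch
{
    // CPU reference for C = A * B with column-major storage in 1D arrays.
    // A is m x k (leading dimension lda), B is k x n (ldb), C is m x n (ldc).
    public static void Gemm(int m, int n, int k,
                            float[] a, int lda, float[] b, int ldb, float[] c, int ldc)
    {
        for (int col = 0; col < n; col++)
            for (int row = 0; row < m; row++)
            {
                float sum = 0f;
                for (int i = 0; i < k; i++)
                    sum += a[row + i * lda] * b[i + col * ldb];
                c[row + col * ldc] = sum;
            }
    }

    public static void Main()
    {
        int m = 2, n = 3, k = 4;
        // Same values as in the question, flattened column by column.
        float[] a = { 100, 200, 2, 6, 3, 7, 4, 8 };               // 2x4
        float[] b = { 1, 4, 7, 10, 2, 5, 8, 11, 3, 6, 9, 12 };    // 4x3
        float[] c = new float[m * n];                             // 2x3

        // With tightly packed 1D arrays the leading dimensions are m, k, m.
        Gemm(m, n, k, a, m, b, k, c, m);

        for (int row = 0; row < m; row++)
            Console.WriteLine($"{c[row]} {c[row + m]} {c[row + 2 * m]}");
        // prints:
        // 169 278 387
        // 353 574 795
    }
}
```

In a tightly packed 1D array the columns follow each other with no gap, which is why the leading dimensions m, k, m are correct in that case.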

score 1 · Accepted Answer

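I found that gpu.AllocateDevice for a 2D array does not lay the data out on the GPU the way it is laid out on the CPU. The distance between the first elements of any two consecutive columns (the pitch) is very large.

Therefore, the leading-dimension arguments have to be changed.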

blas.Gemm(Operation.N,Operation.N,m,n,k,1f,dA.Ptr,dA.PitchInElements.ToInt32(),dB.Ptr,dB.PitchInElements.ToInt32(),0f,dC.Ptr,dC.PitchInElements.ToInt32());

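Now I get the correct result. However, is there any documentation that explains in detail how 2D array allocation on the GPU really works in Alea?

All I can find is http://www.aleagpu.com/release/3_0_3/api/html/6f0dc687-7191-91ba-6c30-bb379dded567.htm, which has no explanation.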
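The effect of the pitch can be reproduced on the CPU without a GPU. The sketch below is an illustration under stated assumptions, not a description of Alea's allocator: `PitchedGemmSketch`, `ToPitched`, and the pitch of 128 elements are hypothetical, and the padding between rows is assumed to be zero. It stores each row of the C# 2D array at a multiple of the pitch, then runs a column-major GEMM once with the leading dimensions from the question (m, k, m) and once with the pitch.

```csharp
using System;

static class PitchedGemmSketch
{
    // Hypothetical pitch in elements; the value Alea really uses may differ.
    const int Pitch = 128;

    // Copy a C# 2D array into a 1D buffer in which row r starts at r * Pitch.
    // The padding between rows is left at 0, which is an assumption.
    static float[] ToPitched(float[,] src)
    {
        int rows = src.GetLength(0), width = src.GetLength(1);
        var buf = new float[rows * Pitch];
        for (int r = 0; r < rows; r++)
            for (int x = 0; x < width; x++)
                buf[r * Pitch + x] = src[r, x];
        return buf;
    }

    // Column-major C = A * B with leading dimensions (alpha = 1, beta = 0).
    static void Gemm(int m, int n, int k,
                     float[] a, int lda, float[] b, int ldb, float[] c, int ldc)
    {
        for (int col = 0; col < n; col++)
            for (int row = 0; row < m; row++)
            {
                float sum = 0f;
                for (int i = 0; i < k; i++)
                    sum += a[row + i * lda] * b[i + col * ldb];
                c[row + col * ldc] = sum;
            }
    }

    // Runs the multiplication from the question and copies C back as a [3,2] array.
    public static float[,] Run(bool usePitch)
    {
        int m = 2, n = 3, k = 4;
        var A = new float[4, 2] { { 100, 200 }, { 2, 6 }, { 3, 7 }, { 4, 8 } };
        var B = new float[3, 4] { { 1, 4, 7, 10 }, { 2, 5, 8, 11 }, { 3, 6, 9, 12 } };
        var C = new float[3, 2] { { -1, -1 }, { -1, -1 }, { -1, -1 } };
        float[] dA = ToPitched(A), dB = ToPitched(B), dC = ToPitched(C);

        Gemm(m, n, k, dA, usePitch ? Pitch : m, dB, usePitch ? Pitch : k,
             dC, usePitch ? Pitch : m);

        // Copy back only the visible part of each pitched row.
        var result = new float[n, m];
        for (int col = 0; col < n; col++)
            for (int row = 0; row < m; row++)
                result[col, row] = dC[col * Pitch + row];
        return result;
    }

    static void Print(string label, float[,] c)
    {
        Console.WriteLine(label);
        for (int row = 0; row < c.GetLength(1); row++)
            Console.WriteLine($"{c[0, row]} {c[1, row]} {c[2, row]}");
    }

    public static void Main()
    {
        Print("leading dimensions m, k, m:", Run(false));
        // 100 -1 -1
        // 200 -1 -1
        Print("leading dimensions = pitch:", Run(true));
        // 169 278 387
        // 353 574 795
    }
}
```

Under these assumptions, the first run reproduces the output from the question: only the first column of A is read correctly, the remaining reads hit padding, and the second and third columns of C are written into padding that the 2D copy back to the host never returns.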
