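I'm using the AleaGPU library to perform matrix multiplication and similar operations, and I can't figure out why my code isn't working as expected.
By "not working as expected" I mean that the first row (or the first few rows) of the resulting matrix has the correct values, while all the remaining rows are filled with zeros, even though I'm using the same code as in the other samples below.
Function #1 (not working): for some reason this one doesn't work and shows the behavior described above. It sounds like I've mixed up an index somewhere, but I can't see any difference from the code in the other examples below, and I'm not getting any errors (AleaGPU usually throws an exception when the code tries to access an invalid array position).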
public static double[,] Multiply([NotNull] this double[,] m1, [NotNull] double[,] m2)
{
    // Checks
    if (m1.GetLength(1) != m2.GetLength(0)) throw new ArgumentOutOfRangeException("Invalid matrices sizes");
    // Initialize the parameters and the result matrix
    int h = m1.GetLength(0);
    int w = m2.GetLength(1);
    int l = m1.GetLength(1);
    // Execute the multiplication in parallel
    using (DeviceMemory2D<double> m1_device = Gpu.Default.AllocateDevice(m1))
    using (DeviceMemory2D<double> m2_device = Gpu.Default.AllocateDevice(m2))
    using (DeviceMemory2D<double> mresult_device = Gpu.Default.AllocateDevice<double>(h, w))
    {
        // Pointers setup
        deviceptr<double>
            pm1 = m1_device.Ptr,
            pm2 = m2_device.Ptr,
            pmresult = mresult_device.Ptr;
        // Local wrapper function
        void Kernel(int ki)
        {
            // Calculate the current indexes
            int
                i = ki / w,
                j = ki % w;
            // Perform the multiplication
            double sum = 0;
            int im1 = i * l;
            for (int k = 0; k < l; k++)
            {
                // m1[i, k] * m2[k, j]
                sum += pm1[im1 + k] * pm2[k * w + j];
            }
            pmresult[i * w + j] = sum; // result[i, j]
        }
        // Run the kernel once for each element of the result matrix
        Gpu.Default.For(0, h * w, Kernel);
        // Return the result
        return Gpu.Copy2DToHost(mresult_device);
    }
}
I've stared at this code for hours, checking every single line, but I really can't see what's wrong with it.
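The index arithmetic in the kernel can be checked in isolation. Below is a minimal sketch, assuming the buffer is a tightly packed row-major array in which row `i` starts at offset `i * w`; that layout is what the kernel above takes for granted, and whether it holds for the memory behind `DeviceMemory2D<double>` is not confirmed here. `FlatIndexCheck` and its methods are hypothetical names introduced only for this illustration.

```csharp
using System;

public static class FlatIndexCheck
{
    // Hypothetical helper: counts how often each flat offset i * w + j is
    // written when ki runs over [0, h * w), mirroring the kernel's arithmetic.
    public static int[] CountWrites(int h, int w)
    {
        int[] hits = new int[h * w];
        for (int ki = 0; ki < h * w; ki++)
        {
            int i = ki / w, j = ki % w; // same split as in the kernel
            hits[i * w + j]++;          // same offset as pmresult[i * w + j]
        }
        return hits;
    }

    // True when every cell of an h x w packed buffer is written exactly once.
    public static bool CoversEveryCellOnce(int h, int w)
    {
        foreach (int count in CountWrites(h, w))
        {
            if (count != 1) return false;
        }
        return true;
    }
}
```

If this check passes for the sizes in question, the `ki / w` and `ki % w` split is consistent with a packed layout, so any remaining discrepancy would have to come from somewhere other than that arithmetic.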
This version works fine, but I can't see how it differs from the first one:
public static double[,] MultiplyGpuManaged([NotNull] this double[,] m1, [NotNull] double[,] m2)
{
    // Checks
    if (m1.GetLength(1) != m2.GetLength(0)) throw new ArgumentOutOfRangeException("Invalid matrices sizes");
    // Initialize the parameters and the result matrix
    int h = m1.GetLength(0);
    int w = m2.GetLength(1);
    int l = m1.GetLength(1);
    double[,]
        m1_gpu = Gpu.Default.Allocate(m1),
        m2_gpu = Gpu.Default.Allocate(m2),
        mresult_gpu = Gpu.Default.Allocate<double>(h, w);
    // Execute the multiplication in parallel
    Gpu.Default.For(0, h * w, index =>
    {
        // Calculate the current indexes
        int
            i = index / w,
            j = index % w;
        // Perform the multiplication
        double sum = 0;
        for (int k = 0; k < l; k++)
        {
            sum += m1_gpu[i, k] * m2_gpu[k, j];
        }
        mresult_gpu[i, j] = sum;
    });
    // Free memory and copy the result back
    Gpu.Free(m1_gpu);
    Gpu.Free(m2_gpu);
    double[,] result = Gpu.CopyToHost(mresult_gpu);
    Gpu.Free(mresult_gpu);
    return result;
}
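This one works fine as well. I wrote it as an extra test to check whether I had messed up the indexes in the first function (apparently they are fine):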
public static double[,] MultiplyOnCPU([NotNull] this double[,] m1, [NotNull] double[,] m2)
{
    // Checks
    if (m1.GetLength(1) != m2.GetLength(0)) throw new ArgumentOutOfRangeException("Invalid matrices sizes");
    // Initialize the parameters and the result matrix
    int h = m1.GetLength(0);
    int w = m2.GetLength(1);
    int l = m1.GetLength(1);
    double[,] result = new double[h, w];
    Parallel.For(0, h * w, index =>
    {
        unsafe
        {
            fixed (double* presult = result, pm1 = m1, pm2 = m2)
            {
                // Calculate the current indexes
                int
                    i = index / w,
                    j = index % w;
                // Perform the multiplication
                double sum = 0;
                int im1 = i * l;
                for (int k = 0; k < l; k++)
                {
                    sum += pm1[im1 + k] * pm2[k * w + j];
                }
                presult[i * w + j] = sum;
            }
        }
    });
    return result;
}
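To pin down exactly where the output of the first function starts to diverge, the results can be compared row by row against the CPU version. This is only a minimal sketch: `MatrixDiff` and `FirstMismatchingRow` are hypothetical names that are not part of AleaGPU or of the code above, and the tolerance is an arbitrary assumption.

```csharp
using System;

public static class MatrixDiff
{
    // Hypothetical helper: returns the index of the first row in which the
    // two matrices differ by more than the tolerance, or -1 if they match.
    public static int FirstMismatchingRow(double[,] expected, double[,] actual, double tolerance = 1e-9)
    {
        if (expected.GetLength(0) != actual.GetLength(0) ||
            expected.GetLength(1) != actual.GetLength(1))
        {
            throw new ArgumentException("Matrices must have the same shape");
        }
        for (int i = 0; i < expected.GetLength(0); i++)
        {
            for (int j = 0; j < expected.GetLength(1); j++)
            {
                if (Math.Abs(expected[i, j] - actual[i, j]) > tolerance) return i;
            }
        }
        return -1;
    }
}
```

Calling it as `MatrixDiff.FirstMismatchingRow(m1.MultiplyOnCPU(m2), m1.Multiply(m2))` for a few different matrix sizes would show whether the first wrong row depends on the dimensions.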
I really don't understand what I'm missing in the first method or why it isn't working.
Thanks in advance for your help!