c - 压缩列存储中的稀疏矩阵与列向量的乘法

Question

我必须将压缩列存储中的稀疏矩阵与列向量相乘（我必须在开放 cl 中将其并行化）。我在互联网上搜索了很多天。花了这么多天但找不到任何东西。（我被允许搜索互联网因为我必须将其转换为并行）。但我只能找到压缩行存储的代码。

spmv_csr_serial(const int num_rows ,
                const int * ptr ,
                const int * indices ,
                const float * data ,
                const float * x,
                float * y)
{
    for(int row = 0; i < num_rows; i++){
        float dot = 0;
        int row_start = ptr[row];
        int row_end = ptr[row+1];

        for (int jj = row_start; jj < row_end; jj++)
            dot += data[jj] * x[indices[jj]];

        y[row] += dot;
    }
}

压缩列存储没有行 ptr。那么如何将它与向量相乘呢？我只需要串行代码，我会自己将其转换为并行代码。

这是我的这个项目的 OpenCL 内核

enter code here
__kernel void mykernel(__global const int* val,__global const int* index,__global const int * ptr,__global const int* x,__global int* y) 
{ 
    int id=get_global_id(0); 
    int colstart=ptr[id]; 
    int colend=ptr[id+1]; 
    for(int j=colstart;j<colend;j++) 
    { 
        y[index[j]]=val[j]*x[index[j]]; 
    } 
}

此代码在 open cl 内核中返回垃圾值。这是我的序列号。

   spmv_csr_serial(const int num_rows ,
                const int * ptr ,
                const int * indices ,
                const float * data ,
                const float * x,
                float * y)
{
    for(int row = 0; i < num_rows; i++){
        float dot = 0;
        int colstart = ptr[row];
        int colend = ptr[row+1];

      for(int j=colstart;j<colend;j++) 
    { 
        y[index[j]]=val[j]*x[index[j]]; 
    }

    }
}

密集矩阵向量乘法算法

For(int i=0;i<A.RowLength;i++) 
{
    For(int j=0;j<vector.length;j++) 
    { 
        Result[i]=Result[i]+A[i][j]*vector[j];
    }
}

score 5 · Accepted Answer

一般来说，进行矩阵向量计算的算法如下

y = 0
for i = 0 : Nr - 1
    for j = 0 : Nc - 1
        y[i] += M[i,j] * x[j]

压缩行存储：

我们不是对所有列进行普通循环，而是仅对非零条目进行循环：

y = 0
for i = 0 : Nr - 1
    for j = 0 : numElementsInRow(i) - 1
        y[i] += M[i, columnIndex(i,j)] * x[columnIndex(i,j)]

wherenumElementsInRow(i)返回第 - 行中非零的数量，i并columnIndex(i,j)给出第 - 行j中的第 - 列索引i。

在您上面的实现中，您的columnIndex(i,j)映射由两个数组ptr和完成indices，即 columnIndex(i,j) == indices[ptr[i] + j]元素的数量由给出 numElementsInRow(i) == ptr[i+1] - ptr[i]。不需要索引矩阵，因为您只存储它的压缩版本。

压缩列存储：

现在更改两个循环的顺序并遍历行中的非零：

y = 0
for j = 0 : Nc - 1
    for i = 0 : numElementsInColumn(j) - 1
        y[rowIndex(j,i)] += M[rowIndex(j,i), j] * x[j]

其余的类似于 CRS 格式。

score 1 · Accepted Answer

写出一个密集矩阵向量乘法的算法。如果您不知道如何做到这一点，请查看任何基本的线性代数教科书，或询问您的教授。
您的算法将遍历矩阵的行和列；如有必要，重新安排算法，使内部循环位于列之上。
修改算法以使用密集存储方案来访问矩阵。

score 1 · Accepted Answer

CCS 与 CRS 类似，只是转置了。您没有行指针，但有类似的列指针。因此，您的顺序循环应该是

for(int col=0; col<num_cols; col++){
  for(int j=ptr[col];j<ptr[col+1]; j++) { 
    y[indices[j]] += val[j]*x[col]; 
  }
}

记得在之前将 y 向量归零。

你觉得哪一个更快？CCS还是CRS？

c - 压缩列存储中的稀疏矩阵与列向量的乘法

3 回答 3

压缩行存储：

压缩列存储：

Related

Reference