algorithm - 最小化稀疏矩阵空间复杂度的算法？

Question

在计算机上存储和操作稀疏矩阵（零和一）时，使用利用矩阵的稀疏结构的专门算法和数据结构是有益的，并且通常是必要的。当应用于大型稀疏矩阵时，使用标准矩阵结构和算法的运算速度很慢并且会消耗大量内存。稀疏数据本质上很容易压缩，并且这种压缩几乎总是会显着减少内存使用量。

您将获得一个二维矩阵，其中行数是预先知道的（您可以选择 30-256 之间的任何数字）。列数非常非常大。你可以想到 10 ⁶列。每列恰好有 1 个值为 1。

编写一个算法来最小化这个矩阵的空间复杂度。你可以展示你的算法是如何工作的，甚至可以编写一个程序。

score 2 · Accepted Answer

提示（因为这是家庭作业）：每一列都包含一个带有“1”的字段，因此对于每一列来说，存储这种情况的行就足够了。现在考虑存储此信息的好方法，考虑到行数 <= 256。

score 0 · Accepted Answer

我写了一个简单的算法来最小化表示稀疏矩阵所需的空间，我想你可能会发现它很有用:)。

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;

namespace sparse_matrix
{
    class Program
    {
        static void Main(string[] args)
        {
            /*
             * In this algorithm I store all the Sparse Matrix into single diminsion matrix of type byte, each row
             * in this matrix has a number that represent the position of the "1" in the original Sparse matrix.
             * By this way, I save a lot of space... and store the same matrix with the minimum space(I could reach)
             * Since I treat the digits as bits, and stored them into the byte, since the biggest number I will reach
             * will not exceed 255 bit.
             * I read about the byte data type from here:(http://msdn.microsoft.com/en-us/library/5bdb6693.aspx)
             * In this program, I run it on a (3 * 3) matrix, however it can work the same for (255 * 10^6)
             */
            int matrixRows = 3;
            int matrixColumns = 3;
            int[,] matrix;
            matrix = new int[matrixRows, matrixColumns];
            matrix[0, 0] = 0;
            matrix[1, 0] = 0;
            matrix[2, 0] = 1;
            matrix[0, 1] = 1;
            matrix[1, 1] = 0;
            matrix[2, 1] = 0;
            matrix[0, 2] = 0;
            matrix[1, 2] = 1;
            matrix[2, 2] = 0;

            byte []x = new byte [matrixColumns];
            Console.WriteLine("The Original Sparse Matrix:\n");

            for (int i = 0; i != matrixColumns; i++)
            {
                for (int j = 0; j != matrixRows; j++)
                {
                    if (matrix[j, i] == 1)
                    {                        
                        x[i] = (byte)j;
                    }
                    Console.Write(matrix[i, j] + " ");
                }
                Console.WriteLine();
                Console.WriteLine();
            }
            Console.WriteLine();
            Console.WriteLine();
            Console.WriteLine("The new representation for the Sparse matrix:\n");
            for (int k = 0; k != matrixColumns; k++)
            {
                Console.WriteLine(x[k] + "\n");
            }
        }
    }
}

我很乐意收到您的任何反馈。

score 0 · Accepted Answer

稀疏矩阵在 CS 中以两种不同的方式存储。相邻矩阵或相邻列表：我认为您想使用矩阵，所以我们有几种选择：CSR、锯齿状对角线、CSC 等：推荐维基百科他们来学习算法。它本质上将矩阵分解为 3 个不同的一维数组。

score 0 · Accepted Answer

您正在处理一个严重受限的问题，远远超过您在（例如）有限元代码中看到的“一般”稀疏矩阵。特别是，您可以利用大多数人无法利用的这些项目：

行数是恒定的，您可以选择它。
出现在矩阵任何单元格中的数字不是 1 就是 0。
每列包含 1 个且只有 1 个非零值。

让我在这里暂停一下，说这个矩阵在一个非常常见的情况下以接近这种形式出现：置换矩阵，但在这些情况下，它总是一个方阵（行数与列数相同）。

这是一个非常基本的压缩方案，很容易实现，并且很容易记下存储大小为N x M（N行，M列）的矩阵所需的确切字节数。令N = 256，这意味着我们可以将一行表示为 0-255 之间的数字，这正是一个无符号的 8 位数字。将矩阵存储为 1x M无符号 8 位（1 字节）整数数组，其中索引i处的值是包含i列中值 1 的行号。没有理由存储值本身（因为它始终为 1），也没有必要显式存储列号，因为我们将其表示为数组索引。

假设一个“密集”NxM 矩阵存储 4 字节整数。这样的矩阵将需要 N*M*4 字节的内存。使用我描述的压缩，它只需要 M 字节的内存，正如所描述的是原始内存的 1/1024！

algorithm - 最小化稀疏矩阵空间复杂度的算法？

4 回答 4

Related

Reference