c# - C# 压缩字节数组

Question

我对压缩算法了解不多。我正在寻找一种简单的压缩算法（或代码片段），它可以减小 byte[,,] 或 byte[] 的大小。我无法使用 System.IO.Compression。此外，数据有很多重复。

我尝试实现 RLE 算法（在下面发布以供您检查）。但是，它产生的阵列要大 1.2 到 1.8 倍。

public static class RLE
{
    public static byte[] Encode(byte[] source)
    {
        List<byte> dest = new List<byte>();
        byte runLength;

        for (int i = 0; i < source.Length; i++)
        {
            runLength = 1;
            while (runLength < byte.MaxValue 
                && i + 1 < source.Length 
                && source[i] == source[i + 1])
            {
                runLength++;
                i++;
            }
            dest.Add(runLength);
            dest.Add(source[i]);
        }

        return dest.ToArray();
    }

    public static byte[] Decode(byte[] source)
    {
        List<byte> dest = new List<byte>();
        byte runLength; 

        for (int i = 1; i < source.Length; i+=2)
        {
            runLength = source[i - 1];

            while (runLength > 0)
            {
                dest.Add(source[i]);
                runLength--;
            }
        }
        return dest.ToArray();
    }

}

我还找到了一个基于 Java、字符串和整数的 LZW 实现。我已将其转换为 C#，结果看起来不错（代码贴在下面）。但是，我不确定它是如何工作的，也不知道如何使它使用字节而不是字符串和整数。

public class LZW
{
    /* Compress a string to a list of output symbols. */
    public static int[] compress(string uncompressed)
    {
        // Build the dictionary.
        int dictSize = 256;
        Dictionary<string, int> dictionary = new Dictionary<string, int>();
        for (int i = 0; i < dictSize; i++)
            dictionary.Add("" + (char)i, i);

        string w = "";
        List<int> result = new List<int>();

        for (int i = 0; i < uncompressed.Length; i++)
        {
            char c = uncompressed[i];
            string wc = w + c;
            if (dictionary.ContainsKey(wc))
                w = wc;
            else
            {
                result.Add(dictionary[w]);
                // Add wc to the dictionary.
                dictionary.Add(wc, dictSize++);
                w = "" + c;
            }
        }

        // Output the code for w.
        if (w != "")
            result.Add(dictionary[w]);
        return result.ToArray();
    }

    /* Decompress a list of output ks to a string. */
    public static string decompress(int[] compressed)
    {
        int dictSize = 256;
        Dictionary<int, string> dictionary = new Dictionary<int, string>();
        for (int i = 0; i < dictSize; i++)
            dictionary.Add(i, "" + (char)i);

        string w = "" + (char)compressed[0];
        string result = w;
        for (int i = 1; i < compressed.Length; i++)
        {
            int k = compressed[i];
            string entry = "";
            if (dictionary.ContainsKey(k))
                entry = dictionary[k];
            else if (k == dictSize)
                entry = w + w[0];

            result += entry;

            // Add w+entry[0] to the dictionary.
            dictionary.Add(dictSize++, w + entry[0]);

            w = entry;
        }

        return result;
    }
}

score 1 · Accepted Answer

看看这里。我在我的一个工作项目中使用此代码作为压缩的基础。不确定 Xbox 360 SDK 中有多少 .NET Framework 是可访问的，因此不确定这对您的效果如何。

score 1 · Accepted Answer

该 RLE 算法的问题在于它太简单了。它为每个字节添加重复次数的前缀，但这确实意味着在长范围的非重复字节中，每个单个字节都以“1”为前缀。在没有任何重复的数据上，这将使文件大小加倍。

这可以通过使用代码类型 RLE 来避免；“代码”（也称为“令牌”）将是一个可以具有两种含义的字节；它要么指示单个后续字节重复了多少次，要么指示应按原样复制后续的非重复字节数。这两个代码之间的区别是通过启用最高位来实现的，这意味着该值仍有 7 位可用，这意味着每个此类代码要复制或重复的数量最多为 127。

这意味着即使在最坏的情况下，最终大小也只能比原始文件大小大 1/127 左右。

可以在此处找到对整个概念以及完整工作（实际上是经过高度优化）的 C# 代码的一个很好的解释：

http://www.shikadi.net/moddingwiki/RLE_Compression

请注意，有时数据最终会比原始数据大，这仅仅是因为其中没有足够的重复字节使 RLE 工作。处理此类压缩失败的一个好方法是在最终数据中添加标头。如果您只是在开头添加一个额外的字节，0 表示未压缩数据，1 表示 RLE 压缩数据，那么，当 RLE 无法给出较小的结果时，您只需将其保存为未压缩，前面为 0，然后是您的最终数据将比原始文件大一个字节。然后另一端的系统可以读取该起始字节并使用它来确定是否应解压缩以下数据或仅复制以下数据。

score 0 · Accepted Answer

Look into Huffman codes, it's a pretty simple algorithm. Basically, use fewer bits for patterns that show up more often, and keep a table of how it's encoded. And you have to account in your codewords that there are no separators to help you decode.

c# - C# 压缩字节数组

3 回答 3

Related

Reference