c# - A faster way to decompress a text file that uses a unique form of compression

Question

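I don't know whether this type of compression is used anywhere else, but this is how it works. Each compressed run uses 4 characters. The first character, "ú", indicates that a compressed run follows. The next 2 characters give, in hexadecimal, how many times the character in the 4th position is to be repeated. For example: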

22ú05hú0C0AFC001

would become:

22hhhhh000000000000AFC001

I was able to get this working, but it runs very slowly. A 20k file can take 5 minutes or more.

Here is my code:

public string doDecompression(string Content)
{
    string pattern = @"ú...";
    Regex rgx = new Regex(pattern);

    foreach (Match match in rgx.Matches(Content))
    {
        // Gets the raw Hex code
        string hex = match.ToString().Substring(1, 2);

        // Converts Hex code to an Integer 
        int convertedHex = Int32.Parse(hex, NumberStyles.HexNumber);

        // Gets the character to repeat
        string character = match.ToString().Substring(3, 1);

        // Converts the character to repeat into
        // a "char" so I can use it in the line below
        char repeatingChar = character[0];

        // Creates a string out of the repeating characters 
        string result = new String(repeatingChar, convertedHex);

        // This does the actual replacing of the text
        Content = Content.Replace(match.ToString(), result); 
    }

    return Content;
}

Is there a better way?

score 7 · Accepted Answer

What you have here is a variant of the RLE (run-length encoding) algorithm.

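You really don't need a regular expression for this job, let alone expensive operations on immutable strings: every call to Replace scans the whole string and allocates a new copy of it.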

Try the following approach:

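// Requires: using System.Collections.Generic; using System.Globalization; using System.Linq;
public static IEnumerable<char> Decompress(string compressed)
{
    // Single pass over the input; i is advanced manually inside the loop
    for(var i = 0; i < compressed.Length; )
    {
        var c = compressed[i++];
        if(c == 'ú')
        {
            // The next two characters are the repeat count in hexadecimal
            var count = int.Parse(compressed.Substring(i, 2), NumberStyles.HexNumber);
            i += 2;

            // The character after the count is the one to repeat
            c = compressed[i++];

            foreach(var character in Enumerable.Repeat(c, count))
                yield return character;
        }
        else
        {
            // Ordinary character: pass it through unchanged
            yield return c;
        }
    }
}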
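
Since the original method returns a string, the result of the iterator above can be materialized with new string(Decompress(content).ToArray()). As an alternative, here is a minimal sketch of the same single-pass idea that appends to a StringBuilder directly. The names RleSketch and DecompressToString are hypothetical, and the handling of a truncated run at the end of the input (copied through unchanged) is an assumption, since the question does not say what should happen in that case.

```csharp
using System;
using System.Globalization;
using System.Text;

public static class RleSketch
{
    // Hypothetical helper: same format as in the question
    // ('ú' + two hex digits + the character to repeat), returning a string.
    public static string DecompressToString(string compressed)
    {
        // The output is at least roughly as long as the input, so start there
        var output = new StringBuilder(compressed.Length);

        for (var i = 0; i < compressed.Length; )
        {
            var c = compressed[i++];

            // Assumption: if fewer than three characters follow the marker,
            // the marker is treated as an ordinary character.
            if (c == 'ú' && i + 2 < compressed.Length)
            {
                // Throws FormatException if the two characters are not hex digits
                var count = int.Parse(compressed.Substring(i, 2), NumberStyles.HexNumber);
                i += 2;

                // Append(char, int) writes the repeated character in one call
                output.Append(compressed[i++], count);
            }
            else
            {
                output.Append(c);
            }
        }

        return output.ToString();
    }
}
```

Calling RleSketch.DecompressToString("22ú05hú0C0AFC001") should produce the expanded string from the example in the question. Because the StringBuilder grows in place, the work is proportional to the size of the output rather than to the input length multiplied by the number of runs.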
