.net - Prevent GZipStream/DeflateStream from trying to consume more than the compressed data

Question

I have a file that is created as follows:

stream.Write(headerBytes, 0, headerBytes.Length);

using (var gz = new GZipStream(stream, CompressionMode.Compress, leaveOpen: true))
{
    gz.Write(otherBytes, 0, otherBytes.Length);
}

stream.Write(moreBytes, 0, moreBytes.Length);

Now, when reading the file back:

stream.Read(headerBytes, 0, headerBytes.Length);
// in reality I make sure that indeed headerBytes.Length bytes get read,
// something the above line omits

using (var gz = new GZipStream(stream, CompressionMode.Decompress, leaveOpen: true))
{
    while ((bytesRead = gz.Read(buffer, 0, buffer.Length)) != 0)
    {
        // use buffer...
    }
}

while ((bytesRead = stream.Read(buffer, 0, buffer.Length)) != 0)
{
    // use buffer...
}

It turns out that GZipStream (and the same goes for DeflateStream) reads 16384 bytes from stream, rather than the actual 13293 compressed bytes in the case I checked.

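Assuming I know neither the size of the compressed part of the file beforehand nor the number of bytes that follow the compressed data, is there a way to use GZipStream/DeflateStream

so that it only reads the compressed data from stream,
or at least to figure out what the size of the compressed data part was, so that I can do stream.Position -= actuallyRead - compressedSize manually?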

score 1 · Accepted Answer

The interface does not seem to provide a way to do what you want, which is one of the many reasons not to use .NET's GZipStream or DeflateStream.

You should use DotNetZip instead.

score 0 · Accepted Answer

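Following Mark Adler's suggestion, I tried DotNetZip, and lo and behold, its GZipStream.Position property not only does not throw, it even returns the number of actual gzip bytes read in (plus 8, for some reason I still have to figure out).

So it does read more than is strictly necessary, but it lets me calculate how far to seek back.

The following works for me: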

var posBefore = fileStream.Position;
long compressedBytesRead;
using (var gz = new GZipStream(fileStream, CompressionMode.Decompress, true))
{
    while (gz.Read(buffer, 0, buffer.Length) != 0)
        ; // use it!
    compressedBytesRead = gz.Position;
}
var gzipStreamAdvance = fileStream.Position - posBefore;
var seekBack = gzipStreamAdvance - compressedBytesRead - 8; // but why "- 8"?
fileStream.Position -= seekBack;

score 0 · Accepted Answer

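This answer amounts to an ugly workaround. I don't particularly like it, but it does work (except when it doesn't), even if only for GZipStream.

"or at least figure out what the size of the compressed data part was, so I can stream.Position -= actuallyRead - compressedSize manually?"

Since every gzip file (in fact, every gzip member) ends with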

     +---+---+---+---+---+---+---+---+
     |     CRC32     |     ISIZE     |
     +---+---+---+---+---+---+---+---+

     CRC32
        This contains a Cyclic Redundancy Check value of the
        uncompressed data

     ISIZE
        This contains the size of the original (uncompressed) input
        data modulo 2^32.

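I could just use the uncompressed size (modulo 2^32), which I know after closing the GZipStream, and then seek backwards in the stream until I find the 4 bytes that match it.

To make this more robust, I should also compute the CRC32 while decompressing, and seek backwards in the stream to the position right after the 8 bytes that form the correct CRC32 and ISIZE.

Ugly, but I did warn you.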
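The 8 bytes to search for can be built while decompressing. Below is a minimal sketch, assuming a hand-rolled table-driven CRC-32 (the reflected polynomial 0xEDB88320 that gzip uses) rather than any library call; the names GzipTrailer, Update and Build are hypothetical and not part of .NET or DotNetZip. Both fields are stored little-endian, as in the trailer layout quoted above.

```csharp
using System;

// Hypothetical helper: computes the CRC32 + ISIZE bytes a gzip member should end with.
static class GzipTrailer
{
    // Lookup table for the reflected CRC-32 polynomial 0xEDB88320.
    static readonly uint[] Table = BuildTable();

    static uint[] BuildTable()
    {
        var table = new uint[256];
        for (uint n = 0; n < 256; n++)
        {
            uint c = n;
            for (int k = 0; k < 8; k++)
                c = (c & 1) != 0 ? 0xEDB88320u ^ (c >> 1) : c >> 1;
            table[n] = c;
        }
        return table;
    }

    // Feed every decompressed chunk through this; start with crc = 0
    // and pass the previous result back in for the next chunk.
    public static uint Update(uint crc, byte[] buffer, int offset, int count)
    {
        crc ^= 0xFFFFFFFFu;
        for (int i = offset; i < offset + count; i++)
            crc = Table[(crc ^ buffer[i]) & 0xFF] ^ (crc >> 8);
        return crc ^ 0xFFFFFFFFu;
    }

    // CRC32 followed by ISIZE (uncompressed length modulo 2^32), both little-endian.
    public static byte[] Build(uint crc, long uncompressedLength)
    {
        uint isize = unchecked((uint)uncompressedLength);
        var trailer = new byte[8];
        for (int i = 0; i < 4; i++)
        {
            trailer[i] = (byte)((crc >> (8 * i)) & 0xFF);
            trailer[4 + i] = (byte)((isize >> (8 * i)) & 0xFF);
        }
        return trailer;
    }
}
```

In the read loop one would call Update on each chunk returned by gz.Read and add bytesRead to a running total, then pass both to Build. Searching backwards for all 8 bytes instead of only the 4 ISIZE bytes makes an accidental match in the over-read data much less likely, though still not impossible.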

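<sarcasm>How I love encapsulation. Encapsulating all the useful stuff, leaving us with a decompressing stream that works in the one use case the all-knowing API designers foresaw.</sarcasm>

Here is a quick implementation of SeekBack that has worked so far: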

/// <returns>the number of bytes sought back (including bytes.Length)
///          or 0 in case of failure</returns>
static int SeekBack(Stream s, byte[] bytes, int maxSeekBack)
{
    if (maxSeekBack != -1 && maxSeekBack < bytes.Length)
        throw new ArgumentException("maxSeekBack must be >= bytes.Length");

    int soughtBack = 0;
    for (int i = bytes.Length - 1; i >= 0; i--)
    {
        bool matched = false;
        while ((maxSeekBack == -1 || soughtBack < maxSeekBack)
               && s.Position > i)
        {
            s.Position -= 1;
            // as we are seeking back, the following will never become
            // -1 (EOS), so coercing to byte is OK
            byte b = (byte)s.ReadByte();
            s.Position -= 1;
            soughtBack++;
            if (b == bytes[i])
            {
                if (i == 0)
                    return soughtBack;
                matched = true;
                break;
            }
            else
            {
                var bytesIn = (bytes.Length - 1) - i;
                if (bytesIn > 0) // back to square one
                {
                    soughtBack -= bytesIn;
                    s.Position += bytesIn;
                    i = bytes.Length - 1;
                }
            }
        }
        // ran out of stream or seek budget without matching bytes[i]:
        // stop, otherwise the next pass would compare against the wrong byte
        if (!matched)
            break;
    }
    // no luck? return to original position
    s.Position += soughtBack;
    return 0;
}
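
For comparison, here is a self-contained sketch of the simpler ISIZE-only variant described above, run end to end against System.IO.Compression on a MemoryStream. It is an illustration under assumptions, not a hardened implementation: the names GzipRoundTrip, WriteFile and ReadMember are hypothetical, it re-reads everything the GZipStream consumed into memory instead of stepping backwards byte by byte, and matching on 4 bytes alone can produce a false positive if the data following the gzip member happens to contain the same bytes.

```csharp
using System;
using System.IO;
using System.IO.Compression;

// Hypothetical demo type: header + gzip member + trailing bytes in one stream.
static class GzipRoundTrip
{
    // Writes the layout from the question and rewinds the stream to the start.
    public static MemoryStream WriteFile(byte[] header, byte[] payload, byte[] tail)
    {
        var stream = new MemoryStream();
        stream.Write(header, 0, header.Length);
        using (var gz = new GZipStream(stream, CompressionMode.Compress, leaveOpen: true))
            gz.Write(payload, 0, payload.Length);
        stream.Write(tail, 0, tail.Length);
        stream.Position = 0;
        return stream;
    }

    // Decompresses the gzip member at the current position, then moves the
    // stream to the first byte after the member's ISIZE field.
    public static byte[] ReadMember(Stream stream)
    {
        long start = stream.Position;
        var output = new MemoryStream();
        using (var gz = new GZipStream(stream, CompressionMode.Decompress, leaveOpen: true))
            gz.CopyTo(output);

        // ISIZE: uncompressed length modulo 2^32, little-endian.
        byte[] isize = BitConverter.GetBytes(unchecked((uint)output.Length));
        if (!BitConverter.IsLittleEndian) Array.Reverse(isize);

        // Re-read whatever the GZipStream consumed (possibly more than the member).
        long end = stream.Position;
        var consumed = new byte[end - start];
        stream.Position = start;
        int filled = 0;
        while (filled < consumed.Length)
        {
            int n = stream.Read(consumed, filled, consumed.Length - filled);
            if (n == 0) throw new EndOfStreamException();
            filled += n;
        }

        // Scan backwards for the last occurrence of the 4 ISIZE bytes.
        for (int i = consumed.Length - 4; i >= 0; i--)
        {
            if (consumed[i] == isize[0] && consumed[i + 1] == isize[1] &&
                consumed[i + 2] == isize[2] && consumed[i + 3] == isize[3])
            {
                stream.Position = start + i + 4; // right after ISIZE
                return output.ToArray();
            }
        }
        throw new InvalidDataException("gzip trailer not found");
    }
}
```

After ReadMember returns, ordinary stream.Read calls continue with the bytes that were written after the compressed part. For large members it would be more economical to scan only the last buffer's worth of consumed bytes.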
