c# - C# MemoryStream & GZipInputStream: Can't .Read more than 256 bytes

Question

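I'm having trouble writing out a decompressed GZIP stream with SharpZipLib's GZipInputStream. I only seem to get 256 bytes of data; the rest is never written and stays zero. I have checked the compressed stream (compressedSection) and all of the data is there (1500+ bytes). The decompression snippet is as follows: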

int msiBuffer = 4096;
using (Stream msi = new MemoryStream(msiBuffer))
{
    msi.Write(compressedSection, 0, compressedSection.Length);
    msi.Position = 0;
    int uncompressedIntSize = AllMethods.GetLittleEndianInt(uncompressedSize, 0); // Gets little endian value of uncompressed size into an integer

    // SharpZipLib GZip method called
    using (GZipInputStream decompressStream = new GZipInputStream(msi, uncompressedIntSize))
    {
        using (MemoryStream outputStream = new MemoryStream(uncompressedIntSize))
        {
            byte[] buffer = new byte[uncompressedIntSize];
            decompressStream.Read(buffer, 0, uncompressedIntSize); // Stream is decompressed and read         
            outputStream.Write(buffer, 0, uncompressedIntSize);
            using (var fs = new FileStream(kernelSectionUncompressed, FileMode.Create, FileAccess.Write))
            {
                fs.Write(buffer, 0, buffer.Length);
                fs.Close();
            }
            outputStream.Close();
        }
        decompressStream.Close();
    }
}

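So in this snippet:

1) The compressed section is passed in, ready to be decompressed.

2) The expected size of the uncompressed output (stored in the file's header as a 2-byte little-endian value) is converted to an integer by a method. The header was stripped earlier, since it is not part of the compressed GZIP data.

3) SharpZipLib's GZIP stream is declared with the compressed stream (msi) and a buffer size equal to the int uncompressedIntSize (also tested with a static value of 4096).

4) I set up a MemoryStream to handle writing the output to a file, since GZipInputStream has no read/write of its own; it takes the expected decompressed size as its argument (capacity).

5) Read/Write on a stream needs a byte[] array as the first argument, so I set up a byte[] array with enough room for every byte of the decompressed output (3584 bytes in this case, derived from uncompressedIntSize).

6) The GZipInputStream decompressStream calls .Read with the buffer as the first argument, starting at offset 0, with uncompressedIntSize as the count. Checking the arguments here, the buffer array still has a capacity of 3584 bytes, but it only receives 256 bytes of data. The rest are zeroes.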
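For reference, step 2 can be sketched as below. This is only an illustration: `HeaderFields.ReadUInt16LE` is a hypothetical stand-in for the `AllMethods.GetLittleEndianInt` helper, whose source is not shown, and it assumes the size field is an unsigned 16-bit value.

```csharp
using System;
using System.Buffers.Binary;

static class HeaderFields
{
    // Assumption: the size field is an unsigned 16-bit little-endian value
    // stored at `offset` inside `field`.
    public static int ReadUInt16LE(byte[] field, int offset) =>
        BinaryPrimitives.ReadUInt16LittleEndian(field.AsSpan(offset, 2));
}
```

With the bytes 0x00 0x0E this returns 3584 (0x0E00), the size mentioned in step 5.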

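It looks like the output of .Read is being capped at 256 bytes, but I'm not sure where. Is there something about streams that I'm missing, or is this a limitation of .Read?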

score 2 · Accepted Answer

You need to loop when reading from a stream; the lazy way would probably be:

decompressStream.CopyTo(outputStream);

(but this is not guaranteed to stop after uncompressedIntSize bytes; it will try to read to the end of decompressStream)
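To illustrate why the loop matters, here is a minimal sketch using only the base class library. `ChunkedStream` is a hypothetical wrapper written for this example (it is not part of SharpZipLib); it returns at most 256 bytes per call, which the `Stream.Read` contract permits.

```csharp
using System;
using System.IO;

// Hypothetical wrapper: hands back at most `chunk` bytes per Read call.
sealed class ChunkedStream : Stream
{
    private readonly Stream _inner;
    private readonly int _chunk;
    public ChunkedStream(Stream inner, int chunk) { _inner = inner; _chunk = chunk; }

    public override int Read(byte[] buffer, int offset, int count) =>
        _inner.Read(buffer, offset, Math.Min(count, _chunk));

    public override bool CanRead => true;
    public override bool CanSeek => false;
    public override bool CanWrite => false;
    public override long Length => throw new NotSupportedException();
    public override long Position
    {
        get => throw new NotSupportedException();
        set => throw new NotSupportedException();
    }
    public override void Flush() { }
    public override long Seek(long offset, SeekOrigin origin) => throw new NotSupportedException();
    public override void SetLength(long value) => throw new NotSupportedException();
    public override void Write(byte[] buffer, int offset, int count) => throw new NotSupportedException();
}

static class ShortReadDemo
{
    public static (int single, long copied) Run()
    {
        var data = new byte[3584];

        // A single Read call asks for 3584 bytes but only gets one chunk.
        using var one = new ChunkedStream(new MemoryStream(data), 256);
        int single = one.Read(new byte[3584], 0, 3584);

        // CopyTo keeps calling Read until it returns 0, so everything arrives.
        using var all = new ChunkedStream(new MemoryStream(data), 256);
        using var output = new MemoryStream();
        all.CopyTo(output);

        return (single, output.Length);
    }
}
```

`ShortReadDemo.Run()` returns (256, 3584): the return value of `Read`, not the requested count, says how many bytes were actually delivered.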

A more manual version (which respects an imposed length limit) would be:

// Requires: using System; using System.Buffers; using System.IO;
const int BUFFER_SIZE = 1024; // whatever
var buffer = ArrayPool<byte>.Shared.Rent(BUFFER_SIZE);
try
{
    int remaining = uncompressedIntSize, bytesRead;
    while (remaining > 0 && // more to do, and making progress
        (bytesRead = decompressStream.Read(
        buffer, 0, Math.Min(remaining, buffer.Length))) > 0)
    {
        outputStream.Write(buffer, 0, bytesRead);
        remaining -= bytesRead;
    }
    if (remaining != 0) throw new EndOfStreamException();
}
finally
{
    ArrayPool<byte>.Shared.Return(buffer);
}
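If the goal is to fill one array of known size, as in the question, the same loop can be wrapped in a small helper. This is a sketch; `StreamHelpers.ReadFully` is a hypothetical name, not an existing API. On .NET 7 and later the built-in `Stream.ReadExactly` covers the same need.

```csharp
using System;
using System.IO;

static class StreamHelpers
{
    // Hypothetical helper: fills buffer[offset .. offset + count) completely,
    // or throws if the stream ends before `count` bytes have been read.
    public static void ReadFully(Stream source, byte[] buffer, int offset, int count)
    {
        while (count > 0)
        {
            int n = source.Read(buffer, offset, count);
            if (n <= 0)
                throw new EndOfStreamException($"Stream ended with {count} bytes still expected.");
            offset += n;   // advance past the bytes just received
            count -= n;    // and ask only for what is still missing
        }
    }
}
```

In the question's code this would replace the single call, for example `StreamHelpers.ReadFully(decompressStream, buffer, 0, uncompressedIntSize);`, and a truncated or corrupt section would then surface as an exception instead of trailing zeroes.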

score 0 · Accepted Answer

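The problem turned out to be an oversight of mine in code that ran before the snippet I posted:

The file I am working with has 27 GZipped sections, but each one has a header, and any of those headers breaks the Gzip decompression if the GZipInputStream hits it. When opening the base file, I was starting from the beginning every time (offset by 6 to skip the first header) instead of going to the next post-header offset: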

brg.BaseStream.Seek(6, SeekOrigin.Begin);

instead of:

brg.BaseStream.Seek(absoluteSectionOffset, SeekOrigin.Begin);

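This meant the extracted compressed data was a mix of the first section without its header, plus part of the second section together with its header. Since the first section was 256 bytes long, with no header, the GZipInputStream decompressed that part correctly. After that came the 6-byte header, which broke the decompression and left the rest of the output as 00.

The GZipInputStream threw no explicit error when this happened, so I wrongly assumed the cause was .Read, or something in the stream retaining data from a previous pass. Sorry for the trouble.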
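Since the decompressor may not complain, a cheap sanity check on each extracted section can help. The sketch below seeks to a section's own offset and verifies the gzip magic bytes defined in RFC 1952 (0x1F 0x8B, followed by 0x08 for deflate). `SectionReader`, `ReadSection` and the `compressedLength` parameter are hypothetical names for illustration; the real file layout is not shown here.

```csharp
using System;
using System.IO;

static class SectionReader
{
    // RFC 1952: a gzip member starts with ID1 = 0x1F, ID2 = 0x8B; CM = 0x08 means deflate.
    public static bool LooksLikeGzip(byte[] data) =>
        data.Length >= 3 && data[0] == 0x1F && data[1] == 0x8B && data[2] == 0x08;

    // Hypothetical helper: seeks to the section's own offset, reads exactly
    // compressedLength bytes, and fails loudly if they do not start like gzip data.
    public static byte[] ReadSection(Stream file, long absoluteSectionOffset, int compressedLength)
    {
        file.Seek(absoluteSectionOffset, SeekOrigin.Begin);

        var section = new byte[compressedLength];
        int filled = 0;
        while (filled < compressedLength)
        {
            int n = file.Read(section, filled, compressedLength - filled);
            if (n <= 0) throw new EndOfStreamException("Section is truncated.");
            filled += n;
        }

        if (!LooksLikeGzip(section))
            throw new InvalidDataException($"No gzip magic at offset {absoluteSectionOffset}.");
        return section;
    }
}
```

This catches an offset that lands on a section header. It would not by itself catch reading too many bytes from a correct start; comparing the number of decompressed bytes against the expected size is the complementary check.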
