c# - Using Stream.Read() vs BinaryReader.Read() to process binary streams

Question

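When working with binary streams (i.e. byte[] arrays), the main point of using BinaryReader or BinaryWriter seems to be simplified reading/writing of primitive data types from a stream, using methods such as ReadBoolean() and taking encoding into account. Is that the whole story? Is there an inherent advantage or disadvantage if one works directly with a Stream, without using BinaryReader/BinaryWriter? Most methods, such as Read(), seem to be the same in both classes, and my guess is that they work identically underneath.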

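Consider a simple example of processing a binary file in two different ways (edit: I realize this approach is inefficient and that a buffer can be used; it's just a sample):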

// Using FileStream directly
using (FileStream stream = new FileStream("file.dat", FileMode.Open))
{
    // Read bytes from stream and interpret them as ints
    int value = 0;
    while ((value = stream.ReadByte()) != -1)
    {
        Console.WriteLine(value);
    }
}


// Using BinaryReader
using (BinaryReader reader = new BinaryReader(new FileStream("file.dat", FileMode.Open)))
{
    // Read bytes and interpret them as ints
    byte value = 0;    
    while (reader.BaseStream.Position < reader.BaseStream.Length)
    {
        value = reader.ReadByte();
        Console.WriteLine(Convert.ToInt32(value));
    }
}

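The output will be the same, but what is happening internally (e.g. from the OS perspective)? Generally speaking, does it matter which implementation is used? Is there any purpose to using BinaryReader/BinaryWriter if you don't need the extra methods they provide? For this specific case, MSDN says this about Stream.ReadByte():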

The default implementation on Stream creates a new single-byte array and then calls Read. While this is formally correct, it is inefficient.

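Using GC.GetTotalMemory(), the first approach does seem to allocate twice as much space as the second one, but AFAIK this shouldn't be the case if the more general Stream.Read() method is used (e.g. for reading in chunks with a buffer). Still, it seems to me that these methods/interfaces could easily be unified...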

score 18 · Accepted Answer

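No, there is no principal difference between the two approaches. The extra Reader adds some buffering, so you shouldn't mix them. But don't expect any significant performance difference; it is all dominated by the actual I/O.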

So,

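use a stream when you have (only) byte[] to move, as is common in a lot of streaming scenarios.
use BinaryWriter and BinaryReader when you have any other basic type (including a simple byte) of data to process. Their main purpose is converting the built-in framework types to byte[].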
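
To make the "converting built-in types to byte[]" point concrete, here is a minimal sketch. The class name PrimitiveRoundTrip and the sample values are made up for illustration. It writes an int, a bool, a double and a string through BinaryWriter into a MemoryStream and reads them back in the same order with BinaryReader.

```csharp
using System;
using System.IO;

// Hypothetical helper, not part of the original answer.
public static class PrimitiveRoundTrip
{
    // Serializes a few primitives and returns the raw bytes.
    public static byte[] Write(int id, bool active, double ratio, string name)
    {
        using (var ms = new MemoryStream())
        using (var writer = new BinaryWriter(ms)) // UTF-8 is the default encoding
        {
            writer.Write(id);      // 4 bytes, little-endian
            writer.Write(active);  // 1 byte
            writer.Write(ratio);   // 8 bytes
            writer.Write(name);    // length prefix followed by the encoded characters
            writer.Flush();
            return ms.ToArray();
        }
    }

    // Reads the values back; the order must match the order used in Write.
    public static (int Id, bool Active, double Ratio, string Name) Read(byte[] data)
    {
        using (var reader = new BinaryReader(new MemoryStream(data)))
        {
            int id = reader.ReadInt32();
            bool active = reader.ReadBoolean();
            double ratio = reader.ReadDouble();
            string name = reader.ReadString();
            return (id, active, ratio, name);
        }
    }

    public static void Main()
    {
        byte[] data = Write(42, true, 0.5, "abc");
        Console.WriteLine(data.Length);
        var record = Read(data);
        Console.WriteLine($"{record.Id} {record.Active} {record.Ratio} {record.Name}");
    }
}
```

With a plain Stream you would have to produce and interpret each of these byte sequences yourself, which is the convenience the reader/writer pair provides.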

score 13 · Accepted Answer

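One big difference is how you can buffer the I/O. If you are only writing/reading a few bytes here or there, BinaryWriter/BinaryReader will work well. But if you have to read MBs of data, then reading one byte, Int32, etc. at a time will be a bit slow. You could instead read larger chunks and parse from there.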

Example:

// Using FileStream directly with a buffer
using (FileStream stream = new FileStream("file.dat", FileMode.Open))
{
    // Read bytes from stream and interpret them as ints
    byte[] buffer = new byte[1024];
    int count;
    // Read from the IO stream fewer times.
    while((count = stream.Read(buffer, 0, buffer.Length)) > 0)
        for(int i=0; i<count; i++)
           Console.WriteLine(Convert.ToInt32(buffer[i]));
}

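Now this is a bit off topic, but I'll throw it out there: if you want to get very crafty and really give yourself a performance boost (although it might be considered dangerous), instead of parsing EACH Int32 you could convert them all at once using Buffer.BlockCopy().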

Another example:

// Using FileStream directly with a buffer and BlockCopy
using (FileStream stream = new FileStream("file.dat", FileMode.Open))
{
    // Read bytes from stream and interpret them as ints
    byte[] buffer = new byte[1024];
    int[] intArray = new int[buffer.Length >> 2]; // Each int is 4 bytes
    int count;
    // Read from the IO stream fewer times.
    while((count = stream.Read(buffer, 0, buffer.Length)) > 0)
    {
       // Copy the bytes into the memory space of the Int32 array in one big swoop
       Buffer.BlockCopy(buffer, 0, intArray, 0, count);

       // count is in bytes, so only count / 4 ints were filled
       for(int i=0; i<count/4; i++)
          Console.WriteLine(intArray[i]);
    }
}

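A few things to note about this example: it takes 4 bytes per Int32 instead of one, so it will yield different results. You can also do this for data types other than Int32, but many would argue that marshalling should be on your mind. (I just wanted to present something to think about...)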
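
As a more conservative variant of "read larger chunks and parse from there", the sketch below parses each Int32 with BitConverter.ToInt32 and carries over any bytes left when a Read call returns a count that is not a multiple of 4. The class name ChunkedInt32Reader and the bufferSize parameter are hypothetical. It assumes the data was written in the machine's native byte order (little-endian on most platforms, which is also what BinaryWriter produces).

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical helper, not part of the original answer.
public static class ChunkedInt32Reader
{
    public static List<int> ReadAll(Stream stream, int bufferSize = 1024)
    {
        if (bufferSize < 4) throw new ArgumentOutOfRangeException(nameof(bufferSize));

        var result = new List<int>();
        byte[] buffer = new byte[bufferSize];
        int filled = 0; // bytes currently held in the buffer
        int read;

        while ((read = stream.Read(buffer, filled, buffer.Length - filled)) > 0)
        {
            filled += read;
            int whole = filled - (filled % 4); // bytes that form complete ints

            for (int offset = 0; offset < whole; offset += 4)
                result.Add(BitConverter.ToInt32(buffer, offset)); // native byte order

            // Move the 0-3 leftover bytes to the front for the next Read call.
            int leftover = filled - whole;
            Buffer.BlockCopy(buffer, whole, buffer, 0, leftover);
            filled = leftover;
        }

        return result; // trailing bytes that do not form a full int are ignored
    }

    public static void Main()
    {
        var ms = new MemoryStream();
        var writer = new BinaryWriter(ms);
        foreach (int v in new[] { 1, 2, 3, -7 }) writer.Write(v);
        writer.Flush();
        ms.Position = 0;

        // A deliberately odd buffer size forces ints to straddle chunk boundaries.
        foreach (int v in ReadAll(ms, 6)) Console.WriteLine(v);
    }
}
```

This still issues few Read calls, but it never reinterprets a partially filled chunk as whole integers.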

score 0 · Accepted Answer

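Both of your code samples do the same thing, i.e. ReadByte(), and end up with a byte array, so the result of either approach is the same (for the same file).
The OS-level (internal) difference is that a stream is buffered in virtual memory, e.g. if you transfer a file over a network via a stream, you still have system memory left for other (multi)task(ing).
With a byte array, the whole file is held in memory before it is transferred to disk (file creation) or to another stream, so this is not recommended for large files.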
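
To illustrate the memory argument, here is a minimal sketch of a chunked copy between two streams. The class name ChunkedCopy and the default buffer size are assumptions chosen for illustration. Only one fixed-size buffer is alive at a time, regardless of how large the source is, whereas loading the source into a single byte[] would require memory for the whole payload. The built-in Stream.CopyTo method does the same job; the loop is written out here only to show the mechanism.

```csharp
using System;
using System.IO;

// Hypothetical helper, not part of the original answer.
public static class ChunkedCopy
{
    // Copies source to destination through a fixed-size buffer
    // and returns the number of bytes copied.
    public static long Copy(Stream source, Stream destination, int bufferSize = 81920)
    {
        byte[] buffer = new byte[bufferSize];
        long total = 0;
        int read;

        while ((read = source.Read(buffer, 0, buffer.Length)) > 0)
        {
            destination.Write(buffer, 0, read); // write only the bytes actually read
            total += read;
        }

        return total;
    }

    public static void Main()
    {
        // MemoryStreams stand in for a file or network stream here.
        var source = new MemoryStream(new byte[10000]);
        var destination = new MemoryStream();
        Console.WriteLine(Copy(source, destination, 4096));
    }
}
```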

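Here is some discussion about transferring binary data over a network:
When to use byte array & when stream?

@Jason C & @Jon Skeet make a good point here:
Why do most serializers use a stream instead of a byte array?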

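I have noticed that my Win 10 machine (4GB RAM) sometimes skips files over 5MB if I transfer them via the System.Net.Http.HttpClient GetByteArrayAsync method (vs GetStreamAsync) while I continue working without waiting for the transfer to complete.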

PS: In .NET 4.0, byte arrays are limited to 2GB.
