Since you said the platform and language don't matter...
If you want stable performance that is as fast as the source medium allows, the only way I am aware of to get it on Windows is overlapped, non-OS-buffered, aligned sequential reads. You can probably reach a few GB/s with two or three buffers; beyond that, at some point you need a ring buffer (one writer, 1+ readers) to avoid any copying. The exact implementation depends on the driver/APIs. If any memory copying happens on the thread that handles the IO (in kernel or user mode), then obviously the larger the buffer that has to be copied, the more time is wasted on copying rather than on doing the IO. So the optimal buffer size depends on the firmware and driver. On Windows, good values to try for disk IO are multiples of 32 KB. Windows file buffering, memory mapping and all that stuff add overhead, and only pay off if you read the same data multiple times, access it randomly, or both. So for reading a large file sequentially a single time, you don't want the OS to buffer anything or do any memcpy's. If you use C#, there are also penalties for calling into the OS due to marshaling, so the interop code may need a bit of optimization unless you use C++/CLI.
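To make the "two or three buffers" idea concrete, here is a minimal portable sketch in Python. It is not the overlapped, unbuffered Windows path described above (the OS file cache is still involved); it only illustrates rotating a small pool of preallocated buffers between a reader thread and a processing loop without copying the data. The names `read_with_buffer_pool`, `process`, `num_buffers` and `chunk_size` are made up for this illustration.

```python
import queue
import threading

CHUNK = 4 * 32 * 1024  # 128 KB, a multiple of 32 KB (value chosen for illustration)


def read_with_buffer_pool(path, process, num_buffers=3, chunk_size=CHUNK):
    """Read `path` sequentially and hand each filled buffer to `process`.

    Buffers are allocated once and recycled, so no per-chunk allocation
    or copy happens in this code. Returns the number of bytes read.
    """
    free = queue.Queue()    # buffers ready to be filled
    filled = queue.Queue()  # (buffer, valid_length) pairs ready to be processed
    for _ in range(num_buffers):
        free.put(bytearray(chunk_size))

    def producer():
        try:
            # buffering=0 skips Python's own buffering layer only;
            # it does NOT disable the operating system's file cache.
            with open(path, "rb", buffering=0) as f:
                while True:
                    buf = free.get()      # blocks until a buffer is recycled
                    n = f.readinto(buf)   # fills the buffer in place
                    if not n:
                        break
                    filled.put((buf, n))
        finally:
            filled.put(None)              # end-of-stream marker

    reader = threading.Thread(target=producer, daemon=True)
    reader.start()

    total = 0
    while True:
        item = filled.get()
        if item is None:
            break
        buf, n = item
        process(memoryview(buf)[:n])      # zero-copy view of the valid bytes
        total += n
        free.put(buf)                     # hand the buffer back to the reader
    reader.join()
    return total
```

`process` must not keep the view after it returns, because the underlying buffer goes straight back to the reader thread and will be overwritten. With a single reader and a single consumer this behaves like a very small ring: the reader stalls when all buffers are in flight, and the consumer stalls when none are filled.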
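Some people prefer to throw hardware at the problem, but if you have more time than money, in some scenarios it is possible to optimize things so that a single consumer-grade computer performs 100-1000 times better than 1000 enterprise-grade computers. The reason is that if the processing is also latency sensitive, going beyond two cores probably adds latency. This is why drivers can push gigabytes/s, while enterprise software ends up stuck at megabytes/s by the time it is all done. Whatever reporting, business logic and the like the enterprise software does can probably also be done at gigabytes/s on a two-core consumer CPU, if it is written the way you would have written a game back in the 80's. The most famous example I have heard of that approaches its entire business logic this way is the LMAX forex exchange.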
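Forgetting all the theory: if you are happy with < 1 GB/s, one possible starting point on Windows that I have found is the readfile source from winimage, unless you want to dig into the SDK/driver samples. It may need some source code fixes to calculate performance correctly at SSD speeds. Experiment with buffer sizes as well. In my experience, the switches /h (multi-threaded) and /o (overlapped, completion port IO) with no Windows file buffering and an optimal buffer size (try 32, 64, 128 KB etc.) give the best performance when reading from an SSD (cold data) while processing at the same time (use /a for Adler processing, as otherwise it is too CPU-bound).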
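If you just want a quick feel for how buffer size affects a read-plus-Adler pass before touching the Windows-specific code, a rough sketch like the following may help. It does not reproduce the readfile tool or its switches; it is a plain single-threaded Python loop, and `adler32_throughput` and `sweep` are names invented here. Note that it reads through the OS file cache, so only the first pass over a file measures anything close to cold data.

```python
import time
import zlib


def adler32_throughput(path, chunk_size):
    """Make one sequential pass over `path`; return (adler32, MB per second)."""
    buf = bytearray(chunk_size)   # one reusable buffer, allocated once
    view = memoryview(buf)
    checksum = 1                  # Adler-32 is defined to start at 1
    total = 0
    start = time.perf_counter()
    with open(path, "rb", buffering=0) as f:
        while True:
            n = f.readinto(buf)
            if not n:
                break
            # Feed only the valid part of the buffer, without copying it.
            checksum = zlib.adler32(view[:n], checksum)
            total += n
    elapsed = time.perf_counter() - start
    mb_per_s = total / (1024 * 1024) / max(elapsed, 1e-9)
    return checksum, mb_per_s


def sweep(path, sizes_kb=(32, 64, 128, 256)):
    """Try several buffer sizes (in KB) and collect the results per size."""
    return {kb: adler32_throughput(path, kb * 1024) for kb in sizes_kb}
```

Calling `sweep("some_large_file.bin")` (a placeholder path) returns a dictionary keyed by buffer size in KB. The checksum should be identical for every size; only the throughput figure is expected to vary, and on a warm cache it mostly reflects memory and CPU speed rather than the disk.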