I have an inputStream that I want to use to compute a hash and save the file to disk. I would like to know how to do that efficiently. Should I use a task to do both concurrently, should I duplicate the stream and pass it to two streams, one for the saveFile method and one for the computeHash method, or should I do something else?
5 Answers
What about using a hash algorithm that operates at the block level? For each block in the stream, you can add the block to the hash (using TransformBlock) and then write the same block to the file.
Untested rough shot:
    using System.IO;
    using System.Security.Cryptography;
    ...
    public byte[] HashedFileWrite(string filename, Stream input)
    {
        using (var hash_algorithm = MD5.Create())
        using (var file = File.OpenWrite(filename))
        {
            byte[] buffer = new byte[4096];
            int read;
            // Hash each block, then write the same block to the file.
            while ((read = input.Read(buffer, 0, buffer.Length)) > 0)
            {
                hash_algorithm.TransformBlock(buffer, 0, read, null, 0);
                file.Write(buffer, 0, read);
            }
            // All data has been hashed already; finalize with an empty block.
            hash_algorithm.TransformFinalBlock(buffer, 0, 0);
            // Read Hash before the algorithm is disposed.
            return hash_algorithm.Hash;
        }
    }
This method copies and hashes using chained streams.
    private static byte[] CopyAndHash(string source, string target)
    {
        using (var sha512 = SHA512.Create())
        {
            using (var targetStream = File.OpenWrite(target))
            using (var cryptoStream = new CryptoStream(targetStream, sha512, CryptoStreamMode.Write))
            using (var sourceStream = File.OpenRead(source))
            {
                // Copy through the CryptoStream so the data is hashed on its way to the target.
                sourceStream.CopyTo(cryptoStream);
            }
            // Disposing the CryptoStream flushed the final block, so Hash is available here.
            return sha512.Hash;
        }
    }
For a complete example, including cancellation and progress reporting, see https://gist.github.com/dhcgn/da1637277d9456db9523a96a0a34da78
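This may not be the best option, but I would go for a Stream descendant/wrapper that acts as a pass-through to the stream that actually writes the file to disk.
So:
- Derive from Stream.
- Have a member such as Stream _inner; that will be the target stream to write to.
- Implement Write() and all the related members.
- In Write(), hash the block of data and call _inner.Write().
Usage example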
    Stream s = File.OpenRead("infile.dat");
    Stream output = File.Create("outfile.dat"); // "out" is a reserved keyword in C#
    HashWrapStream hasher = new HashWrapStream(output);
    byte[] buffer = new byte[1024];
    int read;
    while ((read = s.Read(buffer, 0, buffer.Length)) != 0)
    {
        // Write only the bytes that were actually read.
        hasher.Write(buffer, 0, read);
    }
    long hash = hasher.GetComputedHash(); // get actual hash
    hasher.Dispose();
    s.Dispose();
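The wrapper described in the list above could look roughly like the sketch below. It is only one possible shape: the class body, the optional HashAlgorithm constructor parameter, and the SHA-256 default are assumptions, and GetComputedHash() returns a byte[] here rather than the long used in the usage example, since the .NET hash algorithms produce byte arrays.

```csharp
using System;
using System.IO;
using System.Security.Cryptography;

// Hypothetical pass-through stream: hashes every block it is asked to write,
// then forwards the same block to the inner (target) stream.
public sealed class HashWrapStream : Stream
{
    private readonly Stream _inner;
    private readonly HashAlgorithm _hash;
    private byte[] _result;

    public HashWrapStream(Stream inner, HashAlgorithm hash = null)
    {
        _inner = inner ?? throw new ArgumentNullException(nameof(inner));
        _hash = hash ?? SHA256.Create(); // assumed default, pick what you need
    }

    public override bool CanRead => false;
    public override bool CanSeek => false;
    public override bool CanWrite => true;
    public override long Length => throw new NotSupportedException();
    public override long Position
    {
        get => throw new NotSupportedException();
        set => throw new NotSupportedException();
    }

    public override void Write(byte[] buffer, int offset, int count)
    {
        _hash.TransformBlock(buffer, offset, count, null, 0);
        _inner.Write(buffer, offset, count);
    }

    // Finalizes the hash on the first call; do not write after calling this.
    public byte[] GetComputedHash()
    {
        if (_result == null)
        {
            _hash.TransformFinalBlock(new byte[0], 0, 0);
            _result = _hash.Hash;
        }
        return _result;
    }

    public override void Flush() => _inner.Flush();
    public override int Read(byte[] buffer, int offset, int count) => throw new NotSupportedException();
    public override long Seek(long offset, SeekOrigin origin) => throw new NotSupportedException();
    public override void SetLength(long value) => throw new NotSupportedException();

    protected override void Dispose(bool disposing)
    {
        if (disposing)
        {
            _hash.Dispose();
            _inner.Dispose(); // the wrapper owns the inner stream in this sketch
        }
        base.Dispose(disposing);
    }
}
```

With this sketch the usage example stays the same except that the result is declared as byte[] hash instead of long hash. Because the class derives from Stream, it can also be passed to Stream.CopyTo, which replaces the manual read loop.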
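Here is my solution, which writes an array of structs (the ticks variable) to a CSV file (using the CsvHelper NuGet package) and then creates a hash for a checksum in a file with the suffix .sha256.
I do this by writing the CSV to a MemoryStream, then writing the memory stream to disk, and then passing the memory stream to the hash algorithm.
This solution keeps the entire file in memory as a MemoryStream. That is fine for everything except multi-gigabyte files, which would make you run out of memory. If I had to do this again, I would probably try the CryptoStream approach, but this is good enough for my foreseeable purposes.
I have verified with a third-party tool that the hash is valid.
Here is the code: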
    //var ticks = **some_array_you_want_to_write_as_csv**
    using (var memoryStream = new System.IO.MemoryStream())
    {
        using (var textWriter = new System.IO.StreamWriter(memoryStream))
        {
            using (var csv = new CsvHelper.CsvWriter(textWriter))
            {
                csv.Configuration.DetectColumnCountChanges = true; //error checking
                csv.Configuration.RegisterClassMap<TickDataClassMap>();
                csv.WriteRecords(ticks);
                textWriter.Flush();
                //write to disk
                using (var fileStream = new System.IO.FileStream(targetFileName, System.IO.FileMode.Create))
                {
                    memoryStream.Position = 0;
                    memoryStream.CopyTo(fileStream);
                }
                //write sha256 hash, ensuring that the file was properly written
                using (var sha256 = System.Security.Cryptography.SHA256.Create())
                {
                    memoryStream.Position = 0;
                    var hash = sha256.ComputeHash(memoryStream);
                    //ConvertByteArrayToHexString is an extension method that is not shown here
                    System.IO.File.WriteAllText(targetFileName + ".sha256", hash.ConvertByteArrayToHexString());
                }
            }
        }
    }
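For comparison, the CryptoStream approach mentioned above might look something like the following sketch, which hashes the text while it is being written so the whole file never has to sit in memory. It is not the code from this answer: the StreamingChecksum class and WriteLinesWithChecksum method are hypothetical names, plain lines stand in for the CsvHelper output, and BitConverter is used for the hex string in place of the extension method that is not shown.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Security.Cryptography;

public static class StreamingChecksum
{
    // Writes the lines to targetFileName and the hex SHA-256 digest of the
    // written bytes to targetFileName + ".sha256". Returns the hex digest.
    public static string WriteLinesWithChecksum(string targetFileName, IEnumerable<string> lines)
    {
        byte[] hash;
        using (var sha256 = SHA256.Create())
        {
            using (var fileStream = File.Create(targetFileName))
            using (var cryptoStream = new CryptoStream(fileStream, sha256, CryptoStreamMode.Write))
            using (var textWriter = new StreamWriter(cryptoStream))
            {
                foreach (var line in lines)
                {
                    // Each write passes through the hash on its way to the file.
                    textWriter.WriteLine(line);
                }
            } // disposing flushes the writer and finalizes the hash

            hash = sha256.Hash;
        }

        string hex = BitConverter.ToString(hash).Replace("-", "");
        File.WriteAllText(targetFileName + ".sha256", hex);
        return hex;
    }
}
```

A CsvHelper writer could presumably be layered on top of the StreamWriter in the same way as in the code above, since it only needs a TextWriter.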
You need to read the stream's bytes into a byte[] in order to hash them.
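A minimal sketch of that idea follows, assuming the input is small enough to fit in memory; the BufferedHashing class, the SaveAndHash method, and the choice of SHA-256 are illustrative assumptions. Note that HashAlgorithm.ComputeHash also has an overload that takes a Stream, so buffering into a byte[] is one option rather than a strict requirement.

```csharp
using System.IO;
using System.Security.Cryptography;

public static class BufferedHashing
{
    // Reads the whole input into memory, saves it to fileName,
    // and returns the SHA-256 hash of the same bytes.
    public static byte[] SaveAndHash(Stream input, string fileName)
    {
        byte[] bytes;
        using (var memory = new MemoryStream())
        {
            input.CopyTo(memory);     // drains the input stream
            bytes = memory.ToArray(); // the byte[] to hash and to save
        }

        File.WriteAllBytes(fileName, bytes);

        using (var sha256 = SHA256.Create())
        {
            return sha256.ComputeHash(bytes);
        }
    }
}
```

This reads the input only once, at the cost of holding the entire content in memory, so the block-wise approaches in the other answers are likely a better fit for large files.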