c# - Best way to manipulate a text file in memory: read it as byte[] first? Or read it with File.ReadAllText() and then save it as binary?

Question

I need to modify a file in memory. Currently I read the file into a byte[] in memory using a FileStream and a BinaryReader.

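What is the best way to modify that file in memory? Convert the byte[] to a string, make the changes, and call Encoding.GetBytes()? Or read the file as a string in the first place with File.ReadAllText() and then call Encoding.GetBytes()? Or will either approach work without any caveats?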

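Is there a particular approach I should use? I need to replace specific text in the files with additional characters or with replacement strings, across several hundred thousand files. Reliability matters more than efficiency. The files are text such as HTML, not binary files.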
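The byte[] route described in the question has one easy-to-miss caveat, sketched below under made-up assumptions (the sample bytes are invented for the demo). `Encoding.GetString` decodes the raw bytes as-is, so if a file starts with a UTF-8 byte order mark, that BOM survives as a leading U+FEFF character in the string. Decoding through a `StreamReader` detects the BOM and drops it.

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical file contents: "<p>hi</p>" saved as UTF-8 *with* a byte order mark
byte[] preamble = Encoding.UTF8.GetPreamble();            // EF BB BF
byte[] body = Encoding.UTF8.GetBytes("<p>hi</p>");
byte[] fileBytes = new byte[preamble.Length + body.Length];
preamble.CopyTo(fileBytes, 0);
body.CopyTo(fileBytes, preamble.Length);

// Route 1: decode the raw bytes directly; the BOM becomes a leading '\uFEFF' char
string viaGetString = Encoding.UTF8.GetString(fileBytes);

// Route 2: decode through a StreamReader, which recognizes and skips the BOM
string viaReader;
using (var reader = new StreamReader(new MemoryStream(fileBytes), Encoding.UTF8))
{
    viaReader = reader.ReadToEnd();
}

Console.WriteLine(viaGetString.Length); // 10
Console.WriteLine(viaReader.Length);    // 9
```

A stray U+FEFF is harmless in many cases, but it can break exact string comparisons or end up in the middle of output if files are concatenated.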

score 2 · Accepted Answer

Read the files using File.ReadAllText(), modify them, then do byte[] byteData = Encoding.UTF8.GetBytes(your_modified_string_from_file). Use the encoding with which the files were saved (UTF8 here is only an example). This gives you a byte array (byte[]). You can convert the byte[] to a stream like this:

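// Copy the bytes into a stream
MemoryStream stream = new MemoryStream();
stream.Write(byteData, 0, byteData.Length);
stream.Position = 0; // rewind, otherwise anything reading the stream starts at the end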

Edit: It looks like one of the Add methods in the API can take a byte array, so you don't have to use a stream.
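Put together, the steps in this answer look roughly like the sketch below. It assumes the files are UTF-8; the temporary file, the `<b>` to `<strong>` replacement, and the variable names are illustrative only. Passing the array to the `MemoryStream(byte[])` constructor gives a stream already positioned at 0, so no rewind is needed after writing.

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical input file, created here so the sketch runs on its own
string path = Path.Combine(Path.GetTempPath(), Guid.NewGuid() + ".html");
File.WriteAllText(path, "<b>old</b>"); // UTF-8 without a BOM

// 1. Read the file as text, using the encoding it was saved with (assumed UTF-8)
string text = File.ReadAllText(path, Encoding.UTF8);

// 2. Modify the string (example replacement)
string modified = text.Replace("<b>", "<strong>").Replace("</b>", "</strong>");

// 3. Encode back to bytes with the same encoding
byte[] byteData = Encoding.UTF8.GetBytes(modified);

// 4. Wrap the bytes in a stream; this constructor leaves Position at 0
using var stream = new MemoryStream(byteData);
using var reader = new StreamReader(stream, Encoding.UTF8);
string roundTrip = reader.ReadToEnd();

File.Delete(path);
Console.WriteLine(roundTrip); // <strong>old</strong>
```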

score 1 · Accepted Answer

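Depending on the size of the files, I would use File.ReadAllText to read them and File.WriteAllText to write them. That relieves you of having to call Close or Dispose on anything when reading or writing.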
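A minimal sketch of this approach, assuming each file fits comfortably in memory and is UTF-8 without a BOM. `ReplaceInFile` is a hypothetical helper name. Passing the same encoding to both calls keeps the read and the write symmetric; note that `Encoding.UTF8` would make `File.WriteAllText` emit a BOM, so `new UTF8Encoding(false)` is used here to match files that never had one.

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical helper: read, replace, write back. No streams to close or dispose.
static void ReplaceInFile(string path, string oldValue, string newValue, Encoding encoding)
{
    string text = File.ReadAllText(path, encoding);
    File.WriteAllText(path, text.Replace(oldValue, newValue), encoding);
}

// Demo on a throwaway file
string file = Path.Combine(Path.GetTempPath(), Guid.NewGuid() + ".html");
File.WriteAllText(file, "<p>Hello, World</p>"); // no encoding argument: UTF-8 without a BOM

// new UTF8Encoding(false) = UTF-8 without a BOM, matching how the file was written
ReplaceInFile(file, "World", "Universe", new UTF8Encoding(false));

string result = File.ReadAllText(file);
File.Delete(file);
Console.WriteLine(result); // <p>Hello, Universe</p>
```

Writing back in place is simple but not atomic: if the process dies mid-write, the file can be left truncated.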

score 1 · Accepted Answer

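Reading in bytes first definitely makes things harder for yourself. Just use a StreamReader. You can probably use ReadLine() and process one line at a time, which will greatly reduce your application's memory usage, especially when dealing with this many files.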

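// Write to a temporary file next to the original, then swap it in
string tempFile = originalFile + ".tmp";

using (var reader = File.OpenText(originalFile))   // UTF-8 by default, honors a BOM if present
using (var writer = File.CreateText(tempFile))      // writes UTF-8 without a BOM
{
    string line;
    while ((line = reader.ReadLine()) != null)
    {
        // DoMyStuff: your per-line replacement logic
        var temp = DoMyStuff(line);
        // WriteLine uses Environment.NewLine, so original line endings are normalized
        writer.WriteLine(temp);
    }
}

// Both files are closed at this point, so the original can be replaced safely
File.Delete(originalFile);
File.Move(tempFile, originalFile);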
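A self-contained variant of the code above, with the per-line logic passed in as a delegate. `ProcessFile`, the `.tmp` suffix, and the sample replacement are hypothetical; it assumes .NET Core 3.0 or later for the `File.Move` overload that takes an `overwrite` flag, which replaces the separate delete-then-move. One limitation of working line by line: text to be replaced that spans a line break will not be matched.

```csharp
using System;
using System.IO;

// Streams one file through a per-line transform and swaps the result in.
// Assumes .NET Core 3.0+ for File.Move(source, dest, overwrite).
static void ProcessFile(string originalFile, Func<string, string> transformLine)
{
    // Keep the temp file in the same directory so the final move stays on one volume
    string tempFile = originalFile + ".tmp";

    using (var reader = File.OpenText(originalFile))
    using (var writer = File.CreateText(tempFile))
    {
        string? line;
        while ((line = reader.ReadLine()) != null)
        {
            writer.WriteLine(transformLine(line));
        }
    } // both files are closed here, before the swap

    File.Move(tempFile, originalFile, overwrite: true);
}

string file = Path.Combine(Path.GetTempPath(), Guid.NewGuid() + ".html");
File.WriteAllText(file, "<li>a</li>\n<li>b</li>\n");

// Hypothetical transform applied to each line independently
ProcessFile(file, line => line.Replace("<li>", "<li class=\"x\">"));

string[] lines = File.ReadAllLines(file);
File.Delete(file);
Console.WriteLine(string.Join("|", lines)); // <li class="x">a</li>|<li class="x">b</li>
```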

score 0 · Accepted Answer

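You generally don't want to read a text file at the binary level. Just use File.ReadAllText() and pass the correct encoding used in the file (there's an overload for this). If the file is UTF-8 or UTF-32 encoded, the method can usually detect that automatically from the byte order mark and use the correct endianness. The same applies when writing back: if the file isn't UTF-8, specify the encoding you want.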
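A sketch of why the encoding overload matters, using a hypothetical Latin-1 (ISO-8859-1) file with no BOM; it assumes .NET 5 or later for `Encoding.Latin1`. Read without an encoding, the single byte for "é" is not valid UTF-8 and decodes to the replacement character U+FFFD. Read and written with the same explicit encoding, the text round-trips and the file stays Latin-1.

```csharp
using System;
using System.IO;
using System.Text;

// Assumes .NET 5+ for Encoding.Latin1 (ISO-8859-1, no BOM)
Encoding latin1 = Encoding.Latin1;

string file = Path.Combine(Path.GetTempPath(), Guid.NewGuid() + ".html");
File.WriteAllBytes(file, latin1.GetBytes("<p>café</p>")); // 'é' stored as the single byte 0xE9

// Without an encoding argument the bytes are decoded as UTF-8, and 'é' is lost
string wrong = File.ReadAllText(file);

// With the overload that takes an Encoding, the text decodes correctly
string right = File.ReadAllText(file, latin1);

// Write back with the same encoding so the file stays Latin-1
File.WriteAllText(file, right.Replace("café", "caffè"), latin1);
byte[] after = File.ReadAllBytes(file);
File.Delete(file);

Console.WriteLine(wrong.Contains('\uFFFD')); // True
Console.WriteLine(right);                    // <p>café</p>
Console.WriteLine(after.Length);             // 12
```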
