c# - Reading a text file with a large number of lines in C#

Question

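I have a text file that can contain up to 1 million lines, and I have code that reads the file one line at a time, but this takes a lot of time... lots and lots of time. Is there a way in C# to optimize this process and speed up the reading? Here is the code I'm using: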

using System.IO;

using (var file = new StreamReader(filePath))
{
    string line;
    while ((line = file.ReadLine()) != null)
    {
        // do something
    }
}

Any suggestions for reading these lines in batches or otherwise improving the process?

Thanks.

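Thanks for all the comments. The problem turned out to be in my "do something" step, where I write to Excel using the SmartXls library; that was the bottleneck. I have contacted the library's developers to resolve it. All of the suggested solutions apply to other scenarios.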

score 6 · Accepted Answer

Well, this code can be simpler: if you're using .NET 4 or later, you can use File.ReadLines:

foreach (var line in File.ReadLines(filePath))
{
    // Do something
}

Note that this is not the same as ReadAllLines, as ReadLines returns an IEnumerable<string> which reads lines lazily, instead of reading the whole file in one go.

The effect at execution time will be broadly the same as your original code (it won't improve performance); this is just simpler to read.
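To see the lazy-versus-eager difference concretely, here is a minimal sketch. It assumes .NET 5 or later (C# 9 top-level statements) and writes a made-up temporary file purely for the demo:

```csharp
using System;
using System.IO;
using System.Linq;

// Throwaway sample file standing in for the real large file.
string path = Path.GetTempFileName();
File.WriteAllLines(path, Enumerable.Range(1, 100_000).Select(i => $"line {i}"));

// ReadLines is lazy: Take(3) stops enumerating after three lines,
// so the rest of the file is never read into memory.
string[] firstThree = File.ReadLines(path).Take(3).ToArray();

// ReadAllLines is eager: it loads every line into one array up front.
string[] all = File.ReadAllLines(path);

Console.WriteLine(string.Join(", ", firstThree)); // line 1, line 2, line 3
Console.WriteLine(all.Length);                     // 100000

File.Delete(path);
```

Because ReadLines streams, its memory use stays roughly flat as the file grows; ReadAllLines needs room for every line at once.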

Fundamentally, if you're reading a large file, that can take a long time - but reading just a million lines shouldn't take "lots and lots of time". My guess is that whatever you're doing with the lines takes a long time. You might want to parallelize that, potentially using a producer/consumer queue (e.g. via BlockingCollection) or TPL Dataflow, or just use Parallel LINQ, Parallel.ForEach etc.
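A minimal producer/consumer sketch with BlockingCollection, assuming .NET 5+ top-level statements. The temp file and the summing work are placeholders for the real file and the real per-line processing; this only helps if that processing is CPU-bound and safe to run on several threads at once:

```csharp
using System;
using System.Collections.Concurrent;
using System.IO;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

string path = Path.GetTempFileName();
File.WriteAllLines(path, Enumerable.Range(1, 10_000).Select(i => i.ToString()));

// Bounded queue: the reader blocks when consumers fall behind, capping memory use.
using var queue = new BlockingCollection<string>(boundedCapacity: 1000);
long total = 0;

// Producer: one task reads lines sequentially.
Task producer = Task.Run(() =>
{
    try
    {
        foreach (string line in File.ReadLines(path))
            queue.Add(line);
    }
    finally
    {
        queue.CompleteAdding(); // lets consumer loops finish once the queue drains
    }
});

// Consumers: several workers take lines off the queue and do the slow part.
Task[] consumers = Enumerable.Range(0, Environment.ProcessorCount)
    .Select(_ => Task.Run(() =>
    {
        foreach (string line in queue.GetConsumingEnumerable())
        {
            // Placeholder for the expensive "do something" step.
            Interlocked.Add(ref total, int.Parse(line));
        }
    }))
    .ToArray();

Task.WaitAll(consumers);
producer.Wait();
File.Delete(path);

Console.WriteLine(total); // 50005000
```

The capacity of 1000 is an arbitrary choice for the sketch; it keeps the reader from racing ahead of slow consumers.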

You should use a profiler to work out where the time is being spent. If you're reading from a very slow file system, then it's possible that it really is the reading which is taking the time. We don't have enough information to guide you on that, but you should be able to narrow it down yourself.
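Before reaching for a full profiler, a rough first check is to time a read-only pass against a read-and-process pass. A sketch, assuming .NET 5+; ProcessLine is a hypothetical stand-in for the real work. The second pass usually benefits from the OS file cache, so treat the numbers as indicative only:

```csharp
using System;
using System.Diagnostics;
using System.IO;
using System.Linq;

string path = Path.GetTempFileName();
File.WriteAllLines(path, Enumerable.Range(1, 200_000).Select(i => $"row {i}"));

// Pass 1: read only, no processing; approximates the pure I/O cost.
var sw = Stopwatch.StartNew();
int lineCount = 0;
foreach (string line in File.ReadLines(path))
    lineCount++;
TimeSpan readOnly = sw.Elapsed;

// Pass 2: read plus the per-line work.
sw.Restart();
foreach (string line in File.ReadLines(path))
    ProcessLine(line);
TimeSpan readAndProcess = sw.Elapsed;

Console.WriteLine($"{lineCount} lines; read only: {readOnly.TotalMilliseconds:F0} ms, " +
                  $"read + process: {readAndProcess.TotalMilliseconds:F0} ms");
File.Delete(path);

// Hypothetical placeholder for the real work (e.g. writing to Excel).
static void ProcessLine(string line) => _ = line.ToUpperInvariant();
```

If the second figure dwarfs the first, the reading is not the bottleneck.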

score 0 · Accepted Answer

If space isn't an issue, create a buffer of about 1 MB:

using System.IO;

using (BufferedStream bs = new BufferedStream(File.OpenRead(path), 1024 * 1024))
{
    int read;
    byte[] buffer = new byte[1024 * 1024];
    while ((read = bs.Read(buffer, 0, buffer.Length)) != 0)
    {
        // play with buffer: only the first `read` bytes are valid on each pass
    }
}
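What "play with buffer" means depends on the data. As one sketch, assuming .NET 5+ and a UTF-8 or ASCII file (where the byte 0x0A only ever means '\n'), this counts lines by scanning each chunk; the sample file is made up for the demo:

```csharp
using System;
using System.IO;
using System.Linq;

string path = Path.GetTempFileName();
File.WriteAllLines(path, Enumerable.Range(1, 100_000).Select(i => $"line {i}"));

long newlines = 0;
using (BufferedStream bs = new BufferedStream(File.OpenRead(path), 1024 * 1024))
{
    byte[] buffer = new byte[1024 * 1024];
    int read;
    while ((read = bs.Read(buffer, 0, buffer.Length)) != 0)
    {
        // Scan only the bytes actually filled on this iteration.
        for (int i = 0; i < read; i++)
            if (buffer[i] == (byte)'\n') newlines++;
    }
}

Console.WriteLine(newlines); // 100000
File.Delete(path);
```

Working on raw bytes like this skips string allocation entirely, but it is also why the encoding assumption matters: in UTF-16 the newline is two bytes.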

score 0 · Accepted Answer

To improve performance, consider moving whatever work you currently do inside the loop onto other threads so they share the load:

using System.IO;
using System.Threading.Tasks;

Parallel.ForEach(File.ReadLines(filePath), line =>
{
    // do your business
});
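A slightly fuller sketch, assuming .NET 5+ top-level statements and a made-up temp file. Results from the worker threads go into a ConcurrentBag because a plain List<T> is not safe for concurrent writes, and Parallel.ForEach does not preserve line order. Anything shared, such as a single workbook object, may not be safe to touch from several threads either:

```csharp
using System;
using System.Collections.Concurrent;
using System.IO;
using System.Linq;
using System.Threading.Tasks;

string path = Path.GetTempFileName();
File.WriteAllLines(path, Enumerable.Range(1, 50_000).Select(i => $"item {i}"));

// Thread-safe collection for results produced on worker threads.
var results = new ConcurrentBag<int>();

Parallel.ForEach(File.ReadLines(path), line =>
{
    // Placeholder per-line work: parse the number after "item ".
    results.Add(int.Parse(line.Substring(5)));
});

Console.WriteLine(results.Count); // 50000
File.Delete(path);
```

If the output order matters, the results need to carry their line index (or use PLINQ with AsOrdered) rather than relying on insertion order.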

score 0 · Accepted Answer

Try reading the whole file in one go with a FileStream and see whether it's faster than StreamReader:

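using System.IO;
using System.Text;

string filePath = "";
string fileData = "";
using (FileStream fs = new FileStream(filePath, FileMode.Open, FileAccess.Read))
{
    byte[] data = new byte[fs.Length];
    int offset = 0;
    // Read may return fewer bytes than requested, so loop until the array is full.
    while (offset < data.Length)
    {
        int n = fs.Read(data, offset, data.Length - offset);
        if (n == 0) break;
        offset += n;
    }
    // Decode with the file's real encoding; Encoding.Unicode means UTF-16.
    fileData = Encoding.UTF8.GetString(data, 0, offset);
}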
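For comparison, File.ReadAllText does the same whole-file read in one call. This sketch (assuming .NET 5+, with a made-up UTF-8 sample file) also shows why the decoding choice matters: Encoding.Unicode is UTF-16, so applying it to UTF-8 bytes produces garbage:

```csharp
using System;
using System.IO;
using System.Text;

string path = Path.GetTempFileName();
// Sample file in UTF-8 without a byte-order mark, the common case.
File.WriteAllText(path, "first\nsecond\nthird\n", new UTF8Encoding(false));

// One call: reads to the end, honours a BOM if present (otherwise UTF-8), closes the file.
string fileData = File.ReadAllText(path);

// Decoding the same bytes as UTF-16 garbles the text.
string wrong = Encoding.Unicode.GetString(File.ReadAllBytes(path));

string[] lines = fileData.Split('\n', StringSplitOptions.RemoveEmptyEntries);
Console.WriteLine(lines.Length);      // 3
Console.WriteLine(wrong == fileData); // False
File.Delete(path);
```

Either way the whole file sits in memory at once, which for a million lines may or may not be acceptable.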

score 0 · Accepted Answer

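Instead of reading line by line, you can use StreamReader's int ReadBlock(char[] buffer, int index, int count) to read more data at a time. This avoids reading the whole file at once (File.ReadAllLines) but still lets you process larger chunks in RAM at a time.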
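A sketch of block-wise reading with ReadBlock, assuming .NET 5+ and a made-up sample file. The fiddly part is that a line can straddle two blocks, so the leftover fragment is carried into the next iteration. StreamReader already buffers internally, so this is mostly worthwhile when you want to work on whole blocks rather than individual lines:

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Text;

string path = Path.GetTempFileName();
File.WriteAllText(path, string.Join("\n", Enumerable.Range(1, 10_000).Select(i => $"line {i}")) + "\n");

var lines = new List<string>();     // collected only so the result can be checked
char[] buffer = new char[64 * 1024];
var pending = new StringBuilder();  // partial line carried between blocks

using (var reader = new StreamReader(path))
{
    int read;
    while ((read = reader.ReadBlock(buffer, 0, buffer.Length)) > 0)
    {
        int start = 0;
        for (int i = 0; i < read; i++)
        {
            if (buffer[i] != '\n') continue;
            // A line ends here; join it with any fragment from the previous block.
            pending.Append(buffer, start, i - start);
            lines.Add(pending.ToString().TrimEnd('\r'));
            pending.Clear();
            start = i + 1;
        }
        // Keep the unfinished tail for the next block.
        pending.Append(buffer, start, read - start);
    }
}
if (pending.Length > 0) lines.Add(pending.ToString()); // last line without a trailing '\n'

Console.WriteLine(lines.Count); // 10000
Console.WriteLine(lines[^1]);   // line 10000
File.Delete(path);
```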

score -2 · Accepted Answer

You can also use ReadAllLines(filepath) to load the file into an array of lines, like this: string[] lines = System.IO.File.ReadAllLines(@"path");
