c# - Read a large text file line by line and search for a string

Question

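I am currently developing an application that reads a text file of about 50000 lines. For each line, I need to check whether it contains a specific string.

At the moment, I read the file line by line with a plain System.IO.StreamReader.

The problem is that the size of the text file changes every time. I ran several performance tests and noticed that as the file size increases, reading a single line takes more time.

For example:

Reading a txt file containing 5000 lines: 0:40
Reading a txt file containing 10000 lines: 2:54

Reading a file twice as large takes about 4 times as long. I cannot imagine how long it would take to read a 100000-line file.

Here is my code: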

using (StreamReader streamReader = new StreamReader(this.MyPath))
{
     while (streamReader.Peek() >= 0)
     {
          string line = streamReader.ReadLine();

          if (line.Contains(Resources.Constants.SpecificString))
          {
               // Do some action with the string.
          }
     }
}

Is there a way to avoid this situation, where a bigger file means more time to read each line?

score 7 · Accepted Answer

Try this:

var toSearch = Resources.Constants.SpecificString;
foreach (var str in File.ReadLines(MyPath).Where(s => s.Contains(toSearch))) {
    // Do some action with the string
}

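This avoids accessing the resource on every iteration by caching its value before the loop. If that does not help, try writing your own Contains based on an advanced string search algorithm (such as KMP).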
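If measurements do point at Contains, a hand-rolled search could look roughly like the sketch below. It is a minimal Knuth-Morris-Pratt implementation and not the answerer's code: the names KmpSearch, BuildFailureTable, and CountMatches are hypothetical, and the comparison is ordinal and case-sensitive. The failure table is built once per pattern and reused for every line.

```csharp
using System.Collections.Generic;

// Hypothetical helper for illustration; not part of the original answer.
public static class KmpSearch
{
    // failure[i] is the length of the longest proper prefix of
    // pattern[0..i] that is also a suffix of pattern[0..i].
    public static int[] BuildFailureTable(string pattern)
    {
        var failure = new int[pattern.Length];
        int k = 0;
        for (int i = 1; i < pattern.Length; i++)
        {
            while (k > 0 && pattern[i] != pattern[k]) k = failure[k - 1];
            if (pattern[i] == pattern[k]) k++;
            failure[i] = k;
        }
        return failure;
    }

    // Returns true if text contains pattern (ordinal, case-sensitive).
    public static bool Contains(string text, string pattern, int[] failure)
    {
        if (pattern.Length == 0) return true;
        int k = 0; // number of pattern characters currently matched
        foreach (char c in text)
        {
            while (k > 0 && c != pattern[k]) k = failure[k - 1];
            if (c == pattern[k]) k++;
            if (k == pattern.Length) return true;
        }
        return false;
    }

    // Counts the lines that contain pattern, building the table only once.
    public static int CountMatches(IEnumerable<string> lines, string pattern)
    {
        int[] failure = BuildFailureTable(pattern);
        int count = 0;
        foreach (string line in lines)
        {
            if (Contains(line, pattern, failure)) count++;
        }
        return count;
    }
}
```

With the lazy reader from the snippet above, this would be called as KmpSearch.CountMatches(File.ReadLines(MyPath), toSearch). Whether it actually beats the built-in Contains depends on the data and should be measured.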

Note: be sure to use File.ReadLines, which reads the lines lazily (unlike the similar-looking File.ReadAllLines, which reads all lines at once).
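To make the difference concrete, here is a small sketch that puts the two calls side by side. The class and method names (LineSearch, AnyLineContainsLazy, AnyLineContainsEager) are invented for illustration. The lazy version can stop reading as soon as a match is found, while the eager version loads the whole file into an array first.

```csharp
using System.IO;
using System.Linq;

// Hypothetical helper for illustration; not part of the original answer.
public static class LineSearch
{
    // File.ReadLines yields one line at a time, so Any() stops reading
    // the file at the first matching line.
    public static bool AnyLineContainsLazy(string path, string toSearch)
    {
        return File.ReadLines(path).Any(line => line.Contains(toSearch));
    }

    // File.ReadAllLines reads every line into a string[] before the
    // search starts, even if the first line already matches.
    public static bool AnyLineContainsEager(string path, string toSearch)
    {
        return File.ReadAllLines(path).Any(line => line.Contains(toSearch));
    }
}
```

Both methods return the same result; they differ only in how much of the file is held in memory and how early they can stop.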

score 0 · Accepted Answer

Use Regex.IsMatch and you should see some performance improvement.

using (StreamReader streamReader = new StreamReader(this.MyPath))
{
    var regEx = new Regex(MyPattern, RegexOptions.Compiled);

    while (streamReader.Peek() >= 0)
    {
        string line = streamReader.ReadLine();

        if (regEx.IsMatch(line))
        {
            // Do some action with the string.
        }
    }
}

However, remember to use a compiled Regex. Here is a pretty good article with some benchmarks that you can look at.
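When the thing being searched for is a literal string rather than a pattern, one possible way to apply this is sketched below. It assumes the search text should be matched literally, so it is passed through Regex.Escape first; the names RegexLineFilter and MatchingLines are hypothetical. The Regex is constructed once per call and reused for every line.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper for illustration; not part of the original answer.
public static class RegexLineFilter
{
    // Lazily yields the lines that contain the literal text.
    public static IEnumerable<string> MatchingLines(IEnumerable<string> lines, string literal)
    {
        // Regex.Escape makes characters such as '.' or '(' match themselves.
        // The compiled Regex is created once, when enumeration starts.
        var regex = new Regex(Regex.Escape(literal), RegexOptions.Compiled);

        foreach (string line in lines)
        {
            if (regex.IsMatch(line))
            {
                yield return line;
            }
        }
    }
}
```

It could be combined with a lazy reader, for example RegexLineFilter.MatchingLines(File.ReadLines(path), text). Compiling a Regex has a one-time cost, so any gain over a plain Contains is worth measuring on real files.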

Happy coding!
