
I have to load a large file into memory and I want to find a substring in it. Which approach is faster?

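// Application initialization

string instring  = "which is faster find in string or list..."; // large string, roughly 150 MB
List<string> inlist = new List<string>();
// Iterating a string directly yields chars, so split it into words first.
foreach (string word in instring.Split(' ')) {
    inlist.Add(word);
}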

// Button click

if (instring.Contains("find")) {
  ...
}

or

if (inlist.Contains("find")) {
  ...
}

In my case, I took some measurements, and the string search was the fastest.

Single search:
Boyer-Moore search found - elapsed: 00:00:00.0025893
String search found - elapsed: 00:00:00.0026120
List search not found - elapsed: 00:00:00.0026394

Multi search:
Boyer-Moore search found - elapsed: 00:00:00.0027377
Boyer-Moore search found - elapsed: 00:00:00.0028308
Boyer-Moore search found - elapsed: 00:00:00.0029269
Boyer-Moore search found - elapsed: 00:00:00.0030234
Boyer-Moore search found - elapsed: 00:00:00.0031210

String search found - elapsed: 00:00:00.0032474
String search found - elapsed: 00:00:00.0032653
String search found - elapsed: 00:00:00.0032832
String search found - elapsed: 00:00:00.0033015
String search found - elapsed: 00:00:00.0033201


List search not found - elapsed: 00:00:00.0033629
List search not found - elapsed: 00:00:00.0033826
List search not found - elapsed: 00:00:00.0033961
List search not found - elapsed: 00:00:00.0034155
List search not found - elapsed: 00:00:00.0034345

2 Answers


You're testing completely different things.

For example, suppose you really are looking for "find", and you have a file that contains:

If you're interested in finding the answer, make sure you know the question.

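If you split it into a list of strings, one per word, then "find" does not appear, because it is only part of the word "finding". string.Contains, however, does find it, because it is a substring.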
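To make the difference concrete, here is a minimal sketch that runs both checks against the sample sentence above. The class and method names (ContainsDemo, SubstringMatch, WordMatch) are made up for illustration, and splitting on a single space is an assumption about how the list would be built.

```csharp
using System;
using System.Collections.Generic;

static class ContainsDemo
{
    // Substring search: true if `term` occurs anywhere in the text.
    public static bool SubstringMatch(string text, string term)
    {
        return text.Contains(term);
    }

    // Whole-word search: true only if one of the space-separated
    // tokens is exactly equal to `term` (assumed tokenization).
    public static bool WordMatch(string text, string term)
    {
        List<string> words = new List<string>(text.Split(' '));
        return words.Contains(term);
    }

    static void Main()
    {
        string text = "If you're interested in finding the answer, make sure you know the question.";
        Console.WriteLine(SubstringMatch(text, "find")); // True: "finding" contains "find"
        Console.WriteLine(WordMatch(text, "find"));      // False: no token equals "find"
    }
}
```

Note that the tokens keep their punctuation (for example "answer,"), so even a whole-word lookup for "answer" would fail with this simple split.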

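You should first work out what behaviour you want, implement it in the simplest, most elegant way, and then measure the performance. If that is fast enough for your needs, you're done. If not, you can try to improve it, measuring at each step and making sure you still have the behaviour you want.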
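For the measuring step, something along these lines may be a reasonable starting point. It is only a sketch: SearchTiming and AverageMilliseconds are hypothetical names, the warm-up run and the iteration count are arbitrary choices, and a dedicated benchmarking tool would give more trustworthy numbers.

```csharp
using System;
using System.Diagnostics;

static class SearchTiming
{
    // Runs `action` `iterations` times and returns the average
    // elapsed time per run in milliseconds.
    public static double AverageMilliseconds(Action action, int iterations)
    {
        action(); // warm-up run so JIT compilation is not part of the timing
        Stopwatch sw = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
        {
            action();
        }
        sw.Stop();
        return sw.Elapsed.TotalMilliseconds / iterations;
    }

    static void Main()
    {
        // Synthetic test data: the match sits at the very end of the string.
        string haystack = new string('a', 1000000) + "find";
        bool found = false;
        double ms = AverageMilliseconds(() => found = haystack.Contains("find"), 20);
        Console.WriteLine("found: " + found + ", average ms: " + ms);
    }
}
```

Timing each candidate with a fresh Stopwatch, and averaging over several runs, keeps one measurement from leaking into the next.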

answered 2012-11-17T09:43:37.147

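It may be better to stream the file through a buffer and analyze it line by line. In both cases you have to read the whole file, but when you build the list, you have to hold the complete file contents in memory.

C# example, adapted from Microsoft MSDN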

using System;
using System.IO;

class Test 
{
    public static void Main() 
    {
        try 
        {
            // Create an instance of StreamReader to read from a file.
            // The using statement also closes the StreamReader.
            using (StreamReader sr = new StreamReader("TestFile.txt")) 
            {
                string line;
                string subString = "find this";
                // Read lines from the file until a match is found or
                // the end of the file is reached.
                while ((line = sr.ReadLine()) != null) 
                {
                    if ( line.Contains(subString) ) 
                    {
                         Console.WriteLine("Found string");
                         break;
                    }
                }
            }
        }
        catch (Exception e) 
        {
            // Let the user know what went wrong.
            Console.WriteLine("The file could not be read:");
            Console.WriteLine(e.Message);
        }
    }
}
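A shorter variant of the same idea, as a sketch: File.ReadLines enumerates the file lazily, so Any can stop at the first matching line without loading the rest. StreamingSearch and FileContains are names invented for this example, and the temporary file only exists to make it self-contained.

```csharp
using System;
using System.IO;
using System.Linq;

static class StreamingSearch
{
    // Returns true as soon as any line contains `term`.
    // Lines are read lazily, one at a time.
    public static bool FileContains(string path, string term)
    {
        return File.ReadLines(path).Any(line => line.Contains(term));
    }

    static void Main()
    {
        string path = Path.GetTempFileName();
        try
        {
            File.WriteAllLines(path, new[] { "first line", "please find this text", "last line" });
            Console.WriteLine(FileContains(path, "find this")); // True
            Console.WriteLine(FileContains(path, "missing"));   // False
        }
        finally
        {
            File.Delete(path);
        }
    }
}
```

Like the loop above, this only finds matches that sit entirely within one line; a search term that spans a line break would be missed.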
answered 2012-11-17T09:46:46.193