0

我目前正在做一个小型 C# 练习,处理在文本文件中搜索相关术语/单词,程序将写出文本文件中包含搜索单词的所有句子。例如,我输入单词:“example”,程序将遍历文本文件中的所有句子,并提取其中包含“example”一词的句子。

The text file is structured as so: <sentenceDesignator> <text>
sentence 1: bla bla bla bla example of a sentence  //each line contains a sentence
sentence 2: this is not a good example of grammar
sentence 3: bla is not a real word, use better terms

我想做的是使用线性搜索来遍历文本文件中的所有行并写出包含搜索字符串术语的所有句子。

到目前为止我的代码:

        String filename = @"sentences.txt";

        if (!File.Exists(filename))
        {
            // Since we just created the file, this shouldn't happen.
            Console.WriteLine("{0} not found", filename);
            return;
        }
        else
        {
            Console.WriteLine("Successfully found {0}.", filename);
        }
        //making a listof type "Sentence" to hold all the sentences
        List<Sentence> sentences = new List<Sentence>();

        //the next lines of code...
        StreamReader reader = File.OpenText(filename);

        //first, write out all of the sentences in the text file

        //read a line(sentence) from a line in the text file
        string line = reader.ReadLine();

        while (line != null)
        {
            Sentence s = new Sentence();

            //we need something to split data...
            string[] lineArray = line.Split(':');

            s.sentenceDesignator = lineArray[0];
            s.Text = lineArray[1];

            Console.Write("\n{0}", line);

            line = reader.ReadLine();
        }

        //so far, we can write out all of the sentences in the text file. 
        Console.Write("\n\nOK!, search a term to diplay all their occurences: ");
        string searchTerm = Console.ReadLine();

       if(!line.Contains(searchterm))
       {
          Console.Write("\nThat term does not exist in any sentence.");
       }
       else
        {
            foreach (Sentence ss in sentences)
            {
                if (ss.sentenceDesignator.Contains(queryName))
                {
                    //I need help here
                }
            }
        }
4

2 回答 2

1

如果您构建文件的索引然后搜索索引会快得多,就像线性搜索每个搜索操作一样,O(n)而索引搜索它O(n)用于构建索引,但O(log n)near-O(1)用于查找(取决于您如何构建你的索引)。成本是增加了索引的内存消耗,但我会这样做:

private Dictionary<String,List<Int32>> _index = new Dictionary<String,List<Int32>>();

/// <summary>Populates an index of words in a text file. Takes O(n) where n is the size of the input text file.</summary>
public void BuildIndex(String fileName) {

    using(Stream inputTextFile = OpenFile(...)) {

        int currentPosition = 0;
        foreach(String word in GetWords(inputTextFile)) {

            word = word.ToUpperInvariant();
            if( !_index.ContainsKey( word ) ) _index.Add( word, new List<Int32>() );
            _index[word].Add( currentPosition );

            currentPosition = inputTextFile.Position;
        }
    }
}

/// <summary>Searches the text file (via its index) if the specified string (in its entirety) exists in the document. If so, it returns the position in the document where the string starts. Otherwise it returns -1. Lookup time is O(1) on the size of the input text file, and O(n) for the length of the query string.</summary>
public Int32 SearchIndex(String query) {

    String[] terms = query.Split(' ');

    Int32 startingPosition = -1;
    Int32 currentPosition = -1;
    Boolean first = true;
    foreach(String term in terms) {
        term = term.ToUpperInvariant();

        if( first ) {
            if( !_index.Contains( term ) ) return -1;
            startingPosition = _index[term][0];
        } else {

            if( !ContainsTerm( term, ++currentPosition ) ) return -1;
        }

        first = false;
    }

    return startingPosition;
}

/// <summary>Indicates if the specified term exists at the specified position.</summary>
private Boolean ContainsTerm(String term, Int32 expectedPosition) {

    if( !_index.ContainsKey(term) ) return false;
    List<Int32> positions = _index[term];
    foreach(Int32 pos in positions) {

        if( pos == expectedPosition ) return true;
    }
    return false;
}

和应该是微不足道OpenFile的。GetWords请注意,GetWords用于在文件yield return中构建一个IEnumerable<String>以空格分隔的单词,以及处理您的自定义文件格式。

于 2013-02-25T07:05:30.570 回答
0

我对最后一个 if/else 有点困惑。似乎您只是将文件的最后一行与搜索词进行比较。另外,“queryName”从何而来?你想打印整个句子(“bla bla bla bla example of a sentence”)还是只打印“sentence 1”?另外,您检查 sentenceDesignator 是否包含查询名称,我想您想检查实际的 Text 是否包含搜索词。

也许这会帮助你:

var lines = File.ReadAllLines(fileName);    
var sentences = new List<Sentence>(lines.Count());

foreach (var line in lines)
{
    var lineArray = line.Split(':');
    sentences.Add(new Sentence { sentenceDesignator = lineArray[0], Text = lineArray[1]});
}

foreach (var sentence in sentences)
{
    if (sentence.Text.Contains(searchTerm))
    {
        Console.WriteLine(sentence.sentenceDesignator);
        //Console.WriteLine(sentence.Text);
    }
}
于 2013-02-25T09:25:04.080 回答