c# - Search a text file for a string and return it with the previous and next sentences

Question

Say I have a search term: She likes to watch tv

and text.txt, an input file containing a few sentences, for example:

I don't know what to do. She doesn't know that it's not good for her health. She likes to watch tv but really don't know what to say. I don't blame her, but it's not her fault. This was just a test text. This is the end.

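I want to search the text file for the string and return the sentence that contains it, plus the sentence before it and the sentence after it.

The output should look like this: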

She doesn't know that it's not good for her health. She likes to watch tv but really don't know what to say. I don't blame her, but it's not her fault.

So it outputs the sentence before the match, the sentence containing the search term, and the sentence after the match.

score 3 · Accepted Answer

Here is one way to do it:

string content = @"I don't know what to do. She doesn't know that it's not good for her health. She likes to watch tv but really don't know what to say. I don't blame her, but it's not her fault. This was just a test text. This is the end.";

string input = @"She likes to watch tv";
string curPhrase = string.Empty, prevPhrase = string.Empty, nextPhrase = string.Empty;

char[] delim = new char[] { '.' };
string[] phrases = content.Split(delim, StringSplitOptions.RemoveEmptyEntries);

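for (int i = 0; i < phrases.Length; i++)
{
    if (phrases[i].IndexOf(input) != -1)
    {
        curPhrase = phrases[i];
        // Guard both neighbours so a match in the first or last sentence does not throw.
        if (i > 0)
            prevPhrase = phrases[i - 1];
        if (i + 1 < phrases.Length)
            nextPhrase = phrases[i + 1];

        break;
    }
}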

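It first splits the whole text on the period (.) and stores the pieces in an array. It then searches the array for the input string and, once found, takes the current, previous, and next phrases. Note that Split drops the periods and leaves the leading space on each piece, so trim the phrases and add the periods back when printing them.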
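The split-and-search approach can be wrapped into a small helper that also reads the file from the question. This is only a sketch: the class name `SentenceContext` and the method `Find` are made up for illustration, and it assumes every sentence ends with a plain period (abbreviations or decimals would be split incorrectly).

```csharp
using System;
using System.IO;
using System.Linq;

public static class SentenceContext
{
    // Hypothetical helper: returns the sentence containing `term` together with
    // its neighbours, or an empty string when the term is not found.
    public static string Find(string content, string term)
    {
        // Split on '.', trim each piece, and drop anything that is empty after trimming.
        string[] sentences = content
            .Split(new[] { '.' }, StringSplitOptions.RemoveEmptyEntries)
            .Select(s => s.Trim())
            .Where(s => s.Length > 0)
            .ToArray();

        int i = Array.FindIndex(sentences, s => s.Contains(term));
        if (i < 0)
            return string.Empty;

        // Clamp the window so a match in the first or last sentence still works.
        int first = Math.Max(0, i - 1);
        int last = Math.Min(sentences.Length - 1, i + 1);

        // Split removed the periods, so append one to each sentence again.
        return string.Join(" ",
            sentences.Skip(first).Take(last - first + 1).Select(s => s + "."));
    }

    public static void Main()
    {
        // "text.txt" is the file name used in the question; adjust the path as needed.
        string content = File.ReadAllText("text.txt");
        Console.WriteLine(Find(content, "She likes to watch tv"));
    }
}
```

For the sample text in the question, `Find` returns the three sentences listed as the desired output.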

score 3 · Accepted Answer

How about something like this:

    string @in = @"I don't know what to do. She doesn't know that it's not good for her health. She likes to watch tv but really don't know what to say. I don't blame her, but it's not her fault. This was just a test text. This is the end.";
    string phrase = @"She likes to watch tv";


    int startIndex = @in.IndexOf(phrase);
    if (startIndex < 0)
        throw new InvalidOperationException("Phrase not found.");
    int endIndex = startIndex + phrase.Length;
    int tmpIndex;

    // Look behind the match: the nearest ". " ends the previous sentence.
    tmpIndex = @in.Substring(0, startIndex).LastIndexOf(". ");
    if (tmpIndex > -1)
    {
        // The ". " before that one marks where the previous sentence starts.
        tmpIndex = @in.Substring(0, tmpIndex + 1).LastIndexOf(". ");
        startIndex = tmpIndex > -1 ? tmpIndex + 1 : 0;
    }
    else
    {
        // The match is in the first sentence, so start at the beginning.
        startIndex = 0;
    }

    // Look ahead: the first "." ends the matching sentence, the second ends the next one.
    tmpIndex = @in.IndexOf(".", endIndex);
    if (tmpIndex > -1)
    {
        endIndex = tmpIndex + 1;
        tmpIndex = @in.IndexOf(".", endIndex);
        if (tmpIndex > -1)
        {
            endIndex = tmpIndex + 1;
        }
    }
    else
    {
        endIndex = @in.Length;
    }

    Console.WriteLine(@in.Substring(startIndex, endIndex - startIndex).Trim());

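I'm assuming the sentences you are looking for are delimited by ".". The code works by finding the index of the phrase, then looking behind the match for the previous sentence and ahead of it for the following sentence.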
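The same look-behind and look-ahead idea can be written with two short loops, which also makes the number of surrounding sentences adjustable. This is a sketch under the same assumption that sentences end with "."; the names `IndexScan`, `Extract`, and the `context` parameter are hypothetical and not part of the answer above.

```csharp
using System;

public static class IndexScan
{
    // Hypothetical helper: returns the sentence containing `phrase` plus up to
    // `context` sentences on each side, or an empty string if the phrase is absent.
    public static string Extract(string text, string phrase, int context = 1)
    {
        int match = text.IndexOf(phrase, StringComparison.Ordinal);
        if (match < 0)
            return string.Empty;

        // Walk backwards over periods. The first one found ends the sentence
        // before the match; each further one reaches one sentence earlier.
        int start = match;
        bool reachedStart = false;
        for (int step = 0; step <= context; step++)
        {
            int dot = start > 0 ? text.LastIndexOf('.', start - 1) : -1;
            if (dot < 0) { reachedStart = true; break; }
            start = dot;
        }
        // Begin right after the last period found, or at the start of the text.
        start = reachedStart ? 0 : start + 1;

        // Walk forwards: the first period ends the matching sentence,
        // each further one ends one more following sentence.
        int end = match + phrase.Length;
        for (int step = 0; step <= context; step++)
        {
            int dot = text.IndexOf('.', end);
            if (dot < 0) { end = text.Length; break; }
            end = dot + 1;
        }

        return text.Substring(start, end - start).Trim();
    }

    public static void Main()
    {
        string text = "I don't know what to do. She doesn't know that it's not good for her health. She likes to watch tv but really don't know what to say. I don't blame her, but it's not her fault. This was just a test text. This is the end.";
        Console.WriteLine(Extract(text, "She likes to watch tv"));
    }
}
```

Because the window is clamped to the text boundaries, a match in the first or last sentence simply yields fewer neighbours instead of throwing.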

score 2 · Accepted Answer

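String.IndexOf() (docs) returns the position of the first occurrence of the string within the file contents. Using this value, you can then extract the containing phrase or sentence:

int index = paragraph.IndexOf("She likes to watch tv");

You would then use index to set the boundaries and split (perhaps using capital letters and full stops in a regular expression) to pull out the sentences on either side.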
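One way to turn the index into sentence boundaries is to let a regular expression enumerate the sentences and pick the one whose span contains the index. The sketch below makes its own assumptions: a sentence is taken to be a run of characters ending in ".", "!" or "?" (capital letters are not checked), and `BoundaryLookup` and `Around` are hypothetical names.

```csharp
using System;
using System.Text.RegularExpressions;

public static class BoundaryLookup
{
    // Hypothetical helper: finds `term` with IndexOf, then uses sentence spans
    // from a regex to return the matching sentence and its two neighbours.
    public static string Around(string paragraph, string term)
    {
        int index = paragraph.IndexOf(term, StringComparison.Ordinal);
        if (index < 0)
            return string.Empty;

        // Assumed sentence shape: non-terminator characters followed by one or
        // more terminators, or by the end of the text.
        MatchCollection sentences = Regex.Matches(paragraph, @"[^.!?]+(?:[.!?]+|$)");

        for (int i = 0; i < sentences.Count; i++)
        {
            Match s = sentences[i];
            // Does `index` fall inside this sentence's [Index, Index + Length) span?
            if (index >= s.Index && index < s.Index + s.Length)
            {
                int first = Math.Max(0, i - 1);
                int last = Math.Min(sentences.Count - 1, i + 1);
                int start = sentences[first].Index;
                int end = sentences[last].Index + sentences[last].Length;
                return paragraph.Substring(start, end - start).Trim();
            }
        }
        return string.Empty;
    }

    public static void Main()
    {
        string paragraph = "I don't know what to do. She doesn't know that it's not good for her health. She likes to watch tv but really don't know what to say. I don't blame her, but it's not her fault. This was just a test text. This is the end.";
        Console.WriteLine(Around(paragraph, "She likes to watch tv"));
    }
}
```

Slicing the original string by span, rather than joining the matched pieces, keeps the original punctuation and spacing intact.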

score 2 · Accepted Answer

You can use Regex to grab the text:

string text = "I don't know what to do. She doesn't know that it's not good for her health. She likes to watch tv but really don't know what to say. I don't blame her, but it's not her fault. This was just a test text. This is the end.";

string target = "She likes to watch tv";

string result = Regex.Replace(text, "(?:.*?\\.\\s)?((?:[^.]*?)" + Regex.Escape(target) + "[^.]*?\\.)(?:.*)", "$1");

//result = "She likes to watch tv but really don't know what to say."

Reference: http://msdn.microsoft.com/en-us/library/system.text.regularexpressions.regex.replace(v=vs.90).aspx
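The pattern above keeps only the sentence that contains the target. To also capture the neighbouring sentences, as the question asks, one option is to make a previous and a following sentence optional parts of the match and use Regex.Match instead of Regex.Replace. This is a sketch, not the answer's own method: `RegexContext` and `Find` are hypothetical names, and a sentence is assumed to be any run of non-period characters followed by a period.

```csharp
using System;
using System.Text.RegularExpressions;

public static class RegexContext
{
    // Hypothetical helper: matches an optional previous sentence, the sentence
    // containing `target`, and an optional next sentence in a single pattern.
    public static string Find(string text, string target)
    {
        string pattern =
            @"(?:[^.]*\.\s*)?" +                             // optional previous sentence
            @"[^.]*" + Regex.Escape(target) + @"[^.]*\." +   // sentence containing the target
            @"(?:\s*[^.]*\.)?";                              // optional next sentence

        Match m = Regex.Match(text, pattern);
        // The match may begin with the space that follows an earlier period, so trim it.
        return m.Success ? m.Value.Trim() : string.Empty;
    }

    public static void Main()
    {
        string text = "I don't know what to do. She doesn't know that it's not good for her health. She likes to watch tv but really don't know what to say. I don't blame her, but it's not her fault. This was just a test text. This is the end.";
        Console.WriteLine(Find(text, "She likes to watch tv"));
    }
}
```

Regex.Escape keeps characters in the search term that are special in regular expressions, such as "?" or "(", from being interpreted as pattern syntax.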
