0

I want to remove stop words from my text file, and I wrote the following code to do it:

 TextWriter tw = new StreamWriter("D:\\output.txt");
 private void button1_Click(object sender, EventArgs e)
        {
            StreamReader reader = new StreamReader("D:\\input1.txt");
            string line;
            while ((line = reader.ReadLine()) != null)
            {
                string[] parts = line.Split(' ');
                string[] stopWord = new string[] { "is", "are", "am","could","will" };
                foreach (string word in stopWord)
                {
                    line = line.Replace(word, "");
                    tw.Write("+"+line);
                }
                tw.Write("\r\n");
            } 

But it does not produce any result in the output file; the output file stays empty.

4 Answers

6

A regular expression is probably a good fit for this job:

        // The verbatim string (@"...") keeps \b as a regex word boundary, not a backspace character.
        Regex replacer = new Regex(@"\b(?:is|are|am|could|will)\b");
        using (TextWriter writer = new StreamWriter("C:\\output.txt"))
        {
            using (StreamReader reader = new StreamReader("C:\\input.txt"))
            {
                while (!reader.EndOfStream)
                {
                    string line = reader.ReadLine();
                    // Replace returns a new string, so assign the result back.
                    line = replacer.Replace(line, "");
                    writer.WriteLine(line);
                }
            }
            writer.Flush();
        }

This approach only replaces whole stop words with an empty string; it leaves them alone when they are part of another word.
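Removing a word this way leaves the spaces on both sides of it behind. A minimal sketch of one way to tidy that up is below; the class name StopWordFilter, the method RemoveStopWords and the second regex are illustrative assumptions, not part of the original code.

```csharp
using System;
using System.Text.RegularExpressions;

static class StopWordFilter
{
    // Whole-word match for the stop words from the question.
    // The verbatim string keeps \b as a regex word boundary.
    static readonly Regex StopWords =
        new Regex(@"\b(?:is|are|am|could|will)\b");

    // Runs of two or more spaces left behind after a word is removed.
    static readonly Regex ExtraSpaces = new Regex(@" {2,}");

    // Hypothetical helper: returns the line without stop words,
    // with leftover space runs collapsed to a single space.
    public static string RemoveStopWords(string line)
    {
        string stripped = StopWords.Replace(line, "");
        return ExtraSpaces.Replace(stripped, " ").Trim();
    }
}
```

For example, StopWordFilter.RemoveStopWords("this is a test") returns "this a test": the standalone "is" is removed, while the "is" inside "this" is left untouched. The match is case-sensitive as written.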

Good luck with your task.

answered 2013-03-14T18:35:04.950
2

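The following works as expected for me. However, it is not a good approach, because it removes the stop words even when they are part of a larger word. It also does not clean up the extra spaces left behind by the removed words.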

string[] stopWord = new string[] { "is", "are", "am","could","will" };

TextWriter writer = new StreamWriter("C:\\output.txt");
StreamReader reader = new StreamReader("C:\\input.txt");

string line;
while ((line = reader.ReadLine()) != null)
{
    foreach (string word in stopWord)
    {
        line = line.Replace(word, "");
    }
    writer.WriteLine(line);
}
reader.Close();
writer.Close();

Also, I recommend using statements when you create the streams, to make sure the files are closed promptly.
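As a rough sketch of how both drawbacks could be avoided, the line can be split into tokens and only the tokens that are not stop words kept, with the streams wrapped in using statements. The names StopWordFile, Filter and Process are made up for illustration, and the sketch assumes words are separated by plain spaces, so a token such as "is," with attached punctuation would not be removed.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

static class StopWordFile
{
    static readonly HashSet<string> StopWords =
        new HashSet<string> { "is", "are", "am", "could", "will" };

    // Hypothetical helper: keeps only the tokens that are not stop words
    // and joins them back with single spaces.
    public static string Filter(string line)
    {
        var kept = line
            .Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries)
            .Where(token => !StopWords.Contains(token));
        return string.Join(" ", kept);
    }

    // Copies inputPath to outputPath line by line. Both streams are
    // disposed, and the writer flushed, when the using blocks exit.
    public static void Process(string inputPath, string outputPath)
    {
        using (var reader = new StreamReader(inputPath))
        using (var writer = new StreamWriter(outputPath))
        {
            string line;
            while ((line = reader.ReadLine()) != null)
            {
                writer.WriteLine(Filter(line));
            }
        }
    }
}
```

Because whole tokens are compared, "this" is kept even though it contains "is", and no doubled spaces remain in the output.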

answered 2013-03-14T18:29:20.740
1

You should wrap your IO objects in using statements so that they are disposed of properly.

using (TextWriter tw = new StreamWriter("D:\\output.txt"))
{
    using (StreamReader reader = new StreamReader("D:\\input1.txt"))
    {
        string line;
        while ((line = reader.ReadLine()) != null)
        {
            string[] parts = line.Split(' ');
            string[] stopWord = new string[] { "is", "are", "am","could","will" };
            foreach (string word in stopWord)
            {
                line = line.Replace(word, "");
                tw.Write("+"+line);
            }
        }
    }
}
answered 2013-03-14T18:30:28.943
0

Try wrapping StreamWriter and StreamReader in using () {} clauses.

using (TextWriter tw = new StreamWriter(@"D:\output.txt"))
{
  ...
}

You may also want to call tw.Flush() at the end.
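The reason this matters is that StreamWriter buffers its output, so text passed to Write may not reach the file until the writer is flushed or disposed. A small sketch of the idea follows; FlushDemo and WriteAndReadBack are hypothetical names used only for this example.

```csharp
using System;
using System.IO;

static class FlushDemo
{
    // Hypothetical helper: writes text to path, then returns what is
    // actually stored in the file afterwards.
    public static string WriteAndReadBack(string path, string text)
    {
        using (TextWriter tw = new StreamWriter(path))
        {
            tw.Write(text);
            // Pushes the buffered characters to the file right away.
            // Leaving the using block also flushes and closes the writer.
            tw.Flush();
        }
        return File.ReadAllText(path);
    }
}
```

With the using block in place the explicit Flush() is optional, since disposing the writer flushes it as well.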

answered 2013-03-14T18:28:41.540