I have to remove common words (for example is, are, am, was, etc.) from a text file. What is an efficient way to do this in Java?

1 Answer

You have to read the file in, skip the words you want to remove, and then write the file back out again.

So you might prefer to simply skip the words you want to ignore each time you read the file, depending on your use case.
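As a rough sketch of that skip-on-read idea, using only the JDK: the class name `StopWordReader` and the helper `readWords` are made up for illustration, and a `StringReader` stands in for the real file so the example is self-contained. It assumes Java 9+ (for `Set.of`) and that words are separated by whitespace.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

public class StopWordReader {

    // Hypothetical helper: returns every whitespace-separated word from the
    // reader except the ones listed in wordsToIgnore. The source is never modified.
    public static List<String> readWords(Reader source, Set<String> wordsToIgnore) throws IOException {
        List<String> words = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(source)) {
            String line;
            while ((line = reader.readLine()) != null) {
                for (String word : line.trim().split("\\s+")) {
                    // split() yields a single empty string for a blank line, so skip empties too
                    if (!word.isEmpty() && !wordsToIgnore.contains(word)) {
                        words.add(word);
                    }
                }
            }
        }
        return words;
    }

    public static void main(String[] args) throws IOException {
        Set<String> wordsToIgnore = Set.of("a", "for");
        // For a real file you would pass something like Files.newBufferedReader(path) instead.
        Reader source = new StringReader("Some words read from a file for example");
        System.out.println(readWords(source, wordsToIgnore));
        // prints: [Some, words, read, from, file, example]
    }
}
```

Because the file itself is left untouched, the list of ignored words can change between runs without rewriting anything.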

To actually remove the words line by line (which may not be how you want to do it), you could do something like this (using Google Guava):

    import com.google.common.base.Splitter;
    import com.google.common.collect.ImmutableSet;
    import java.util.Set;
    import java.util.regex.Pattern;

    // the words you want to remove from the file:
    Set<String> wordsToRemove = ImmutableSet.of("a", "for");

    // this code will run in a loop reading one line after another from the file
    String line = "Some words read from a file for example";
    // StringBuilder is enough here: the buffer is never shared between threads
    StringBuilder outputLine = new StringBuilder();
    for (String word : Splitter.on(Pattern.compile("\\s+")).trimResults().omitEmptyStrings().split(line)) {
        if (!wordsToRemove.contains(word)) {
            // separate kept words with a single space
            if (outputLine.length() > 0) {
                outputLine.append(' ');
            }
            outputLine.append(word);
        }
    }

    // here I'm just printing, but this line could now be written to the output file.
    System.out.println(outputLine.toString());

Running this code prints:

Some words read from file example

That is, "a" and "for" are omitted.

Note that this keeps the code simple, but it changes the whitespace formatting of your file. If a line contains doubled-up spaces, tabs, etc., they are all collapsed to a single space by this code. This is just one example of how you might do it; depending on your requirements, there are probably better ways.
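If the original spacing matters, one possible alternative (not part of the approach above, just a sketch with plain `java.util.regex`) is to delete each unwanted word together with the whitespace that follows it and leave everything else as it was. The names `StopWordStripper`, `compile` and `strip` are invented for this example. It assumes Java 9+, a non-empty word set, and that words are delimited by whitespace.

```java
import java.util.Set;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

public class StopWordStripper {

    // Builds a pattern that matches any listed word when it stands alone as a
    // whitespace-delimited token, plus the run of whitespace right after it.
    // Assumes wordsToRemove is not empty.
    public static Pattern compile(Set<String> wordsToRemove) {
        String alternatives = wordsToRemove.stream()
                .map(Pattern::quote) // treat each word literally, not as a regex
                .collect(Collectors.joining("|"));
        return Pattern.compile("(?<!\\S)(?:" + alternatives + ")(?!\\S)\\s*");
    }

    // Removes the matched words from one line; all other characters are kept as-is.
    public static String strip(String line, Pattern stopWords) {
        return stopWords.matcher(line).replaceAll("");
    }

    public static void main(String[] args) {
        Pattern stopWords = compile(Set.of("a", "for"));
        // The double space and the tab in the input survive in the output.
        System.out.println(strip("Some  words\tread from a file for example", stopWords));
    }
}
```

The pattern is compiled once and reused for every line. One side effect to be aware of: a removed word at the very end of a line leaves the whitespace before it in place, so you may want to trim trailing whitespace afterwards.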

answered 2012-04-20T10:18:11.297