java - Java question about finding a word in a sentence

Question

I'm using the Stanford NLP parser (http://nlp.stanford.edu/software/lex-parser.shtml) to split a block of text into sentences and then see which sentences contain a given word.

Here is my code so far:

import java.io.FileReader;
import java.io.IOException;
import java.util.List;

import edu.stanford.nlp.ling.*;
import edu.stanford.nlp.process.*;

public class TokenizerDemo {

    public static void main(String[] args) throws IOException {
        DocumentPreprocessor dp = new DocumentPreprocessor(args[0]);
        for (List sentence : dp) {
            for (Object word : sentence) {
                System.out.println(word);
                System.out.println(word.getClass().getName());
                if (word.equals(args[1])) {
                    System.out.println("yes!\n");
                }
            }
        }
    }
}

I run the code from the command line with "java TokenizerDemo testfile.txt wall".

The contents of testfile.txt are:

Humpty Dumpty sat on a wall. Humpty Dumpty had a great fall.

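So I expect the program to detect "wall" in the first sentence ("wall" is passed as the second command-line argument). But the program never prints "yes!", so it is not detecting "wall". The output of the program is: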

Humpty
edu.stanford.nlp.ling.Word
Dumpty
edu.stanford.nlp.ling.Word
sat
edu.stanford.nlp.ling.Word
on
edu.stanford.nlp.ling.Word
a
edu.stanford.nlp.ling.Word
wall
edu.stanford.nlp.ling.Word
.
edu.stanford.nlp.ling.Word
Humpty
edu.stanford.nlp.ling.Word
Dumpty
edu.stanford.nlp.ling.Word
had
edu.stanford.nlp.ling.Word
a
edu.stanford.nlp.ling.Word
great
edu.stanford.nlp.ling.Word
fall
edu.stanford.nlp.ling.Word
.
edu.stanford.nlp.ling.Word

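The DocumentPreprocessor from the Stanford parser correctly splits the text into two sentences. The problem seems to be with the use of the equals method. Each word has the type "edu.stanford.nlp.ling.Word". I have tried to access the word's underlying String so that I can check whether it equals "wall", but I can't figure out how.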

If I write the second for loop as "for (Word word : sentence) {", I get an incompatible types error at compile time.

score 2 · Accepted Answer


Since Words print nicely, a simple word.toString().equals(args[1]) is enough.
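Why equals fails while toString() works can be sketched without the parser. This is a minimal sketch using a hypothetical Token class as a stand-in for edu.stanford.nlp.ling.Word; the real Word.equals may be implemented differently, and the sketch only assumes (as the question's output suggests) that a Word is never equal to a plain String:

```java
import java.util.Objects;

public class EqualsDemo {

    // Hypothetical stand-in for edu.stanford.nlp.ling.Word: it wraps a
    // String but is not itself a String.
    static final class Token {
        private final String text;

        Token(String text) {
            this.text = text;
        }

        String word() {
            return text;
        }

        @Override
        public String toString() {
            return text; // prints as "wall", so it looks like a String
        }

        @Override
        public boolean equals(Object o) {
            // Only another Token with the same text is equal;
            // a String argument is never equal to a Token.
            return o instanceof Token && ((Token) o).text.equals(text);
        }

        @Override
        public int hashCode() {
            return Objects.hash(text);
        }
    }

    public static void main(String[] args) {
        Object word = new Token("wall");
        System.out.println(word);                                 // wall
        System.out.println(word.equals("wall"));                  // false: Token vs String
        System.out.println(word.toString().equals("wall"));       // true
        System.out.println(((Token) word).word().equals("wall")); // true
    }
}
```

Printing a token and comparing it are different operations: println calls toString(), while equals compares the token object itself against a String, which is a different type.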

answered 2011-10-13T14:10:56.757

score 2 · Accepted Answer

You can access the String content by calling the word() method on edu.stanford.nlp.ling.Word; for example:

import java.util.List;
import edu.stanford.nlp.ling.Word;

List<Word> words = ...
for (Word word : words) {
  // word() returns the underlying String, e.g. "wall"
  if (word.word().equals(args[1])) {
    System.err.println("Yes!");
  }
}

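Also note that it is better to use generics when declaring the List: the compiler then knows the element type, and your compiler or IDE will usually warn you if you try to compare objects of incompatible types (e.g. Word vs String).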
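The difference generics make can be sketched in plain Java. Token is again a hypothetical stand-in for a token class such as Word, and containsRaw/containsTyped are illustrative helpers, not part of the Stanford API:

```java
import java.util.ArrayList;
import java.util.List;

public class GenericsDemo {

    // Hypothetical stand-in for a token type such as edu.stanford.nlp.ling.Word.
    static final class Token {
        private final String text;

        Token(String text) {
            this.text = text;
        }

        String word() {
            return text;
        }
    }

    // Raw List: elements come back as Object, so word() needs a cast,
    // and "for (Token t : raw)" would not compile.
    @SuppressWarnings("rawtypes")
    static boolean containsRaw(List raw, String target) {
        for (Object o : raw) {
            if (((Token) o).word().equals(target)) {
                return true;
            }
        }
        return false;
    }

    // Generic List<Token>: the element type is known, so no cast is needed.
    static boolean containsTyped(List<Token> tokens, String target) {
        for (Token t : tokens) {
            if (t.word().equals(target)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<Token> tokens = new ArrayList<>();
        for (String s : "Humpty Dumpty sat on a wall .".split(" ")) {
            tokens.add(new Token(s));
        }
        System.out.println(containsRaw(tokens, "wall"));   // true
        System.out.println(containsTyped(tokens, "wall")); // true
    }
}
```

With the raw List, writing for (Token t : raw) fails with the same kind of incompatible-types error described in the question, because the compiler only knows the elements are Objects.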

Edit

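It turns out I was looking at an old version of the NLP API. Looking at the latest DocumentPreprocessor documentation, I see that it implements Iterable<List<HasWord>>, and HasWord defines the word() method. Each iteration therefore yields one sentence as a List<HasWord>, so your code should look like this: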

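import java.util.List;
import edu.stanford.nlp.ling.HasWord;
import edu.stanford.nlp.process.DocumentPreprocessor;

DocumentPreprocessor dp = ...
for (List<HasWord> sentence : dp) { // one List<HasWord> per sentence
  for (HasWord hw : sentence) {
    if (hw.word().equals(args[1])) {
      System.err.println("Yes!");
    }
  }
}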
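To get from there to the original goal (which sentences contain a given word), one option is to collect the word() strings of each sentence and search them. This is a minimal stdlib-only sketch, assuming the sentences have already been converted to lists of strings; sentencesContaining is a hypothetical helper, and the hard-coded tokens mirror the question's output rather than calling the parser:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class SentenceFinder {

    // Returns the 0-based indices of the sentences that contain target.
    // Each sentence is a list of token strings, i.e. what you get by calling
    // word() on every HasWord in one List<HasWord> from DocumentPreprocessor.
    static List<Integer> sentencesContaining(List<List<String>> sentences, String target) {
        List<Integer> hits = new ArrayList<>();
        for (int i = 0; i < sentences.size(); i++) {
            if (sentences.get(i).contains(target)) {
                hits.add(i);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        // Hard-coded tokens mirroring the tokenizer output shown in the question.
        List<List<String>> sentences = Arrays.asList(
            Arrays.asList("Humpty", "Dumpty", "sat", "on", "a", "wall", "."),
            Arrays.asList("Humpty", "Dumpty", "had", "a", "great", "fall", "."));
        System.out.println(sentencesContaining(sentences, "wall"));   // [0]
        System.out.println(sentencesContaining(sentences, "Humpty")); // [0, 1]
    }
}
```

List.contains uses String.equals, so the match is case-sensitive: "Wall" would not match "wall". Loop with equalsIgnoreCase instead if that matters.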
