parsing - How do I get the desired nodes from the Stanford Parser (NLP)?

Question

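My main problem is that I don't know how to extract nodes from a GrammaticalStructure. I am using englishPCFG.ser from Java in NetBeans. My goal is to find out what is said about the quality of the screen, for example:

The screen of iphone 4 is great.

I want to extract "screen" and "great". How can I extract the NN (screen) and the VP (great)?

The code I wrote is: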

LexicalizedParser lp = new LexicalizedParser("C:\\englishPCFG.ser");
lp.setOptionFlags(new String[]{"-maxLength", "80", "-retainTmpSubcategories"});

String sent ="the screen is very good.";
Tree parse = (Tree) lp.apply(Arrays.asList(sent));
parse.pennPrint();
System.out.println();

TreebankLanguagePack tlp = new PennTreebankLanguagePack();
GrammaticalStructureFactory gsf = tlp.grammaticalStructureFactory();
GrammaticalStructure gs = gsf.newGrammaticalStructure(parse);
Collection<TypedDependency> tdl = gs.typedDependenciesCollapsed();

score 2 · Accepted Answer

The collection tdl is a list of typed dependencies. For your example sentence, it contains:

det(screen-2, the-1)
nsubj(great-7, screen-2)
amod(4-5, iphone-4)
prep_of(screen-2, 4-5)
cop(great-7, is-6)

(You can see this by trying it out online.)

So the dependency you want, nsubj(great-7, screen-2), is in that list. nsubj means that "screen" is the subject of "great".
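As a rough illustration of picking that dependency out of the list, here is a minimal sketch that works on the printed form of the dependencies shown above rather than on the parser's own objects. The class NsubjFinder and its find method are hypothetical names invented for this example, not part of the Stanford Parser. With the real library you would more likely loop over tdl and compare each TypedDependency's reln(), gov() and dep() instead of matching strings.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration; not part of the Stanford Parser API.
class NsubjFinder {
    // Matches the printed form "relation(governor-index, dependent-index)".
    private static final Pattern DEP =
        Pattern.compile("(\\w+)\\((.+)-(\\d+), (.+)-(\\d+)\\)");

    /** Returns {governor, dependent} word pairs for every dependency with the given relation. */
    static List<String[]> find(List<String> deps, String relation) {
        List<String[]> hits = new ArrayList<>();
        for (String dep : deps) {
            Matcher m = DEP.matcher(dep.trim());
            if (m.matches() && m.group(1).equals(relation)) {
                hits.add(new String[] {m.group(2), m.group(4)});
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        // The five dependencies listed in the answer, as plain strings.
        List<String> tdl = Arrays.asList(
            "det(screen-2, the-1)",
            "nsubj(great-7, screen-2)",
            "amod(4-5, iphone-4)",
            "prep_of(screen-2, 4-5)",
            "cop(great-7, is-6)");
        for (String[] pair : find(tdl, "nsubj")) {
            // For nsubj, the dependent is the subject and the governor is the predicate.
            System.out.println("subject=" + pair[1] + ", predicate=" + pair[0]);
        }
    }
}
```

For the list above this yields the pair (great, screen), which is the "screen" / "great" combination the question asks for.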

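The collection of dependencies is just a collection (a List). For more sophisticated further processing, people commonly want to turn the dependencies into a graph structure that supports various kinds of search and traversal. There are several ways to do this. We often use the [jgrapht](http://www.jgrapht.org/) library. But that is code you have to write yourself.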
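To make the graph idea concrete, here is a small sketch that uses only the Java standard library as a stand-in for a graph library such as jgrapht. Everything in it is an assumption made for illustration: the class DependencyGraphSketch, its methods, and the choice to label nodes with the printed "word-index" form are hypothetical and come from neither the Stanford Parser nor jgrapht.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical stand-in for a graph library; not Stanford Parser or jgrapht API.
class DependencyGraphSketch {
    // governor -> list of {relation, dependent}; nodes are "word-index" labels.
    private final Map<String, List<String[]>> edges = new LinkedHashMap<>();

    /** Adds a directed, labeled edge from governor to dependent. */
    void addEdge(String relation, String governor, String dependent) {
        edges.computeIfAbsent(governor, k -> new ArrayList<>())
             .add(new String[] {relation, dependent});
    }

    /** Direct dependents of a node that are attached by the given relation. */
    List<String> dependents(String governor, String relation) {
        List<String> out = new ArrayList<>();
        for (String[] e : edges.getOrDefault(governor, Collections.emptyList())) {
            if (e[0].equals(relation)) {
                out.add(e[1]);
            }
        }
        return out;
    }

    /** Breadth-first traversal: every node reachable from start, in visit order. */
    List<String> reachableFrom(String start) {
        Set<String> seen = new LinkedHashSet<>();
        Deque<String> queue = new ArrayDeque<>();
        seen.add(start);
        queue.add(start);
        while (!queue.isEmpty()) {
            String node = queue.remove();
            for (String[] e : edges.getOrDefault(node, Collections.emptyList())) {
                if (seen.add(e[1])) {
                    queue.add(e[1]);
                }
            }
        }
        return new ArrayList<>(seen);
    }

    public static void main(String[] args) {
        DependencyGraphSketch g = new DependencyGraphSketch();
        g.addEdge("det", "screen-2", "the-1");
        g.addEdge("nsubj", "great-7", "screen-2");
        g.addEdge("amod", "4-5", "iphone-4");
        g.addEdge("prep_of", "screen-2", "4-5");
        g.addEdge("cop", "great-7", "is-6");

        // Subject of "great", then everything that hangs below "great".
        System.out.println(g.dependents("great-7", "nsubj"));
        System.out.println(g.reachableFrom("great-7"));
    }
}
```

Once the dependencies are in this shape, a question like "what is the subject of great, and what modifies that subject" becomes a couple of lookups instead of repeated scans over the flat list. A real graph library adds ready-made path and traversal algorithms on top of the same idea.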
