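I'm using the Stanford NLP parser toolkit. Given a word from the lexicon, how can I find its frequency*? Or, given a frequency rank, how can I determine the corresponding word?
*In the language as a whole, not just in a sample text.
Here is the demo for the toolkit I'm using: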
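import java.util.Arrays;
import java.util.Collection;

import edu.stanford.nlp.parser.lexparser.LexicalizedParser;
import edu.stanford.nlp.trees.*;

class ParserDemo {
    public static void main(String[] args) {
        // Load the serialized English PCFG grammar and set parser options.
        LexicalizedParser lp = new LexicalizedParser("englishPCFG.ser.gz");
        lp.setOptionFlags(new String[]{"-maxLength", "80", "-retainTmpSubcategories"});

        // Parse a pre-tokenized sentence and print the Penn Treebank style tree.
        String[] sent = { "Sincerity", "may", "frighten", "the", "boy", "." };
        Tree parse = (Tree) lp.apply(Arrays.asList(sent));
        parse.pennPrint();
        System.out.println();

        // Derive the collapsed typed dependencies from the parse tree.
        TreebankLanguagePack tlp = new PennTreebankLanguagePack();
        GrammaticalStructureFactory gsf = tlp.grammaticalStructureFactory();
        GrammaticalStructure gs = gsf.newGrammaticalStructure(parse);
        Collection<TypedDependency> tdl = gs.typedDependenciesCollapsed();
        System.out.println(tdl);
        System.out.println();

        // Print the tree and the dependencies again through TreePrint.
        TreePrint tp = new TreePrint("penn,typedDependenciesCollapsed");
        tp.printTree(parse);
    }
}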
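
To make the question concrete, the two lookups I have in mind could be sketched roughly as follows. This is only an illustration under assumptions: `WordFrequencyTable` is a hypothetical helper I made up, not a class from the Stanford parser, and it assumes word counts for the language as a whole are already available from some external source. The counts in `main` are made-up placeholders, not real corpus frequencies.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper (not part of the Stanford parser): it only indexes
// word counts that were obtained elsewhere, e.g. from a large corpus.
class WordFrequencyTable {
    private final Map<String, Long> counts;                       // word -> raw count
    private final List<String> byRank;                            // index 0 = most frequent
    private final Map<String, Integer> rankOf = new HashMap<>();  // word -> 1-based rank

    WordFrequencyTable(Map<String, Long> counts) {
        this.counts = new HashMap<>(counts);
        this.byRank = new ArrayList<>(this.counts.keySet());
        // Sort by descending count; break ties alphabetically so ranks are stable.
        byRank.sort((a, b) -> {
            int c = Long.compare(this.counts.get(b), this.counts.get(a));
            return c != 0 ? c : a.compareTo(b);
        });
        for (int i = 0; i < byRank.size(); i++) {
            rankOf.put(byRank.get(i), i + 1);
        }
    }

    // Raw count of the word, or 0 if the word is not in the table.
    long count(String word) {
        return counts.getOrDefault(word, 0L);
    }

    // 1-based frequency rank of the word, or -1 if the word is not in the table.
    int rank(String word) {
        return rankOf.getOrDefault(word, -1);
    }

    // Word at the given 1-based rank, or null if the rank is out of range.
    String wordAtRank(int rank) {
        return (rank >= 1 && rank <= byRank.size()) ? byRank.get(rank - 1) : null;
    }

    public static void main(String[] args) {
        // Placeholder counts for illustration only; real values would come
        // from an external frequency list.
        Map<String, Long> toyCounts = new HashMap<>();
        toyCounts.put("the", 1000L);
        toyCounts.put("may", 40L);
        toyCounts.put("boy", 25L);
        toyCounts.put("frighten", 3L);
        toyCounts.put("sincerity", 1L);

        WordFrequencyTable table = new WordFrequencyTable(toyCounts);
        System.out.println("rank of 'boy': " + table.rank("boy"));
        System.out.println("word at rank 1: " + table.wordAtRank(1));
    }
}
```

What I don't know is where the counts themselves should come from, i.e. whether the toolkit exposes anything like this or whether I need a separate frequency list.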