parsing - Stanford Parser: frenchFactored.ser.gz

Question

I am using the Stanford Parser (version 3.6.0) for French. My command line is:

java -cp stanford-parser.jar:* edu.stanford.nlp.parser.lexparser.LexicalizedParser -maxlength 30 -outputFormat conll2007 frenchFactored.ser.gz test_french.txt > test_french.conll10

But I do not get the grammatical functions (dependency labels) in the output; see:

1 Je _ CLS CLS _ 2 NULL _ _
2 mange _ V V _ 0 root _ _
3 des _ P P _ 2 NULL _ _
4 pommes _ N N _ 3 NULL _ _
5 . _ PUNC PUNC _ 2 NULL _ _

What am I missing in the command line?

score 0 · Accepted Answer

There is nothing wrong with your command:

Known formats are: oneline, penn, latexTree, xmlTree, words, wordsAndTags, rootSymbolOnly, dependencies, typedDependencies, typedDependenciesCollapsed, collocations, semanticGraph, conllStyleDependencies, conll2007. The last two are both tab-separated values formats. The latter has a lot more columns filled with underscores. [...]

Source: http://nlp.stanford.edu/nlp/javadoc/javanlp/edu/stanford/nlp/trees/TreePrint.html

You could try another -outputFormat.
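To see exactly which rows lack a function label, a small check over the output file can help. The sketch below is only an illustration and not part of the Stanford Parser: it assumes the usual 10-column CoNLL-2007 layout (ID, FORM, LEMMA, CPOSTAG, POSTAG, FEATS, HEAD, DEPREL, PHEAD, PDEPREL) with tab-separated fields, and the class and method names (ConllDeprelCheck, unlabeledTokens) are made up for this example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Sketch: find CoNLL-2007 rows whose dependency label (DEPREL) is missing. */
final class ConllDeprelCheck {

    // 0-based position of DEPREL in the assumed 10-column layout:
    // ID FORM LEMMA CPOSTAG POSTAG FEATS HEAD DEPREL PHEAD PDEPREL
    static final int DEPREL = 7;

    /** Returns the token IDs whose DEPREL is "NULL" (any case) or "_". */
    static List<String> unlabeledTokens(List<String> rows) {
        List<String> missing = new ArrayList<>();
        for (String row : rows) {
            if (row.trim().isEmpty()) {
                continue; // a blank line separates sentences
            }
            String[] cols = row.split("\t");
            if (cols.length <= DEPREL) {
                continue; // not a full CoNLL row; ignore it
            }
            String deprel = cols[DEPREL];
            if (deprel.equalsIgnoreCase("NULL") || deprel.equals("_")) {
                missing.add(cols[0]);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        List<String> rows = Arrays.asList(
            "1\tJe\t_\tCLS\tCLS\t_\t2\tNULL\t_\t_",
            "2\tmange\t_\tV\tV\t_\t0\troot\t_\t_");
        // Prints the IDs of the tokens that have no dependency label.
        System.out.println(unlabeledTokens(rows));
    }
}
```

In real use the rows would come from the generated file, for example via Files.readAllLines on test_french.conll10.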

score 0 · Accepted Answer

Stanford CoreNLP 3.6.0 includes a deep-learning-based French dependency parser.

Download Stanford CoreNLP 3.6.0 here:

http://stanfordnlp.github.io/CoreNLP/download.html

Also make sure to get the French models jar, which is available on that page as well.

Then run this command to use the French dependency parser, making sure the French models jar is on your CLASSPATH:

java -Xmx6g -cp "*:stanford-corenlp-full-2015-12-09/*" edu.stanford.nlp.pipeline.StanfordCoreNLP -props StanfordCoreNLP-french.properties -file sample-french-document.txt -outputFormat text

score 0 · Accepted Answer

Your command is fine, but the Stanford Parser does not support this yet (as of version 3.6.0).

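The following code prints "false" when using the French model. The command you are using performs this check internally and silently skips the dependency analysis when it is false.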

System.out.println(
  LexicalizedParser
    .loadModel("frenchFactored.ser.gz")
    .treebankLanguagePack()
    .supportsGrammaticalStructures()
);

That is why I use the Malt parser (http://www.maltparser.org/).

If you like the following output:

1   Je      Je      C   CLS     null    2   suj     _   _
2   mange   mange   V   V       null    0   root    _   _
3   des     des     P   P       null    2   mod     _   _
4   pommes  pommes  N   N       null    3   obj     _   _
5   .       .       P   PUNC    null    2   mod     _   _

Then use the code below to generate it (you cannot simply use the command line). I am using both Stanford and Malt to accomplish this:

LexicalizedParser lexParser = LexicalizedParser.loadModel("frenchFactored.ser.gz");
TokenizerFactory<CoreLabel> tokenizerFactory = PTBTokenizer.factory(new CoreLabelTokenFactory(), "");
ConcurrentMaltParserModel parserModel = ConcurrentMaltParserService.initializeParserModel(new File("fremalt-1.7.mco"));

Tokenizer<CoreLabel> tok = tokenizerFactory.getTokenizer(new StringReader("Je mange des pommes."));
List<CoreLabel> rawWords2 = tok.tokenize();
Tree parse = lexParser.apply(rawWords2);

// The Malt parser requires tokens in the MaltTab format (CoNLL).
// Instead of using the Stanford tagger, we could have used Melt or another parser.
String[] tokens = parse.taggedLabeledYield().stream()
    .map(word -> {
        CoreLabel w = (CoreLabel)word;
        String lemma = Morphology.lemmatizeStatic(new WordTag(w.word(), w.tag())).word();
        String tag = w.value();

        return String.join("\t", new String[]{
            String.valueOf(w.index()+1),
            w.word(),
            lemma != null ? lemma : w.word(), 
            tag != null ? String.valueOf(tag.charAt(0)) : "_",
            tag != null ? tag : "_"
        });
    })
    .toArray(String[]::new);

ConcurrentDependencyGraph graph = parserModel.parse(tokens);
System.out.println(graph);
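To make the shape of the tokens array easier to see, here is a minimal standalone sketch of the same row-building step without the Stanford or Malt classes. It is only an illustration: the class and method names (MaltTabRows, row, rows) are made up, the lemma is simply the word itself because no lemmatizer is involved, and the tags are supplied by hand. It also treats an empty tag like a missing one.

```java
import java.util.ArrayList;
import java.util.List;

/** Sketch: build the tab-separated token rows that are handed to the dependency parser. */
final class MaltTabRows {

    /** One row: ID, FORM, LEMMA, coarse tag (first character of the tag), fine tag. */
    static String row(int index, String word, String lemma, String tag) {
        boolean hasTag = tag != null && !tag.isEmpty();
        return String.join("\t",
            String.valueOf(index),
            word,
            lemma != null ? lemma : word,        // fall back to the surface form
            hasTag ? tag.substring(0, 1) : "_",  // coarse tag
            hasTag ? tag : "_");                 // fine tag
    }

    /** words[i] and tags[i] describe token i; the word doubles as the lemma here. */
    static String[] rows(String[] words, String[] tags) {
        List<String> out = new ArrayList<>();
        for (int i = 0; i < words.length; i++) {
            out.add(row(i + 1, words[i], null, tags[i])); // IDs are 1-based
        }
        return out.toArray(new String[0]);
    }

    public static void main(String[] args) {
        String[] words = {"Je", "mange", "des", "pommes", "."};
        String[] tags = {"CLS", "V", "P", "N", "PUNC"};
        for (String r : rows(words, tags)) {
            System.out.println(r);
        }
    }
}
```

Each printed row holds the first five columns of the output shown earlier; the parser then adds the head and the dependency label.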

From there, you can traverse the graph programmatically, starting with:

graph.nTokenNodes()

If you use Maven, just add the following dependencies to your pom:

<dependency>
    <groupId>org.maltparser</groupId>
    <artifactId>maltparser</artifactId>
    <version>1.8.1</version>
</dependency>
<dependency>
    <groupId>edu.stanford.nlp</groupId>
    <artifactId>stanford-corenlp</artifactId>
    <version>3.6.0</version>
</dependency>

Bonus: imports

import org.maltparser.concurrent.ConcurrentMaltParserModel;
import org.maltparser.concurrent.ConcurrentMaltParserService;
import org.maltparser.concurrent.graph.ConcurrentDependencyGraph;
import org.maltparser.concurrent.graph.ConcurrentDependencyNode;
import org.maltparser.core.exception.MaltChainedException;

import edu.stanford.nlp.ling.CoreLabel;
import edu.stanford.nlp.ling.WordTag;
import edu.stanford.nlp.parser.lexparser.LexicalizedParser;
import edu.stanford.nlp.process.CoreLabelTokenFactory;
import edu.stanford.nlp.process.Morphology;
import edu.stanford.nlp.process.PTBTokenizer;
import edu.stanford.nlp.process.Tokenizer;
import edu.stanford.nlp.process.TokenizerFactory;
import edu.stanford.nlp.trees.Tree;

Extra: the fremalt-1.7.mco file

http://www.maltparser.org/mco/french_parser/fremalt.html
