parsing - How to parse multiple sentences from a text file using the Stanford dependency parser?

Question

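I have a text file with many lines and I want to parse all of its sentences. It seems I do get all the sentences, but only the first one is parsed. I'm not sure where I went wrong.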

import nltk
from nltk.parse.stanford import StanfordDependencyParser
dependency_parser = StanfordDependencyParser(model_path="edu/stanford/nlp/models/lexparser/englishPCFG.ser.gz")
txtfile = open('sample.txt', encoding="latin-1")
s = txtfile.read()
print(s)
result = dependency_parser.raw_parse(s)
for i in result:
    print(list(i.triples()))

But it only gives the parse triples for the first sentence and not for the other sentences. Any help?

'i like this computer'
'The great Buddha, the .....'
'My Ashford experience .... great experience.'


[[(('i', 'VBZ'), 'nsubj', ("'", 'POS')), (('i', 'VBZ'), 'nmod', ('computer', 'NN')), (('computer', 'NN'), 'case', ('like', 'IN')), (('computer', 'NN'), 'det', ('this', 'DT')), (('computer', 'NN'), 'case', ("'", 'POS'))]]

score 1 · Accepted Answer

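You have to split the text first. Currently you are parsing the literal text you posted, quotes and all. You can see this from this part of the parse result: ("'", 'POS')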
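As a plain-Python illustration of that splitting step, here is a minimal sketch assuming (as in the question) that each non-blank line of `sample.txt` holds one sentence wrapped in single quotes. `split_sentences` is a hypothetical helper name, and the second sample sentence is made up for illustration. It strips only the outer quotes by slicing, so inner apostrophes survive:

```python
def split_sentences(text):
    """Split raw file text into one clean sentence per non-blank line.

    Assumes each line looks like 'some sentence' (wrapped in single quotes).
    """
    sentences = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        # Drop the surrounding quote characters only if both are present
        if len(line) >= 2 and line[0] == line[-1] == "'":
            line = line[1:-1]
        sentences.append(line)
    return sentences


sample = "'i like this computer'\n\n'I don't like this one'\n"
print(split_sentences(sample))  # ['i like this computer', "I don't like this one"]
```

The resulting list of plain strings can then be parsed one sentence at a time with `raw_parse`, or passed as a whole to `raw_parse_sents`.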

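To do that, it seems you could use ast.literal_eval on each line. Note that an apostrophe (in a word like "don't") will break the format, and you will have to handle apostrophes yourself, for example with line = line[1:-1]: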

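import ast
from nltk.parse.stanford import StanfordDependencyParser
dependency_parser = StanfordDependencyParser(model_path="edu/stanford/nlp/models/lexparser/englishPCFG.ser.gz")

# Each line of sample.txt is a quoted string literal such as 'i like this computer';
# ast.literal_eval turns it into the plain sentence (blank lines are skipped)
with open('sample.txt', encoding="latin-1") as f:
    lines = [ast.literal_eval(line) for line in f if line.strip()]

parsed_lines = []
for line in lines:
    # raw_parse returns an iterator of DependencyGraph objects; keep the first
    parsed_lines.append(next(dependency_parser.raw_parse(line)))

# now parsed_lines contains one parse per line of the file
for graph in parsed_lines:
    print(list(graph.triples()))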
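To illustrate the apostrophe caveat: `ast.literal_eval` raises `SyntaxError` on a line like `'I don't like this one'`, because the inner apostrophe ends the string literal early. Below is a hedged sketch of one way to handle that; `unquote_line` is a hypothetical helper that tries `literal_eval` first and falls back to the `line[1:-1]` slice:

```python
import ast


def unquote_line(line):
    """Turn a quoted line such as 'i like this computer' into plain text."""
    line = line.strip()
    try:
        value = ast.literal_eval(line)
    except (ValueError, SyntaxError, TypeError):
        # e.g. 'I don't like this one': the inner apostrophe ends the literal early
        value = None
    if isinstance(value, str):
        return value
    # Fallback: drop the outer quote characters by slicing
    if len(line) >= 2 and line[0] == line[-1] == "'":
        return line[1:-1]
    return line  # not quoted at all; return it unchanged


print(unquote_line("'i like this computer'"))   # i like this computer
print(unquote_line("'I don't like this one'"))  # I don't like this one
```

In the answer's code, `ast.literal_eval(line)` inside the list comprehension could be swapped for `unquote_line(line)` so that lines containing apostrophes do not abort the whole file.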

score 0 · Accepted Answer

Try:

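from nltk.parse.stanford import StanfordDependencyParser
dependency_parser = StanfordDependencyParser(model_path="edu/stanford/nlp/models/lexparser/englishPCFG.ser.gz")

with open('sample.txt') as fin:
    # one sentence per line; drop trailing newlines and blank lines
    sents = [line.strip() for line in fin if line.strip()]
result = dependency_parser.raw_parse_sents(sents)
# raw_parse_sents yields one iterator of parses per input sentence
for parses in result:
    for parse in parses:
        print(list(parse.triples()))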

Check the docstring code or the demo code in the repository for examples; they are usually very helpful.
