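Assuming you want to use Python, here is a workaround I implemented based on Christopher Manning's answer. The Python wrappers for CoreNLP do not implement "K-best parse trees", so the alternative is to run the terminal command: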
java -mx500m -cp "*" edu.stanford.nlp.parser.lexparser.LexicalizedParser -printPCFGkBest 20 edu/stanford/nlp/models/lexparser/englishPCFG.ser.gz data/testsent.txt
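Note that you need to download Stanford CoreNLP, with all of its JAR files, into a single directory, and install the required Python libraries (see the import statements).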
import os
import re
import subprocess
import nltk
from nltk.tree import ParentedTree
ip_sent = "a quick brown fox jumps over the lazy dog."
data_path = "<Your path>/stanford-corenlp-full-2018-10-05/data/testsent.txt" # Replace <Your path> so that this points into the data folder of your CoreNLP directory
with open(data_path, "w") as file:
    file.write(ip_sent) # Write to the file specified; the text in this file is fed into the LexicalizedParser
os.chdir("/home/user/Sidney/Vignesh's VQA/SpElementEx/extLib/stanford-corenlp-full-2018-10-05") # Change the working directory to the path where the JAR files are stored
terminal_op = subprocess.check_output('java -mx500m -cp "*" edu.stanford.nlp.parser.lexparser.LexicalizedParser -printPCFGkBest 5 edu/stanford/nlp/models/lexparser/englishPCFG.ser.gz data/testsent.txt', shell=True) # Run the command via the terminal and capture its standard output as a bytes object
op_string = terminal_op.decode('utf-8') # Convert to string object
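parse_set = re.split(r"# Parse [0-9]+ with score -?[0-9]+\.[0-9]+\n", op_string) # Split the output on each parse header line
parse_set = [p for p in parse_set if p.strip()] # Drop the empty chunk before the first header, which ParentedTree.fromstring cannot parse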
print(parse_set)
# Print the parse trees in a pretty_print format
for i in parse_set:
    parsetree = ParentedTree.fromstring(i)
    print(type(parsetree))
    parsetree.pretty_print()
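If you also want to keep each parse's rank and score rather than discard them in the split, something like the following sketch may help. It assumes the header lines look like the pattern used above ("# Parse N with score S"). The name split_kbest and the sample text are made up for illustration and are not real parser output.

```python
import re

# Header line assumed to precede each tree, e.g. "# Parse 1 with score -20.5".
# This format is inferred from the split pattern in the answer, not from parser docs.
HEADER = re.compile(
    r"^# Parse (\d+) with score (-?\d+(?:\.\d+)?(?:[eE][-+]?\d+)?)[ \t]*$",
    re.MULTILINE,
)


def split_kbest(output):
    """Return a list of (rank, score, tree_text) tuples found in `output`."""
    matches = list(HEADER.finditer(output))
    results = []
    for idx, match in enumerate(matches):
        # The tree text runs from the end of this header to the start of the next one.
        start = match.end()
        end = matches[idx + 1].start() if idx + 1 < len(matches) else len(output)
        tree_text = output[start:end].strip()
        if tree_text:  # Skip headers that have no tree after them
            results.append((int(match.group(1)), float(match.group(2)), tree_text))
    return results


# Hypothetical sample in the assumed format; the trees and scores are invented.
sample = (
    "# Parse 1 with score -20.5\n"
    "(ROOT (NP (DT a) (NN dog)))\n"
    "\n"
    "# Parse 2 with score -23.25\n"
    "(ROOT (FRAG (NP (DT a) (NN dog))))\n"
    "\n"
)

for rank, score, tree_text in split_kbest(sample):
    print(rank, score, tree_text)
```

Each tree_text can then be passed to ParentedTree.fromstring in the same way as the loop above, and in the real script you would call split_kbest(op_string) instead of using the sample.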
Hope this helps.