
I am trying to do anaphora resolution, and my code is below.

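First, I navigate to the folder where I downloaded the Stanford CoreNLP package. Then I run the following command at the command prompt to start the Stanford CoreNLP server: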

java -mx4g -cp "*;stanford-corenlp-full-2017-06-09/*" edu.stanford.nlp.pipeline.StanfordCoreNLPServer -port 9000 -timeout 15000

After that, I run the following code in Python:

from pycorenlp import StanfordCoreNLP
nlp = StanfordCoreNLP('http://localhost:9000')

I want to change the sentence "Tom is a smart boy. He know a lot of thing." into "Tom is a smart boy. Tom know a lot of thing.", but I cannot find a tutorial or any other help for doing this in Python.

All I have managed to do so far is annotate the text with the following Python code:

Coreference resolution

output = nlp.annotate(sentence, properties={'annotators':'dcoref','outputFormat':'json','ner.useSUTime':'false'})

and then pull out the coreference chains:

coreferences = output['corefs']

I get the JSON below:

coreferences

{u'1': [{u'animacy': u'ANIMATE',
   u'endIndex': 2,
   u'gender': u'MALE',
   u'headIndex': 1,
   u'id': 1,
   u'isRepresentativeMention': True,
   u'number': u'SINGULAR',
   u'position': [1, 1],
   u'sentNum': 1,
   u'startIndex': 1,
   u'text': u'Tom',
   u'type': u'PROPER'},
  {u'animacy': u'ANIMATE',
   u'endIndex': 6,
   u'gender': u'MALE',
   u'headIndex': 5,
   u'id': 2,
   u'isRepresentativeMention': False,
   u'number': u'SINGULAR',
   u'position': [1, 2],
   u'sentNum': 1,
   u'startIndex': 3,
   u'text': u'a smart boy',
   u'type': u'NOMINAL'},
  {u'animacy': u'ANIMATE',
   u'endIndex': 2,
   u'gender': u'MALE',
   u'headIndex': 1,
   u'id': 3,
   u'isRepresentativeMention': False,
   u'number': u'SINGULAR',
   u'position': [2, 1],
   u'sentNum': 2,
   u'startIndex': 1,
   u'text': u'He',
   u'type': u'PRONOMINAL'}],
 u'4': [{u'animacy': u'INANIMATE',
   u'endIndex': 7,
   u'gender': u'NEUTRAL',
   u'headIndex': 4,
   u'id': 4,
   u'isRepresentativeMention': True,
   u'number': u'SINGULAR',
   u'position': [2, 2],
   u'sentNum': 2,
   u'startIndex': 3,
   u'text': u'a lot of thing',
   u'type': u'NOMINAL'}]}
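
To make the structure above easier to read: each key is a chain id, each chain is a list of mentions, and judging from the values shown, sentNum and startIndex are 1-based while endIndex is exclusive ('a smart boy' spans tokens 3 to 5 with endIndex 6). Below is a minimal sketch that summarizes such a dict. The helper name summarize_chains and the trimmed sample are made up for illustration and are not part of pycorenlp.

```python
def summarize_chains(corefs):
    """Map each chain id to (representative text, list of pronoun mentions)."""
    summary = {}
    for chain_id, mentions in corefs.items():
        # Fall back to the first mention if none is flagged as representative.
        representative = next(
            (m for m in mentions if m['isRepresentativeMention']), mentions[0])
        # Each pronoun is recorded as (sentence number, start token, text).
        pronouns = [(m['sentNum'], m['startIndex'], m['text'])
                    for m in mentions if m['type'] == 'PRONOMINAL']
        summary[chain_id] = (representative['text'], pronouns)
    return summary


# Trimmed copy of the output above, keeping only the fields the helper reads.
sample = {
    '1': [
        {'isRepresentativeMention': True, 'sentNum': 1, 'startIndex': 1,
         'text': 'Tom', 'type': 'PROPER'},
        {'isRepresentativeMention': False, 'sentNum': 1, 'startIndex': 3,
         'text': 'a smart boy', 'type': 'NOMINAL'},
        {'isRepresentativeMention': False, 'sentNum': 2, 'startIndex': 1,
         'text': 'He', 'type': 'PRONOMINAL'},
    ],
    '4': [
        {'isRepresentativeMention': True, 'sentNum': 2, 'startIndex': 3,
         'text': 'a lot of thing', 'type': 'NOMINAL'},
    ],
}

print(summarize_chains(sample))
```

So chain '1' says that the token at position 1 of sentence 2 ('He') refers to 'Tom'. What I am missing is the step that writes that back into the text.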

Any help with this?


3 Answers


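Here is one possible solution that uses the data structure output by CoreNLP, which contains all the information needed. It is not a complete solution and will probably need extending to handle every case, but it is a good starting point.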

from pycorenlp import StanfordCoreNLP

nlp = StanfordCoreNLP('http://localhost:9000')


def resolve(corenlp_output):
    """ Transfer the word form of the antecedent to its associated pronominal anaphor(s) """
    for coref in corenlp_output['corefs']:
        mentions = corenlp_output['corefs'][coref]
        antecedent = mentions[0]  # the antecedent is the first mention in the coreference chain
        for j in range(1, len(mentions)):
            mention = mentions[j]
            if mention['type'] == 'PRONOMINAL':
                # get the attributes of the target mention in the corresponding sentence
                target_sentence = mention['sentNum']
                target_token = mention['startIndex'] - 1
                # transfer the antecedent's word form to the appropriate token in the sentence
                corenlp_output['sentences'][target_sentence - 1]['tokens'][target_token]['word'] = antecedent['text']


def print_resolved(corenlp_output):
    """ Print the "resolved" output """
    possessives = ['hers', 'his', 'their', 'theirs']
    for sentence in corenlp_output['sentences']:
        for token in sentence['tokens']:
            output_word = token['word']
            # check lemmas as well as tags for possessive pronouns in case of tagging errors
            if token['lemma'] in possessives or token['pos'] == 'PRP$':
                output_word += "'s"  # add the possessive morpheme
            output_word += token['after']
            print(output_word, end='')


text = "Tom and Jane are good friends. They are cool. He knows a lot of things and so does she. His car is red, but " \
       "hers is blue. It is older than hers. The big cat ate its dinner."

output = nlp.annotate(text, properties={'annotators': 'dcoref', 'outputFormat': 'json', 'ner.useSUTime': 'false'})

resolve(output)

print('Original:', text)
print('Resolved: ', end='')
print_resolved(output)

This gives the following output:

Original: Tom and Jane are good friends. They are cool. He knows a lot of things and so does she. His car is red, but hers is blue. It is older than hers. The big cat ate its dinner.
Resolved: Tom and Jane are good friends. Tom and Jane are cool. Tom knows a lot of things and so does Jane. Tom's car is red, but Jane's is blue. His car is older than Jane's. The big cat ate The big cat's dinner.

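As you can see, this solution does not fix the capitalization when a pronoun has a sentence-initial (title-case) antecedent: the last sentence gets "The big cat" rather than "the big cat". The right behavior depends on the category of the antecedent: common-noun antecedents need lowercasing, while proper-noun antecedents do not. Some other ad hoc processing may also be needed (as with the possessives in my test sentence). The code also assumes that you do not want to reuse the original output tokens, since it modifies them in place. One way around this is to copy the original data structure, or to create a new attribute and change the print_resolved function accordingly. Correcting any resolution errors is yet another challenge!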
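
Two of these caveats, the in-place modification and the sentence-initial capitalization, could be handled along the following lines. This is only a sketch under assumptions: resolve_copy is a hypothetical name, the rule "a NOMINAL antecedent at token 1 is capitalized only because of its position" is a heuristic that will sometimes be wrong, and the sample dict is hand-built to mimic the fields of the CoreNLP JSON rather than real server output.

```python
import copy


def resolve_copy(corenlp_output):
    """Return a resolved deep copy; the input dict is left untouched."""
    resolved = copy.deepcopy(corenlp_output)
    for mentions in resolved['corefs'].values():
        antecedent = mentions[0]  # same assumption as resolve(): first mention
        replacement = antecedent['text']
        # Heuristic (assumption): a common-noun antecedent that opens its
        # sentence is lowercased; proper nouns keep their capital.
        if antecedent['type'] == 'NOMINAL' and antecedent['startIndex'] == 1:
            replacement = replacement[:1].lower() + replacement[1:]
        for mention in mentions[1:]:
            if mention['type'] != 'PRONOMINAL':
                continue
            tokens = resolved['sentences'][mention['sentNum'] - 1]['tokens']
            word = replacement
            # Re-capitalize if the pronoun itself opens a sentence.
            if mention['startIndex'] == 1:
                word = word[:1].upper() + word[1:]
            tokens[mention['startIndex'] - 1]['word'] = word
    return resolved


# Hand-built stand-in for the annotation of "The big cat ate its dinner."
sample = {
    'sentences': [{'tokens': [{'word': w} for w in
                              ['The', 'big', 'cat', 'ate', 'its', 'dinner', '.']]}],
    'corefs': {'1': [
        {'type': 'NOMINAL', 'text': 'The big cat', 'sentNum': 1, 'startIndex': 1},
        {'type': 'PRONOMINAL', 'text': 'its', 'sentNum': 1, 'startIndex': 5},
    ]},
}

resolved = resolve_copy(sample)
print(sample['sentences'][0]['tokens'][4]['word'])    # original token is unchanged
print(resolved['sentences'][0]['tokens'][4]['word'])  # lowercased antecedent
```

With real server output, the returned copy still carries the lemma, pos and after fields, so it can be passed to print_resolved while the original annotation remains available.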

Answered 2018-08-09T20:36:59.990

I had a similar problem. After trying CoreNLP, I solved it with neuralcoref. You can get the job done easily with neuralcoref using the following code:

import spacy

nlp = spacy.load('en_coref_md')

doc = nlp(u'Phone area code will be valid only when all the below conditions are met. It cannot be left blank. It should be numeric. It cannot be less than 200. Minimum number of digits should be 3. ')

print(doc._.coref_clusters)

print(doc._.coref_resolved)

The output of the above code is:
[Phone area code: [Phone area code, It, It, It]]

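Phone area code will be valid only when all the below conditions are met. Phone area code cannot be left blank. Phone area code should be numeric. Phone area code cannot be less than 200. Minimum number of digits should be 3.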

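For this you will need spaCy along with one of the English coreference models: en_coref_md, en_coref_lg, or en_coref_sm. You can refer to the following link for a better explanation: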

https://github.com/huggingface/neuralcoref

Answered 2018-07-13T04:29:07.147
from stanfordnlp.server import CoreNLPClient
from nltk import tokenize

client = CoreNLPClient(annotators=['tokenize','ssplit', 'pos', 'lemma', 'ner', 'parse', 'coref'], memory='4G', endpoint='http://localhost:9001')

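def pronoun_resolution(text):
    """Replace each later mention in a coreference chain with the chain's first mention."""
    ann = client.annotate(text)
    # Assumes NLTK's sentence splitting lines up with CoreNLP's sentence indices
    modified_text = tokenize.sent_tokenize(text)

    for coref in ann.corefChain:
        antecedent = None
        for mention in coref.mention:
            # Rebuild the mention's surface text from its token span
            tokens = ann.sentence[mention.sentenceIndex].token
            phrase = ' '.join(tokens[i].word for i in range(mention.beginIndex, mention.endIndex))
            if antecedent is None:
                # The first mention in the chain is treated as the antecedent
                antecedent = phrase
            else:
                # Plain string replacement: every occurrence of the anaphor text in the sentence is swapped
                sentence = modified_text[mention.sentenceIndex]
                modified_text[mention.sentenceIndex] = sentence.replace(phrase, antecedent)

    return ' '.join(modified_text)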

text = 'Tom is a smart boy. He knows a lot of things.'
pronoun_resolution(text)

Output: 'Tom is a smart boy. Tom knows a lot of things.'
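
One caveat with the str.replace call above: it swaps every occurrence of the anaphor string in the sentence, including matches inside other words (a lowercase "he" also matches inside "the"). A possible alternative is to replace by token span instead. The sketch below works on plain Python lists so that it runs without a server. The function name replace_spans and the tuple layout are invented for illustration, and the offsets are assumed to be 0-based and end-exclusive, which is how the loop above uses mention.beginIndex and mention.endIndex.

```python
def replace_spans(sentences, chains):
    """Replace later mentions with the first mention of their chain.

    sentences: list of token lists.
    chains: list of chains; each chain is a list of
            (sentence_index, begin, end) token spans, first one = antecedent.
    """
    edits = {}
    for mentions in chains:
        ante_sent, ante_begin, ante_end = mentions[0]
        antecedent = sentences[ante_sent][ante_begin:ante_end]
        for sent, begin, end in mentions[1:]:
            edits.setdefault(sent, []).append((begin, end, antecedent))

    resolved = [list(tokens) for tokens in sentences]  # leave the input intact
    for sent, spans in edits.items():
        # Apply right to left so earlier offsets in the sentence stay valid.
        for begin, end, antecedent in sorted(spans, key=lambda s: s[0], reverse=True):
            resolved[sent][begin:end] = antecedent
    return resolved


sentences = [
    ['Tom', 'is', 'a', 'smart', 'boy', '.'],
    ['He', 'knows', 'a', 'lot', 'of', 'things', '.'],
    ['Then', 'he', 'met', 'the', 'others', '.'],
]
# One hand-written chain: Tom <- He <- he
chains = [[(0, 0, 1), (1, 0, 1), (2, 1, 2)]]

for tokens in replace_spans(sentences, chains):
    print(' '.join(tokens))
```

In the last sentence only the token "he" is replaced, while "Then", "the" and "others" are left alone. Rejoining tokens with spaces loses the original spacing around punctuation, so a fuller version would need the whitespace information that CoreNLP stores with each token.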

Answered 2019-08-12T16:34:11.520