
I am learning the Google Cloud Natural Language API. The API basics page states that the response of the analyze_syntax() method should contain

  • a "list" of sentences (with text and analysis)
  • a "list" of tokens (with text and analysis)

Please refer to this - Analyzing Syntax basics
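
In other words, based on that page I assumed I would be able to treat the response roughly like this (just a sketch of my expectation, where response stands for whatever analyze_syntax() returns; I have not verified any of it):

# Sketch of what I expected from the docs (my assumption, not verified)
for sentence in response.sentences:
    print(sentence.text.content)            # each sentence with its text

for token in response.tokens:
    print(token.text.content, token.lemma)  # each token with its analysis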

Instead, the output I receive is:

sentences {
  text {
    content: "Once again i am typing a sentence to see if it finally return a proper value."
  }
}

sentences {
  text {
    content: "The problem is that offsets are -1 for all tokens which is not proper."
    begin_offset: 78
  }
}

tokens {
  text {
    content: "Once"
  }
  part_of_speech {
    tag: ADV
  }
  dependency_edge {
    head_token_index: 1
    label: ADVMOD
  }
  lemma: "Once"
}

tokens {
  text {
    content: "again"
    begin_offset: 5
  }
  part_of_speech {
    tag: ADV
  }
  dependency_edge {
    head_token_index: 4
    label: ADVMOD
  }
  lemma: "again"
}

tokens {
  text {
    content: "i"
    begin_offset: 11
  }
  part_of_speech {
    tag: PRON
    case: NOMINATIVE
    number: SINGULAR
    person: FIRST
  }
  dependency_edge {
    head_token_index: 4
    label: NSUBJ
  }
  lemma: "i"
}

Note that there is no

  • "list" of sentences, each one analyzed
  • "list" of tokens, each one analyzed

Instead, every sentence and every token appears as a separate top-level entry. Why do my results differ from the ones illustrated in the documentation?

Here is the actual code.

import os
# import argparse

from google.cloud import language
from google.cloud.language import enums
from google.cloud.language import types


os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = "C:\\Users\\user\\Downloads\\test-ee23cf382897.json"  

def analyze(user_said):
    """Changed to suit my needs"""
    client = language.LanguageServiceClient()

    # Wrap the input text in a Document and request syntax analysis.
    document = types.Document(content=user_said, type=enums.Document.Type.PLAIN_TEXT)
    syntax = client.analyze_syntax(document=document, encoding_type='UTF8')

    print(syntax)

    # Save the raw response for later inspection.
    with open('syntax_analysis.txt', 'w') as file:
        file.write(str(syntax))

#
# if __name__ == '__main__':
#     parser = argparse.ArgumentParser(description=__doc__, formatter_class=argparse.RawDescriptionHelpFormatter)
#     parser.add_argument('user_said', help='The filename of the movie review you would like to analyze.')
#     args = parser.parse_args()
#     analyze(args.user_said)

Additional information:

  • Python 3.6.5
  • PyCharm Community Edition 2018.1

1 Answer


I found out what was going on. To get the "list" of tokens, do the following.

# My original code (perhaps yours too)
syntax = client.analyze_syntax(document=document, encoding_type='UTF8')

# The change to make: access the repeated field directly
tokens = whatever_name_of_client.analyze_syntax(some_args).tokens

On the basics page, Google seems to be illustrating what this method does, not how it actually returns the data. This is the response I got after making the change:

# The square brackets :)
[text {   
  content: "Google"
}
part_of_speech {
  tag: NOUN
  number: SINGULAR
  proper: PROPER
}
dependency_edge {
  head_token_index: 3
  label: NSUBJ
}
lemma: "Google"
, text {
  content: "certainly"
  begin_offset: 7
}
part_of_speech {
  tag: ADV
}
dependency_edge {
  head_token_index: 3
  label: ADVMOD
}
lemma: "certainly"
, text {
  content: "should"
  begin_offset: 17
}
part_of_speech {
  tag: VERB
}
dependency_edge {
  head_token_index: 3
  label: AUX
}
lemma: "should"
, text {
  content: "make"
  begin_offset: 24
}
part_of_speech {
  tag: VERB
}
dependency_edge {
  head_token_index: 3
  label: ROOT
}
lemma: "make"
, text {
  content: "better"
  begin_offset: 29
}
part_of_speech {
  tag: ADJ
}
dependency_edge {
  head_token_index: 5
  label: AMOD
}
lemma: "good"
, text {
  content: "documentation"
  begin_offset: 36
}
part_of_speech {
  tag: NOUN
  number: SINGULAR
}
dependency_edge {
  head_token_index: 3
  label: DOBJ
}
lemma: "documentation"
, text {
  content: "."
  begin_offset: 49
}
part_of_speech {
  tag: PUNCT
}
dependency_edge {
  head_token_index: 3
  label: P
}
lemma: "."
, text {
  content: "I"
  begin_offset: 51
}
part_of_speech {
  tag: PRON
  case: NOMINATIVE
  number: SINGULAR
  person: FIRST
}
dependency_edge {
  head_token_index: 8
  label: NSUBJ
}
lemma: "I"
, text {
  content: "had"
  begin_offset: 53
}
part_of_speech {
  tag: VERB
  mood: INDICATIVE
  tense: PAST
}
dependency_edge {
  head_token_index: 8
  label: ROOT
}
lemma: "have"
, text {
  content: "to"
  begin_offset: 57
}
part_of_speech {
  tag: PRT
}
dependency_edge {
  head_token_index: 11
  label: AUX
}
lemma: "to"
, text {
  content: "really"
  begin_offset: 60
}
part_of_speech {
  tag: ADV
}
dependency_edge {
  head_token_index: 11
  label: ADVMOD
}
lemma: "really"
, text {
  content: "try"
  begin_offset: 67
}
part_of_speech {
  tag: VERB
}
dependency_edge {
  head_token_index: 8
  label: XCOMP
}
lemma: "try"
, text {
  content: "out"
  begin_offset: 71
}
part_of_speech {
  tag: PRT
}
dependency_edge {
  head_token_index: 11
  label: PRT
}
lemma: "out"
, text {
  content: "stuff"
  begin_offset: 75
}
part_of_speech {
  tag: NOUN
  number: SINGULAR
}
dependency_edge {
  head_token_index: 11
  label: DOBJ
}
lemma: "stuff"
, text {
  content: "over"
  begin_offset: 81
}
part_of_speech {
  tag: ADP
}
dependency_edge {
  head_token_index: 11
  label: PREP
}
lemma: "over"
, text {
  content: "their"
  begin_offset: 86
}
part_of_speech {
  tag: PRON
  case: GENITIVE
  number: PLURAL
  person: THIRD
}
dependency_edge {
  head_token_index: 16
  label: POSS
}
lemma: "their"
, text {
  content: "website"
  begin_offset: 92
}
part_of_speech {
  tag: NOUN
  number: SINGULAR
}
dependency_edge {
  head_token_index: 14
  label: POBJ
}
lemma: "website"
, text {
  content: "."
  begin_offset: 99
}
part_of_speech {
  tag: PUNCT
}
dependency_edge {
  head_token_index: 8
  label: P
}
lemma: "."
, text {
  content: "What"
  begin_offset: 101
}
part_of_speech {
  tag: PRON
  person: THIRD
}
dependency_edge {
  head_token_index: 19
  label: ATTR
}
lemma: "What"
, text {
  content: "is"
  begin_offset: 106
}
part_of_speech {
  tag: VERB
  mood: INDICATIVE
  number: SINGULAR
  person: THIRD
  tense: PRESENT
}
dependency_edge {
  head_token_index: 19
  label: ROOT
}
lemma: "be"
, text {
  content: "a"
  begin_offset: 109
}
part_of_speech {
  tag: DET
}
dependency_edge {
  head_token_index: 21
  label: DET
}
lemma: "a"
, text {
  content: "car"
  begin_offset: 111
}
part_of_speech {
  tag: NOUN
  number: SINGULAR
}
dependency_edge {
  head_token_index: 19
  label: NSUBJ
}
lemma: "car"
, text {
  content: "though"
  begin_offset: 115
}
part_of_speech {
  tag: ADV
}
dependency_edge {
  head_token_index: 19
  label: ADVMOD
}
lemma: "though"
, text {
  content: "?"
  begin_offset: 121
}
part_of_speech {
  tag: PUNCT
}
dependency_edge {
  head_token_index: 19
  label: P
}
lemma: "?"
]
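
In other words, sentences and tokens are repeated fields on the response object and already behave like Python lists; you just have to access them. Here is a minimal sketch (my own example sentence and variable names, using the same 1.x-style client as in the question):

from google.cloud import language
from google.cloud.language import enums
from google.cloud.language import types

client = language.LanguageServiceClient()
document = types.Document(
    content="Google certainly should make better documentation.",
    type=enums.Document.Type.PLAIN_TEXT,
)
response = client.analyze_syntax(document=document, encoding_type='UTF8')

# Both fields are repeated (list-like), so they can be iterated directly.
for sentence in response.sentences:
    print('SENTENCE:', sentence.text.content)

for token in response.tokens:
    # Enum fields such as part_of_speech.tag print as numeric values here.
    print(token.text.content, token.lemma, token.text.begin_offset)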
answered 2018-06-29T07:08:59.457