
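I'd like to know whether the analyzeSyntax API endpoint offers any way to have the response JSON omit the sub-properties of the partOfSpeech dictionary when their values are *_UNKNOWN. Looking through the documentation for the partOfSpeech details, I couldn't find any way to restrict them.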

Is this something that can only be handled by cleaning the data after the response comes back?

Sample query, following the API documentation, in a file named request.json:

{
  "encodingType": "UTF8",
  "document": {
    "type": "PLAIN_TEXT",
    "content": "Google, headquartered in Mountain View, unveiled the new Android phone at the Consumer Electronic Show.  Sundar Pichai said in his keynote that users love their new Android phones."
  }
}

Command executed:

curl "https://language.googleapis.com/v1/documents:analyzeSyntax?key=${API_KEY}" \
  -s \
  -X POST \
  -H "Content-Type: application/json" \
  --data-binary @request.json > response.json

Sample response:

{
  "sentences": [
    {
      "text": {
        "content": "Google, headquartered in Mountain View, unveiled the new Android phone at the Consumer Electronic Show.",
        "beginOffset": 0
      }
    },
    {
      "text": {
        "content": "Sundar Pichai said in his keynote that users love their new Android phones.",
        "beginOffset": 105
      }
    }
  ],
  "tokens": [
    {
      "text": {
        "content": "Google",
        "beginOffset": 0
      },
      "partOfSpeech": {
        "tag": "NOUN",
        "aspect": "ASPECT_UNKNOWN",
        "case": "CASE_UNKNOWN",
        "form": "FORM_UNKNOWN",
        "gender": "GENDER_UNKNOWN",
        "mood": "MOOD_UNKNOWN",
        "number": "SINGULAR",
        "person": "PERSON_UNKNOWN",
        "proper": "PROPER",
        "reciprocity": "RECIPROCITY_UNKNOWN",
        "tense": "TENSE_UNKNOWN",
        "voice": "VOICE_UNKNOWN"
      },
      "dependencyEdge": {
        "headTokenIndex": 7,
        "label": "NSUBJ"
      },
      "lemma": "Google"
    },
    {
      "text": {
        "content": ",",
        "beginOffset": 6
      },
      "partOfSpeech": {
        "tag": "PUNCT",
        "aspect": "ASPECT_UNKNOWN",
        "case": "CASE_UNKNOWN",
        "form": "FORM_UNKNOWN",
        "gender": "GENDER_UNKNOWN",
        "mood": "MOOD_UNKNOWN",
        "number": "NUMBER_UNKNOWN",
        "person": "PERSON_UNKNOWN",
        "proper": "PROPER_UNKNOWN",
        "reciprocity": "RECIPROCITY_UNKNOWN",
        "tense": "TENSE_UNKNOWN",
        "voice": "VOICE_UNKNOWN"
      },
      "dependencyEdge": {
        "headTokenIndex": 0,
        "label": "P"
      },
      "lemma": ","
    },
...
...

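This response JSON is 819 lines long, and 314 of those lines (close to 40% of the response!) are *_UNKNOWN properties of partOfSpeech. They are completely useless, yet they significantly increase the amount of data in the response.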
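The proportion can be checked by counting the partOfSpeech sub-properties whose value ends in _UNKNOWN. Below is a minimal Python sketch, assuming the response has already been parsed into a dict (for example with json.load on response.json); the helper name count_unknown_pos is hypothetical. It counts property values rather than pretty-printed lines, so its ratio will differ slightly from the line count.

```python
from typing import Tuple


def count_unknown_pos(response: dict) -> Tuple[int, int]:
    """Count partOfSpeech sub-properties whose value ends in "_UNKNOWN".

    Returns (unknown, total) over every token in an analyzeSyntax response.
    """
    unknown = total = 0
    for token in response.get("tokens", []):
        for value in token.get("partOfSpeech", {}).values():
            total += 1
            if isinstance(value, str) and value.endswith("_UNKNOWN"):
                unknown += 1
    return unknown, total


# A trimmed-down version of the "," token from the sample response.
sample = {
    "tokens": [
        {"partOfSpeech": {"tag": "PUNCT", "number": "NUMBER_UNKNOWN", "proper": "PROPER_UNKNOWN"}}
    ]
}
print(count_unknown_pos(sample))  # (2, 3)
```

Against the saved file, the same helper can be called on the result of json.load applied to response.json.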

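The documentation doesn't seem to offer a parameter that would help with this. Am I missing something, or does this API simply not support a parameter for dropping these *_UNKNOWN keys? Can this only be managed after the response, by cleaning the data?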


1 Answer


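If we look at the API specification, we eventually find that the part-of-speech properties are actually enums (enumerations). For example, Gender can be one of:

  • GENDER_UNKNOWN
  • FEMININE
  • MASCULINE
  • NEUTER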

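REST API calls send and receive JSON payloads, and JSON represents an enum by spelling out its value as a string. However, REST and JSON are not the only protocols for making requests to GCP services; gRPC calls are also possible. With gRPC, the payload is transmitted as a protocol buffer, in which an enum is encoded as a compact integer rather than as its name. The language bindings Google provides let you make service calls over gRPC without getting sidetracked learning the technology. The value of gRPC is that its messages are smaller and faster to transmit.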
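To illustrate why protocol-buffer messages are smaller, here is a minimal sketch of the proto3 wire encoding of a single enum field. It assumes, as in the published Natural Language API proto definitions, that each *_UNKNOWN value is enum value 0; the field numbers and the numeric value used for SINGULAR are hypothetical. Under proto3 rules a field without explicit presence that holds its default value is not serialized at all, so an *_UNKNOWN property would cost nothing on the wire, while a set value costs a key byte plus a value byte (for small field numbers and values) instead of a quoted JSON key and string.

```python
import json


def encode_varint(value: int) -> bytes:
    """Encode a non-negative integer as a protobuf base-128 varint."""
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # high bit set: more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def encode_enum_field(field_number: int, enum_value: int) -> bytes:
    """Encode one proto3 enum field (implicit presence, wire type 0 = varint).

    A field holding the default value 0 is not written at all, so an
    *_UNKNOWN value (assumed here to be enum value 0) costs zero bytes.
    """
    if enum_value == 0:
        return b""
    key = (field_number << 3) | 0  # field number shifted left, wire type VARINT
    return encode_varint(key) + encode_varint(enum_value)


# Hypothetical field numbers (6, 8) and enum value (SINGULAR = 1), for illustration only.
as_json = json.dumps({"gender": "GENDER_UNKNOWN", "number": "SINGULAR"})
as_proto = encode_enum_field(6, 0) + encode_enum_field(8, 1)
print(len(as_json.encode("utf-8")), len(as_proto))
```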

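I don't see any mechanism at the API level for shrinking the transmitted payload (for example, asking, when using REST, for fields to be left out of the JSON response).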
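Without such a mechanism, the *_UNKNOWN fields can still be dropped on the client after the response arrives. A minimal Python sketch, assuming the REST response has been parsed into a dict; the helper name strip_unknown_pos is hypothetical and the function mutates the dict in place:

```python
import json


def strip_unknown_pos(response: dict) -> dict:
    """Remove *_UNKNOWN sub-properties from each token's partOfSpeech, in place."""
    for token in response.get("tokens", []):
        pos = token.get("partOfSpeech")
        if isinstance(pos, dict):
            token["partOfSpeech"] = {
                key: value
                for key, value in pos.items()
                if not (isinstance(value, str) and value.endswith("_UNKNOWN"))
            }
    return response


# The "Google" token from the sample response in the question.
sample = {
    "tokens": [
        {
            "text": {"content": "Google", "beginOffset": 0},
            "partOfSpeech": {
                "tag": "NOUN", "aspect": "ASPECT_UNKNOWN", "case": "CASE_UNKNOWN",
                "form": "FORM_UNKNOWN", "gender": "GENDER_UNKNOWN", "mood": "MOOD_UNKNOWN",
                "number": "SINGULAR", "person": "PERSON_UNKNOWN", "proper": "PROPER",
                "reciprocity": "RECIPROCITY_UNKNOWN", "tense": "TENSE_UNKNOWN",
                "voice": "VOICE_UNKNOWN",
            },
            "lemma": "Google",
        }
    ]
}
print(json.dumps(strip_unknown_pos(sample)["tokens"][0]["partOfSpeech"]))
```

This only reduces what is stored or passed on downstream; the full payload is still transferred over the network. Code that reads the cleaned data should treat a missing key as equivalent to *_UNKNOWN.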


answered 2019-12-12T05:25:42.657