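I've been playing around with Google's entity analyzer, and it looks really great!

But I keep banging my head against one thing: I'm trying to replicate the image below (seen on Google's Natural Language API page).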

[Image: screenshot from Google's Natural Language API page]

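This is the format of the entity data I get back from the request.

The data has no order, only occurrences, so looping through every word and checking it against the entities seems really slow. And since each word can have multiple mentions, it could get a bit complicated.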

[
  {
  "mentions": [
    {
      "text": { "content": "group", "beginOffset": -1 },
      "type": "COMMON",
      "sentiment": { "magnitude": 0, "score": 0 }
    },
    {
      "text": { "content": "group", "beginOffset": -1 },
      "type": "COMMON",
      "sentiment": { "magnitude": 0, "score": 0 }
    },
    {
      "text": { "content": "group", "beginOffset": -1 },
      "type": "COMMON",
      "sentiment": { "magnitude": 0.30000001192092896, "score":0.30000001192092896 }
    },
    {
      "text": { "content": "group", "beginOffset": -1 },
      "type": "COMMON",
      "sentiment": { "magnitude": 0.30000001192092896, "score":-0.30000001192092896 }
    },
    {
      "text": { "content": "group", "beginOffset": -1 },
      "type": "COMMON",
      "sentiment": { "magnitude": 0, "score": 0 }
    },
    {
      "text": { "content": "group", "beginOffset": -1 },
      "type": "COMMON",
      "sentiment": { "magnitude": 0, "score": 0 }
    } 
  ],
  "metadata": {},
  "name": "group",
  "type": "ORGANIZATION",
  "salience": 0.34768930077552795,
  "sentiment": { "magnitude": 1.100000023841858, "score": 0 }
},
{
  "mentions": [
    {
      "text": { "content": "Commonwealth", "beginOffset": -1 },
      "type": "PROPER",
      "sentiment": { "magnitude": 0, "score": 0 }
    },
    {
      "text": { "content": "Commonwealth", "beginOffset": -1 },
      "type": "PROPER",
      "sentiment": { "magnitude": 0, "score": 0 }
    },
    {
      "text": { "content": "Commonwealth", "beginOffset": -1 },
      "type": "PROPER",
      "sentiment": { "magnitude": 0, "score": 0 }
    },
    {
      "text": { "content": "Commonwealth", "beginOffset": -1 },
      "type": "PROPER",
      "sentiment": { "magnitude": 0, "score": 0 }
    },
    {
      "text": { "content": "Commonwealth", "beginOffset": -1 },
      "type": "PROPER",
      "sentiment": { "magnitude": 0, "score": 0 }
    },
    {
      "text": { "content": "Commonwealth", "beginOffset": -1 },
      "type": "PROPER",
      "sentiment": { "magnitude": 0, "score": 0 }
    },
    {
      "text": { "content": "Commonwealth", "beginOffset": -1 },
      "type": "PROPER",
      "sentiment": { "magnitude": 0, "score": 0 }
    },
    {
      "text": { "content": "Commonwealth", "beginOffset": -1 },
      "type": "PROPER",
      "sentiment": { "magnitude": 0, "score": 0 }
    },
    {
      "text": { "content": "Commonwealth", "beginOffset": -1 },
      "type": "PROPER",
      "sentiment": { "magnitude": 0, "score": 0 }
    },
    {
      "text": { "content": "Commonwealth", "beginOffset": -1 },
      "type": "PROPER",
      "sentiment": { "magnitude": 0.10000000149011612, "score":-0.10000000149011612 }
    },
    {
      "text": { "content": "Commonwealth", "beginOffset": -1 },
      "type": "PROPER",
      "sentiment": { "magnitude": 0, "score": 0 }
    },
    {
      "text": { "content": "Commonwealth", "beginOffset": -1 },
      "type": "PROPER",
      "sentiment": { "magnitude": 0.20000000298023224, "score": -0.20000000298023224 }
    },
    {
      "text": { "content": "Commonwealth", "beginOffset": -1 },
      "type": "PROPER",
      "sentiment": { "magnitude": 0, "score": 0 }
    },
    {
      "text": { "content": "Commonwealth", "beginOffset": -1 },
      "type": "PROPER",
      "sentiment": { "magnitude": 0, "score": 0 }
    },
    {
      "text": { "content": "Commonwealth", "beginOffset": -1 },
      "type": "PROPER",
      "sentiment": { "magnitude": 0, "score": 0 }
    },
    {
      "text": { "content": "Commonwealth", "beginOffset": -1 },
      "type": "PROPER",
      "sentiment": { "magnitude": 0, "score": 0 }
    },
    {
      "text": { "content": "Commonwealth", "beginOffset": -1 },
      "type": "PROPER",
      "sentiment": { "magnitude": 0, "score": 0 }
    },
    {
      "text": { "content": "Commonwealth", "beginOffset": -1 },
      "type": "PROPER",
      "sentiment": { "magnitude": 0, "score": 0 }
    },
    {
      "text": { "content": "Commonwealth", "beginOffset": -1 },
      "type": "PROPER",
      "sentiment": { "magnitude": 0, "score": 0 }
    },
    {
      "text": { "content": "Commonwealth of Nations", "beginOffset": -1 },
      "type": "PROPER",
      "sentiment": { "magnitude": 0, "score": 0 }
    },
    {
      "text": { "content": "Commonwealth\r\nOne", "beginOffset": -1 },
      "type": "PROPER",
      "sentiment": { "magnitude": 0, "score": 0 }
    }
  ],
  "metadata": {
    "mid": "/m/0j7v_",
    "wikipedia_url": "https://en.wikipedia.org/wiki/Commonwealth_of_Nations"
  },
  "name": "Commonwealth of Nations",
  "type": "LOCATION",
  "salience": 0.28001657128334045,
  "sentiment": { "magnitude": 1.7000000476837158, "score": 0 }
},
  ...
  ]

Is there a simple way to do this that I'm completely missing? Thanks for any insights/ideas.

Ollie

1 Answer

I believe beginOffset is what you need:

beginOffset indicates the (zero-based) character offset in the given text where the sentence begins. Note that this offset is computed using the passed encodingType.

If you specify EncodingType in your request, it should work.
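As a rough illustration, a request body with the encoding type set might look like the sketch below. It assumes the REST JSON shape with a `document` object and a top-level `encodingType` field; the helper name `build_entity_request` and the sample sentence are made up for this example. `UTF32` is used here on the assumption that the offsets will be consumed from Python, where strings are indexed by code point; from JavaScript, `UTF16` would be the matching choice.

```python
import json


def build_entity_request(text: str, encoding_type: str = "UTF32") -> dict:
    """Build a JSON body for an entity-analysis request (hypothetical helper).

    Setting encodingType is what makes the API fill in beginOffset
    instead of returning -1.
    """
    return {
        "document": {"type": "PLAIN_TEXT", "content": text},
        "encodingType": encoding_type,
    }


# Sample sentence invented for illustration only.
body = build_entity_request("The Commonwealth is a group of nations.")
print(json.dumps(body, indent=2))
```

The resulting dictionary would then be sent as the JSON payload of the entity-analysis call by whatever HTTP or client library you already use.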

If EncodingType is not specified, encoding-dependent information (such as beginOffset) will be set to -1.
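Once the offsets are populated, there is no need to loop over every word. A minimal sketch of one way to do it, assuming the response shape shown in the question and code-point offsets (for example from `UTF32`): flatten all mentions, sort them by beginOffset, and walk the text once. The function name `annotate`, the `[text|TYPE]` output format, and the sample data are invented for illustration.

```python
def annotate(text, entities):
    """Wrap each mention as [mention|ENTITY_TYPE], located via beginOffset."""
    # Flatten to (offset, mention text, entity type); skip unset (-1) offsets.
    spans = []
    for entity in entities:
        for mention in entity["mentions"]:
            offset = mention["text"]["beginOffset"]
            if offset >= 0:
                spans.append((offset, mention["text"]["content"], entity["type"]))
    spans.sort()

    out, cursor = [], 0
    for offset, content, entity_type in spans:
        if offset < cursor:
            # Overlaps a mention already emitted; skip it in this sketch.
            continue
        out.append(text[cursor:offset])
        out.append(f"[{content}|{entity_type}]")
        cursor = offset + len(content)
    out.append(text[cursor:])
    return "".join(out)


# Hypothetical data in the same shape as the response in the question.
text = "The Commonwealth is a group of nations."
entities = [
    {"name": "group", "type": "ORGANIZATION",
     "mentions": [{"text": {"content": "group", "beginOffset": 22}}]},
    {"name": "Commonwealth of Nations", "type": "LOCATION",
     "mentions": [{"text": {"content": "Commonwealth", "beginOffset": 4}}]},
]
annotated = annotate(text, entities)
print(annotated)
# prints: The [Commonwealth|LOCATION] is a [group|ORGANIZATION] of nations.
```

For an HTML view like the one on Google's page, the bracketed string could be swapped for a styled span per mention; the single sorted pass stays the same.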

Answered 2018-02-22T16:42:56.263