microsoft-cognitive - Disable token breaks on punctuation in LUIS.ai

Question

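I am using LUIS.ai, the Language Understanding service API from Microsoft Cognitive Services.

Whenever LUIS parses text, it tokenizes it and inserts whitespace around punctuation.

According to the documentation, this behavior is intentional:

"English, French, Italian, Spanish: Token breaks are inserted at any whitespace, and around any punctuation."

For my project, I need to preserve the original query string without these token breaks, because some of the entities my model is trained on will contain punctuation. Stripping the extra whitespace out of the parsed entities is annoying and a bit hacky.

An example of this behavior:

Is there a way to disable it? It would save a lot of effort.

Thanks!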

score 1 · Accepted Answer

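Unfortunately, there is currently no way to disable it. The good news is that the returned predictions refer to the original string, not the tokenized string you see while labeling examples.

In the documentation on understanding the JSON response, you can see that the sample output preserves the original "query" string, and that each extracted entity carries "startIndex" and "endIndex" values, which are zero-based character indexes into the original string. This lets you work with the indexes instead of the parsed entity phrases.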

{
"query": "Book me a flight to Boston on May 4",
"intents": [
  {
    "intent": "BookFlight",
    "score": 0.919818342
  },
  {
    "intent": "None",
    "score": 0.136909246
  },
  {
    "intent": "GetWeather",
    "score": 0.007304534
  }
],
"entities": [
  {
    "entity": "boston",
    "type": "Location::ToLocation",
    "startIndex": 20,
    "endIndex": 25,
    "score": 0.621795356
  },
  {
    "entity": "may 4",
    "type": "builtin.datetime.date",
    "startIndex": 30,
    "endIndex": 34,
    "resolution": {
      "date": "XXXX-05-04"
    }
  }
]

}
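As a minimal sketch of that index-based approach, the Python below slices the original "query" string using each entity's "startIndex" and "endIndex". The helper name `original_entity_text` is hypothetical and not part of any LUIS SDK. It assumes "endIndex" is inclusive, which is what the sample above suggests ("Boston" spans indexes 20 to 25).

```python
def original_entity_text(response):
    """Return (type, text) pairs sliced from the original query string.

    Assumption: "endIndex" is inclusive, as in the sample response,
    so the slice has to run to endIndex + 1.
    """
    query = response["query"]
    results = []
    for entity in response.get("entities", []):
        start = entity["startIndex"]
        end = entity["endIndex"]
        results.append((entity["type"], query[start:end + 1]))
    return results


# Trimmed copy of the sample response shown above.
response = {
    "query": "Book me a flight to Boston on May 4",
    "entities": [
        {
            "entity": "boston",
            "type": "Location::ToLocation",
            "startIndex": 20,
            "endIndex": 25,
        },
        {
            "entity": "may 4",
            "type": "builtin.datetime.date",
            "startIndex": 30,
            "endIndex": 34,
        },
    ],
}

for entity_type, text in original_entity_text(response):
    print(entity_type, "->", text)
```

Because the text comes from the original query rather than from the "entity" field, the original casing and punctuation are kept, with no inserted whitespace to strip.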
