nlp - RASA NLU: I want to extract anything (words, digits, or special characters) after a word as an entity

Question

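Is there a way to extract everything that follows a given word as an entity? For example:

I want to extract whatever comes after "about", "go to", or "learn" as the entity.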

##intent:navigate
-I want to learn about linear regression
-I want to read about SVM
-I want to go to Python 2.6
-Take me to logistic regression: eval

##regex:topic
-^[A-Za-z0-9 :_ -][A-Za-z0-9 :_ -][A-Za-z0-9 :_ -]$

score 0 · Accepted Answer

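Yes, you can. You have to annotate the entities in your training data, and the model will then extract them. For your example, the training data should look like this: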

##intent:navigate
- I want to learn about [linear regression](topic)
- I want to talk about [RasaNLU](topic) for the rest of the day.
- I want to go to [Berlin](topic) for a specific work.
- I want to read about [SVM](topic)
- I want to go to [Python 2.6](topic)
- Take me to logistic regression: eval

After training the model, I tried an example:

Enter a message: I want to talk about SVM     
{
  "intent": {
    "name": "navigate",
    "confidence": 0.9576369524002075
  },
  "entities": [
    {
      "start": 21,
      "end": 24,
      "value": "SVM",
      "entity": "topic",
      "confidence": 0.8241770362411013,
      "extractor": "CRFEntityExtractor"
    }
  ]
}
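
As a small sanity check on the output above, the "start" and "end" fields appear to be character offsets into the message, with "end" exclusive. The snippet below is only a sketch: it hard-codes the response shown above (trimmed to the fields it uses) instead of calling Rasa, and the variable names are my own.

```python
import json

message = "I want to talk about SVM"

# Response copied from the output above, trimmed to the fields used here.
response = json.loads("""
{
  "intent": {"name": "navigate", "confidence": 0.9576369524002075},
  "entities": [
    {"start": 21, "end": 24, "value": "SVM", "entity": "topic"}
  ]
}
""")

# Slice the message with each entity's offsets to recover the entity text.
extracted = [
    (ent["entity"], message[ent["start"]:ent["end"]])
    for ent in response["entities"]
]
print(extracted)
```

If the sliced text and the "value" field ever disagree, the offsets are probably being applied to a different string than the one that was parsed.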

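For this approach to work, however, you will have to define more examples that cover all the possible patterns. Take the example "I want to talk about RasaNLU for the rest of the day.": it teaches the model that the entity to extract does not have to be the last words of the sentence, which is the case in all the other examples.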
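
Writing that many examples by hand gets tedious, so one option is to generate them. The sketch below is an assumption on my part, not something Rasa provides: the templates, the topic list, and the annotate helper are all hypothetical. It fills each template with each topic, wrapped in the [value](entity) markup used in the training data above.

```python
# Hypothetical templates: "{}" marks where the topic goes. Some end with the
# entity and some continue after it, so position alone is not a reliable cue.
templates = [
    "I want to learn about {}",
    "I want to read about {} before the meeting",
    "I want to go to {}",
    "Take me to {} right now",
]
topics = ["linear regression", "SVM", "Python 2.6", "logistic regression: eval"]


def annotate(template, topic):
    """Fill the template with the topic wrapped in [value](entity) markup."""
    return "- " + template.format("[{}](topic)".format(topic))


# Same intent header as in the training data above.
lines = ["##intent:navigate"]
lines += [annotate(t, topic) for t in templates for topic in topics]
print("\n".join(lines))
```

The printed lines can be pasted into the training file. Generated data like this is repetitive, so it is probably worth mixing in some hand-written sentences as well.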

score 0 · Accepted Answer

The naive approach can be very simple: use the string split method, for example

sentences = ["I want to learn about linear regression", "I want to read about SVM", "I want to go to Python 2.6",
 "Take me to logistic regression: eval"]

split_terms = ["about", "go to", "learn"]

for sentence in sentences:
    for split_term in split_terms:
        try:
            print(sentence.split(split_term)[1])
        except IndexError:
            pass # split_term was not found in a sentence

Result:

 linear regression
 about linear regression
 SVM
 Python 2.6

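A smarter approach might be to find the last "split term" in the sentence first, which solves the problem of "learn" and "about" both matching in "learn about":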

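for sentence in sentences:
    # Find the split term that starts furthest to the right in the sentence.
    last_split_term_index = -1
    last_split_term = ""
    for split_term in split_terms:
        last_split_term_index_candidate = sentence.find(split_term)
        if last_split_term_index_candidate > last_split_term_index:
            last_split_term_index = last_split_term_index_candidate
            last_split_term = split_term
    # Skip sentences that contain none of the split terms.
    if not last_split_term:
        continue
    # Print everything after the first occurrence of that term.
    print(sentence.split(last_split_term, 1)[1])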

Result:

 linear regression
 SVM
 Python 2.6
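
The same idea can also be written with a regular expression, which adds word boundaries (so "learn" does not match inside "learning") and gives character offsets. This is only a sketch using the standard re module: the function name extract_after_last_term is made up, and the returned dict merely imitates the shape of the Rasa output shown in the other answer.

```python
import re

split_terms = ["about", "go to", "learn"]

# Match any split term as a whole word or phrase. Longer terms are listed
# first so that overlapping alternatives prefer the longest one.
alternatives = "|".join(
    re.escape(term) for term in sorted(split_terms, key=len, reverse=True)
)
pattern = re.compile(r"\b(?:" + alternatives + r")\b")


def extract_after_last_term(sentence):
    """Return the text after the last split term as an entity dict, or None."""
    matches = list(pattern.finditer(sentence))
    if not matches:
        return None
    start = matches[-1].end()
    # Skip the whitespace between the split term and the entity text.
    while start < len(sentence) and sentence[start].isspace():
        start += 1
    value = sentence[start:].rstrip()
    if not value:
        return None
    return {
        "start": start,
        "end": start + len(value),
        "value": value,
        "entity": "topic",
    }


sentences = [
    "I want to learn about linear regression",
    "I want to read about SVM",
    "I want to go to Python 2.6",
    "Take me to logistic regression: eval",
]
for sentence in sentences:
    print(extract_after_last_term(sentence))
```

As with the loops above, "Take me to logistic regression: eval" yields nothing, because none of the three split terms occurs in it. Covering it would mean adding another split term.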
