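I have two different intents to capture body temperature and oxygen saturation values:
- intent body_temperature_data (contains the body_temperature custom entity)
- intent oxygen_saturation_data (contains the oxygen_saturation custom entity)
Below is an excerpt from data/intents.yml: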
- intent: body_temperature_data
examples: |
- sto bene
- niente febbre
- normale
- sono senza febbre
- non ho febbre
- non mi sento la febbre
- ho qualche linea
- ho la febbre
- credo di avere la febbre
- mi sento un po di febbre
- mi sento caldo
- poca
- molto poca
- bassa
- alta
- molto alta
- [35](body_temperature)
- [35.5](body_temperature)
- [35.6](body_temperature)
- [35.7](body_temperature)
- [35.8](body_temperature)
- [35.9](body_temperature)
- [35 e 9](body_temperature)
- [35,9](body_temperature)
- [36](body_temperature)
- [36.0](body_temperature)
- [36.1](body_temperature)
- [36.2](body_temperature)
- [36.3](body_temperature)
- [36.4](body_temperature)
- [36.5](body_temperature)
- [36.6](body_temperature)
- [36.7](body_temperature)
- [36.8](body_temperature)
- [36.9](body_temperature)
- [36 e 7](body_temperature)
- [36,9](body_temperature)
- [37](body_temperature)
- [37.0](body_temperature)
- [37.1](body_temperature)
- [37.2](body_temperature)
- [37.3](body_temperature)
- [37.4](body_temperature)
- [37.5](body_temperature)
- [37.6](body_temperature)
- [37.7](body_temperature)
- [37.8](body_temperature)
- [37.9](body_temperature)
- [37,2](body_temperature)
- [37.5](body_temperature)
- [37 , 6](body_temperature)
- [37 . 6](body_temperature)
- [38](body_temperature)
- [38.0](body_temperature)
- [38.1](body_temperature)
- [38.2](body_temperature)
- [38.3](body_temperature)
- [38.4](body_temperature)
- [38.5](body_temperature)
- [38.6](body_temperature)
- [38.7](body_temperature)
- [38.8](body_temperature)
- [38.9](body_temperature)
- [38 , 1](body_temperature)
- [38 . 2](body_temperature)
- [39](body_temperature)
- [39,1](body_temperature)
- [39.1](body_temperature)
- [39.2](body_temperature)
- [39.3](body_temperature)
- [39.5](body_temperature)
- [39.6](body_temperature)
- [39.7](body_temperature)
- [39.8](body_temperature)
- [39.9](body_temperature)
- [40](body_temperature)
- [41](body_temperature)
- [trentacinque](body_temperature)
- [trentasei](body_temperature)
- [trentasei e otto](body_temperature)
- [trentasette](body_temperature)
- [trentasette emmezzo](body_temperature)
- [trentasette e mezzo](body_temperature)
- [trentasette punto otto](body_temperature)
- [trentasette e quattro lineette](body_temperature)
- [trentasette e 6 linee](body_temperature)
- [trentasette virgola sei](body_temperature)
- [trentasette punto sette](body_temperature)
- [trentasette punto otto](body_temperature)
- [trentotto](body_temperature)
- [trentotto punto uno](body_temperature)
- [trentotto e 2 linee](body_temperature)
- [trentotto e due](body_temperature)
- [trentotto punto tre](body_temperature)
- [trentotto e quattro](body_temperature)
- [trentotto virgola quattro](body_temperature)
- [trentotto emmezzo](body_temperature)
- [trentanove](body_temperature)
- [trentanove e due](body_temperature)
- [trentanove emmezzo](body_temperature)
- [quaranta](body_temperature)
- [quarantuno](body_temperature)
- intent: oxygen_saturation_data
examples: |
- [70](oxygen_saturation)
- [71](oxygen_saturation)
- [72](oxygen_saturation)
- [73](oxygen_saturation)
- [74](oxygen_saturation)
- [75](oxygen_saturation)
- [76](oxygen_saturation)
- [77 e 9](oxygen_saturation)
- [78,7](oxygen_saturation)
- [79](oxygen_saturation)
- [80](oxygen_saturation)
- [80.5](oxygen_saturation)
- [81](oxygen_saturation)
- [81.6](oxygen_saturation)
- [82](oxygen_saturation)
- [82.4](oxygen_saturation)
- [83](oxygen_saturation)
- [83.7](oxygen_saturation)
- [84](oxygen_saturation)
- [84.1](oxygen_saturation)
- [85](oxygen_saturation)
- [85.2](oxygen_saturation)
- [86](oxygen_saturation)
- [86.9](oxygen_saturation)
- [87](oxygen_saturation)
- [87.8](oxygen_saturation)
- [88](oxygen_saturation)
- [88.0](oxygen_saturation)
- [88.1](oxygen_saturation)
- [89.0](oxygen_saturation)
- [89](oxygen_saturation)
- [89.7](oxygen_saturation)
- [90](oxygen_saturation)
- [90.0](oxygen_saturation)
- [90.1](oxygen_saturation)
- [90.2](oxygen_saturation)
- [90.3](oxygen_saturation)
- [90.4](oxygen_saturation)
- [90.5](oxygen_saturation)
- [90.6](oxygen_saturation)
- [90.7](oxygen_saturation)
- [90.8](oxygen_saturation)
- [90.9](oxygen_saturation)
- [91](oxygen_saturation)
- [91.6](oxygen_saturation)
- [92](oxygen_saturation)
- [92.9](oxygen_saturation)
- [93](oxygen_saturation)
- [93.8](oxygen_saturation)
- [94](oxygen_saturation)
- [94.5](oxygen_saturation)
- [95](oxygen_saturation)
- [95.4](oxygen_saturation)
- [96](oxygen_saturation)
- [96.7](oxygen_saturation)
- [97](oxygen_saturation)
- [97.5](oxygen_saturation)
- [98](oxygen_saturation)
- [98.4](oxygen_saturation)
- [99](oxygen_saturation)
- [99 e 1](oxygen_saturation)
- [99.0](oxygen_saturation)
- [99.9](oxygen_saturation)
- [100](oxygen_saturation)
- [settanta](oxygen_saturation)
- [settantuno](oxygen_saturation)
- [settantadue](oxygen_saturation)
- [settantatre](oxygen_saturation)
- [settantaquattro](oxygen_saturation)
- [settantacinque](oxygen_saturation)
- [settantasei](oxygen_saturation)
- [settantasette](oxygen_saturation)
- [settantotto](oxygen_saturation)
- [settantanove](oxygen_saturation)
- [ottanta](oxygen_saturation)
- [ottantuno](oxygen_saturation)
- [ottantadue](oxygen_saturation)
- [ottantatre](oxygen_saturation)
- [ottantatre punto cinque](oxygen_saturation)
- [ottantatre punto sei](oxygen_saturation)
- [ottantaquattro emmezzo](oxygen_saturation)
- [ottantaquattro e sei](oxygen_saturation)
- [ottantaquattro punto sette](oxygen_saturation)
- [ottantaquattro punto sei](oxygen_saturation)
- [ottantaquattro punto nove](oxygen_saturation)
- [ottantacinque](oxygen_saturation)
- [ottantacinque punto cinque](oxygen_saturation)
- [ottantacinque e quattro](oxygen_saturation)
- [ottantasei](oxygen_saturation)
- [ottantasette](oxygen_saturation)
- [ottantotto](oxygen_saturation)
- [ottantanove](oxygen_saturation)
- [novanta](oxygen_saturation)
- [novanta punto tre](oxygen_saturation)
- [novanta punto otto](oxygen_saturation)
- [novantuno](oxygen_saturation)
- [novantuno punto cinque](oxygen_saturation)
- [novantuno punto nove](oxygen_saturation)
- [novantadue](oxygen_saturation)
- [novantatre](oxygen_saturation)
- [novantatre e sei](oxygen_saturation)
- [novantatre punto sette](oxygen_saturation)
- [novantatre virgola otto](oxygen_saturation)
- [novantatre e due](oxygen_saturation)
- [novantatre punto uno](oxygen_saturation)
- [novantatre virgola nove](oxygen_saturation)
- [novantaquattro](oxygen_saturation)
- [novantaquattro virgola due](oxygen_saturation)
- [novantaquattro virgola otto](oxygen_saturation)
- [novantacinque](oxygen_saturation)
- [novantacinque e cinque](oxygen_saturation)
- [novantacinque punto cinque](oxygen_saturation)
- [novantasei](oxygen_saturation)
- [novantasei e uno](oxygen_saturation)
- [novantasei e cinque](oxygen_saturation)
- [novantasette](oxygen_saturation)
- [novantasette e due](oxygen_saturation)
- [novantasette e sei](oxygen_saturation)
- [novantotto](oxygen_saturation)
- [novantotto e cinque](oxygen_saturation)
- [novantanove](oxygen_saturation)
- [novantanove emmezzo](oxygen_saturation)
- [cento](oxygen_saturation)
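As the examples show, I want to capture entity values (and, subsequently, slots in a form) expressed as:
- numbers written in digits (35.5), probably typed by the user on a chat messaging channel
- numbers written in words (trentacinque punto cinque), probably from voice input, since speech recognition engines often return the number transcribed as words.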
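To make the digit-form variants concrete, here is a minimal sketch of how such entity values could be normalized to a float before being stored in a slot. The helper name parse_digit_value is hypothetical and not part of Rasa; it only handles the digit spellings that appear in my examples (35.5, 37,2, 37 . 6, 35 e 9) and deliberately leaves word-form values alone.

```python
import re
from typing import Optional

# Hypothetical helper, not a Rasa API: accepts digit-form values with ".",
# "," or the Italian "e" as decimal separator, with optional spaces around it.
_DIGIT_FORM = re.compile(r"^\s*(\d{1,3})(?:\s*(?:[.,]|e)\s*(\d+))?\s*$")


def parse_digit_value(text: str) -> Optional[float]:
    """Return the value as a float, or None if the text is not in digit form."""
    match = _DIGIT_FORM.match(text)
    if match is None:
        # Word-form values such as "trentasette e mezzo" are out of scope here.
        return None
    whole, fraction = match.groups()
    return float(f"{whole}.{fraction}") if fraction else float(whole)


for sample in ["35.5", "37,2", "37 . 6", "35 e 9", "100", "trentasette"]:
    print(repr(sample), "->", parse_digit_value(sample))
```

Something like this would run after entity extraction (for example in a form validation step), so it does not affect how the NLU model classifies the message.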
Look at what happens when I test RASA NLU:
$ rasa shell nlu --quiet
NLU model loaded. Type a message and press enter to parse it.
Next message:
90.3
{
"text": "90.3",
"intent": {
"id": -6401318193538980427,
"name": "body_temperature_data",
"confidence": 0.7580949664115906
},
"entities": [
{
"entity": "body_temperature",
"start": 0,
"end": 4,
"confidence_entity": 0.7051703929901123,
"value": "90.3",
"extractor": "DIETClassifier"
}
],
"intent_ranking": [
{
"id": -6401318193538980427,
"name": "body_temperature_data",
"confidence": 0.7580949664115906
},
{
"id": 8358940020600517004,
"name": "oxygen_saturation_data",
"confidence": 0.18363729119300842
},
{
"id": -860430617479998517,
"name": "mood_unhappy",
"confidence": 0.010874141938984394
},
Next message:
novanta punto tre
{
"text": "novanta punto tre",
"intent": {
"id": 8358940020600517004,
"name": "oxygen_saturation_data",
"confidence": 0.9999997615814209
},
"entities": [
{
"entity": "oxygen_saturation",
"start": 0,
"end": 17,
"confidence_entity": 0.9956320524215698,
"value": "novanta punto tre",
"extractor": "DIETClassifier"
}
],
"intent_ranking": [
{
"id": 8358940020600517004,
"name": "oxygen_saturation_data",
"confidence": 0.9999997615814209
},
{
"id": -860430617479998517,
"name": "mood_unhappy",
"confidence": 4.465892544658345e-08
},
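So when the number is entered as words, RASA correctly classifies the intent oxygen_saturation_data and the entity oxygen_saturation. So far, so good.
But when I enter the number in digits (e.g. 90.3), both the intent and the entity are misclassified.
This surprises me, because the example sets for body_temperature and oxygen_saturation are two completely separate sets of text!
My question is: why are the intent and the entity misclassified?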
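One thing I can check myself is whether the two sets are still "separate" at the character n-gram level, since my config.yml (further down) uses a CountVectorsFeaturizer with analyzer char_wb and n-grams from 1 to 4. The sketch below only approximates word-boundary character n-grams in plain Python; the function names are mine, it is not Rasa's or scikit-learn's code, and it is a probe rather than an explanation of the misclassification.

```python
from typing import Iterable, Set


def char_wb_ngrams(text: str, min_n: int = 1, max_n: int = 4) -> Set[str]:
    """Approximate word-boundary character n-grams: every whitespace-separated
    word is padded with one space on each side before being sliced."""
    grams: Set[str] = set()
    for word in text.split():
        padded = f" {word} "
        for n in range(min_n, max_n + 1):
            for i in range(len(padded) - n + 1):
                grams.add(padded[i:i + n])
    return grams


def shared_ngrams(message: str, examples: Iterable[str]) -> Set[str]:
    """N-grams of the message that also occur in at least one example."""
    pooled: Set[str] = set()
    for example in examples:
        pooled |= char_wb_ngrams(example)
    return char_wb_ngrams(message) & pooled


# A few digit-form examples taken from the body_temperature_data intent.
temperature_examples = ["35.5", "36.3", "39.3"]
print(sorted(shared_ngrams("90.3", temperature_examples)))
```

The printed set is not empty: short pieces such as the trailing ".3" are shared between 90.3 and the temperature examples, so at this feature level the two example sets overlap even though the full strings do not.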
By the way, I tried adding quotes in the examples:
- ['35.5'](oxygen_saturation)
instead of:
- [35.5](oxygen_saturation)
but this raises the following error/warning at training time:
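/home/giorgio/.local/lib/python3.8/site-packages/rasa/shared/utils/io.py:97: UserWarning: Misaligned entity annotation in message "35.5" with intent "body_temperature_data". Make sure the start and end values of entities ([(0, 6, "'35.5'")]) in the training data match the token boundaries ([(0, 5, "'35.5")]). Common causes:
- entities include trailing whitespace or punctuation
- the tokenizer gives an unexpected result, due to languages such as Chinese that don't use whitespace for word separation
More info at https://rasa.com/docs/rasa/training-data-format#nlu-training-data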
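As I read the warning, the annotated entity must start and end exactly on token boundaries. A small sketch of that check, using the offsets printed in the warning; is_aligned is a name I made up for illustration, not a Rasa function.

```python
from typing import List, Tuple

Span = Tuple[int, int]  # (start, end) character offsets


def is_aligned(entity: Span, tokens: List[Span]) -> bool:
    """True when the entity starts at some token start and ends at some token end."""
    starts = {start for start, _ in tokens}
    ends = {end for _, end in tokens}
    return entity[0] in starts and entity[1] in ends


# Offsets copied from the warning above.
quoted_entity = (0, 6)    # "'35.5'" as annotated in the training data
quoted_tokens = [(0, 5)]  # "'35.5" as produced by the tokenizer
print(is_aligned(quoted_entity, quoted_tokens))  # False

# Unquoted case; the single (0, 4) token is my assumption, consistent with
# the start/end reported for "90.3" in the rasa shell nlu output.
print(is_aligned((0, 4), [(0, 4)]))  # True
```

So with the quotes, the entity ends at offset 6 while the only token ends at offset 5 (the closing quote is dropped), which is the mismatch the warning reports.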
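My suspicion concerns using numbers (e.g. a float given as a digit string, 35.5) as entities (and as intent examples). Could this be the reason RASA NLU fails (see the rasa shell nlu output above)?
Any ideas?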
$ cat config.yml
# Configuration for Rasa NLU.
# https://rasa.com/docs/rasa/nlu/components/
language: it
pipeline:
# pip3 install rasa[spacy]
# python3 -m spacy download it_core_news_sm
# python3 -m spacy download it_core_news_lg
# https://rasa.com/docs/rasa/components#spacynlp
- name: "SpacyNLP"
# language model to load
# italian large model: it_core_news_lg
# italian small model: it_core_news_sm
model: "it_core_news_sm"
# when retrieving word vectors, this will decide if the casing
# of the word is relevant. E.g. `hello` and `Hello` will
# retrieve the same vector, if set to `False`. For some
# applications and models it makes sense to differentiate
# between these two words, therefore setting this to `True`.
case_sensitive: false
- name: WhitespaceTokenizer
- name: RegexFeaturizer
- name: LexicalSyntacticFeaturizer
- name: CountVectorsFeaturizer
- name: CountVectorsFeaturizer
analyzer: "char_wb"
min_ngram: 1
max_ngram: 4
- name: DIETClassifier
epochs: 100
constrain_similarities: true
- name: EntitySynonymMapper
- name: ResponseSelector
epochs: 100
constrain_similarities: true
- name: FallbackClassifier
threshold: 0.3
ambiguity_threshold: 0.1
policies:
Thanks