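We know that the ground truth is used to re-train NLC or R&R.
The ground truth is question-level training data.
For example:
"How hot is it today?,temperature"
The question "How hot is it today?" is therefore classified into the "temperature" class.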
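To make the format concrete, here is a minimal sketch of reading such rows, assuming the ground truth is stored as plain CSV with the question text in the first column and the class label in the second. The names `GROUND_TRUTH_CSV` and `load_ground_truth` are hypothetical and only for illustration.

```python
import csv
import io

# Hypothetical ground-truth content: one "question text,class label" row per line.
GROUND_TRUTH_CSV = "How hot is it today?,temperature\n"


def load_ground_truth(text):
    """Parse CSV rows into (question, class_label) pairs, skipping short rows."""
    reader = csv.reader(io.StringIO(text))
    return [(row[0], row[1]) for row in reader if len(row) >= 2]


pairs = load_ground_truth(GROUND_TRUTH_CSV)
print(pairs)
```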
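Once the application goes live, it will receive real user questions. Some are the same (i.e. the real user's question is identical to a question in the ground truth), some use similar terms, and some are new questions. Assume the application has a feedback loop that tells us whether the class (for NLC) or the answer (for R&R) was relevant.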
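As a rough illustration of the three groups, the sketch below sorts an incoming question into "same", "similar", or "new" by comparing it with the ground-truth questions. It assumes plain string similarity from Python's `difflib` and an arbitrary threshold of 0.8. Both are my own assumptions for illustration and are not how NLC or R&R score questions. The names `bucket_question` and `SIMILARITY_THRESHOLD` are hypothetical.

```python
from difflib import SequenceMatcher

# Assumed cutoff for "similar terms"; chosen arbitrarily for this sketch.
SIMILARITY_THRESHOLD = 0.8


def normalize(question):
    """Lowercase and trim so trivial differences do not count as new questions."""
    return question.strip().lower()


def bucket_question(question, known_questions):
    """Return 'same', 'similar', or 'new' relative to the ground-truth questions."""
    q = normalize(question)
    known = [normalize(k) for k in known_questions]
    if q in known:
        return "same"
    best = max((SequenceMatcher(None, q, k).ratio() for k in known), default=0.0)
    return "similar" if best >= SIMILARITY_THRESHOLD else "new"


known = ["How hot is it today?"]
for incoming in ["How hot is it today?", "How hot is it today", "Will it rain tomorrow?"]:
    print(incoming, "->", bucket_question(incoming, known))
```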
For the new questions, the approach seems to be simply adding them to the ground truth, which is then used to re-train NLC/R&R. Is that right?
For the questions with similar terms, do we add them just like the new questions, or do we ignore them, given that questions with similar terms can score well even when those terms were not used to train the classifier?
For the identical questions, there seems to be nothing to change in the ground truth for NLC. For R&R, however, do we just increase or decrease the relevance label in the ground truth by 1?
In short, the main question here is: what is the approach for re-training NLC and R&R?