ibm-watson - Retraining approach for NLC or R&R

Question

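We know that the ground truth is used to retrain NLC or R&R.

The ground truth is question-level training data.

For example:

"How hot is it today?,temperature"

The question "How hot is it today?" is therefore classified into the "temperature" class.

Once the application goes live, it will receive real user questions. Some are identical (that is, a question from a real user is the same as a question in the ground truth), some use similar terms, and some are new questions. Assume the application has a feedback loop that tells us whether the class (for NLC) or the answer (for R&R) was relevant.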

For the new questions, the approach seems to be to simply add them to the ground truth, which is then used to retrain NLC/R&R. Is that right?
For the questions with similar terms, do we add them like the new questions, or do we ignore them, given that questions with similar terms can score well even when those terms were not used to train the classifier?
For the identical questions, there seems to be nothing to do to the ground truth for NLC. For R&R, however, do we just increase or decrease the relevance label in the ground truth by 1?

In short, the main question here is: what is the retraining approach for NLC and R&R?

score 4 · Accepted Answer

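Once your application has gone live, you should periodically review your feedback logs for opportunities to improve. For NLC, if texts are being misclassified, you can add those texts to the training set and retrain to improve your classifier.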
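A minimal sketch of that step, assuming the feedback log can be exported as records with the hypothetical fields `text`, `predicted`, and `correct` (these names are illustrative, not part of any Watson API), and that the training set is a two-column CSV of text and class as in the question:

```python
import csv
import io

def add_misclassified(training_rows, feedback_log):
    """Return the training rows extended with misclassified texts from the log.

    training_rows: list of (text, class_name) tuples.
    feedback_log: list of dicts with the hypothetical keys
                  'text', 'predicted' and 'correct'.
    """
    seen = {text.strip().lower() for text, _ in training_rows}
    updated = list(training_rows)
    for entry in feedback_log:
        key = entry["text"].strip().lower()
        # Keep only texts the classifier got wrong that are not already present.
        if entry["predicted"] != entry["correct"] and key not in seen:
            updated.append((entry["text"].strip(), entry["correct"]))
            seen.add(key)
    return updated

def to_csv(rows):
    """Serialize rows as text,class CSV, ready to upload for retraining."""
    buffer = io.StringIO()
    csv.writer(buffer, lineterminator="\n").writerows(rows)
    return buffer.getvalue()

training = [("How hot is it today?", "temperature")]
log = [
    {"text": "Is it boiling outside?", "predicted": "conditions", "correct": "temperature"},
    {"text": "how hot is it today?", "predicted": "conditions", "correct": "temperature"},
    {"text": "Will it rain?", "predicted": "conditions", "correct": "conditions"},
]
updated_training = add_misclassified(training, log)
print(to_csv(updated_training))
```

Only the first log entry is added here: the second duplicates an existing training text (ignoring case), and the third was classified correctly.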

You do not need to capture every conceivable variation of a class, as long as your classifier is returning acceptable responses.

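You can use the additional class examples from your logs to assemble a test set of texts that do not appear in your training set. Running this test set whenever you make changes lets you determine whether a change has inadvertently caused a regression. You can run the test either by calling the classifier with a REST client or through the Beta Natural Language Classifier toolkit.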
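One way to sketch such a regression check, assuming `classify` is any function that maps a text to a class name (for example a thin wrapper around your REST call; the wrapper is not shown, and `stub_classify` below is purely an illustrative stand-in):

```python
def build_test_set(log_examples, training_rows):
    """Keep only logged examples whose text is absent from the training set."""
    trained = {text.strip().lower() for text, _ in training_rows}
    return [(t, c) for t, c in log_examples if t.strip().lower() not in trained]

def run_regression(classify, test_set):
    """Return (accuracy, failures) for a classifier over labeled test texts."""
    failures = []
    for text, expected in test_set:
        predicted = classify(text)
        if predicted != expected:
            failures.append((text, expected, predicted))
    accuracy = 1.0 - len(failures) / len(test_set) if test_set else 0.0
    return accuracy, failures

def stub_classify(text):
    """Illustrative stand-in for a call to the real classifier."""
    lowered = text.lower()
    return "temperature" if "hot" in lowered or "cold" in lowered else "conditions"

training = [("How hot is it today?", "temperature")]
log_examples = [
    ("How hot is it today?", "temperature"),  # already in training, excluded
    ("Is it cold out?", "temperature"),
    ("Will it rain?", "conditions"),
    ("Is it warm?", "temperature"),
]
test_set = build_test_set(log_examples, training)
accuracy, failures = run_regression(stub_classify, test_set)
print(accuracy, failures)
```

Recording the accuracy and the list of failures before and after each retraining makes it easy to see whether a text that used to be classified correctly has started to fail.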

score 0 · Accepted Answer

A solid retraining approach should be driven by feedback from live users. Your testing and validation of any retrained NLC (or R&R, for that matter) should be guided by some of the principles that James Ravenscroft has outlined here (https://brainsteam.co.uk/2016/03/29/cognitive-quality-assurance-an-introduction/).

The answer by @davidgeorgeuk is correct, but it does not extend the thought to the conclusion that you are looking for. I would have a monthly set of activities in which I go through the application logs where REAL users are indicating that you are not classifying things correctly, and also incorporate any new classes into your classifier. I would retrain a second instance of NLC with the new data, and go through the test scenarios outlined above.

Once you are satisfied that you have IMPROVED your model, I would then switch my code to point at the new NLC instance, and the old NLC instance would be your "backup" instance, and the one that you would use for this exercise the next month. It's just applying a simple DevOps approach to managing your NLC instances. You could extend this to a development, QA, production scenario if you wanted.
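The swap described above can be sketched as a simple promotion gate. This assumes each NLC instance is represented by a hypothetical identifier plus a `classify` function, and that "improved" means strictly higher accuracy on a held-out test set; both are illustrative choices, not something the service prescribes:

```python
def accuracy(classify, test_set):
    """Fraction of (text, expected_class) pairs the classifier gets right."""
    if not test_set:
        raise ValueError("test_set must not be empty")
    hits = sum(1 for text, expected in test_set if classify(text) == expected)
    return hits / len(test_set)

def choose_instances(live, candidate, test_set):
    """Return (active_id, backup_id) after comparing two instances.

    live and candidate are (instance_id, classify_fn) pairs; the ids are
    hypothetical labels, not real Watson classifier ids.
    """
    live_id, live_fn = live
    cand_id, cand_fn = candidate
    if accuracy(cand_fn, test_set) > accuracy(live_fn, test_set):
        return cand_id, live_id  # promote the candidate, keep the old one as backup
    return live_id, cand_id      # no improvement: keep serving the old instance

# Illustrative stubs standing in for calls to two NLC instances.
def old_model(text):
    return "temperature" if "hot" in text.lower() else "conditions"

def new_model(text):
    lowered = text.lower()
    return "temperature" if "hot" in lowered or "warm" in lowered else "conditions"

test_set = [
    ("Is it hot out?", "temperature"),
    ("Is it warm today?", "temperature"),
    ("Will it rain?", "conditions"),
]
active, backup = choose_instances(("nlc-a", old_model), ("nlc-b", new_model), test_set)
print(active, backup)
```

Whichever instance ends up as the backup is the one you would retrain in the next cycle.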
