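I have a trained Knowledge Studio model that works, and I have deployed it to the Natural Language Understanding (NLU) service. The entities and relations that NLU returns are not always accurate, so I am trying to let end users correct errors in the extracted information and then use their feedback to improve the model.

A trained model can be exported to a new instance of WKS, and its contents (sentences, words, and the annotated entities and relations) are structured in an easy-to-understand JSON format. I would therefore like to know whether I can follow the same structure to tag new document text and upload it to WKS, so that the user feedback is reflected and, hopefully, the model improves.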

1 Answer

Well, I found the answer by trying it. I downloaded the corpus from Knowledge Studio and analyzed the JSON structure of each file (inside the "./gt" folder).

At the end of each file there is a JSON entry for each previously annotated entity mention, so I used those as examples. Each entry has an id made of two values: one for the sentence number and one for the mention number (both consecutive, starting from zero). The mention number restarts for every sentence. As far as I could tell, sentences are separated by "\n" and also by ". " (note the space after the "."). Each entry also has the character offsets of the beginning and the end of the mention. When counting characters, the system does not take the "\" character into account. Here's an example of what an entry looks like:

{
  "id" : "s3-m0",  // id for the first mention in the fourth sentence
  "properties" : {
    "SIRE_MENTION_TYPE" : "NONE",
    "SIRE_MENTION_CLASS" : "SPC",
    "SIRE_ENTITY_SUBTYPE" : "NONE",
    "SIRE_MENTION_ROLE" : "TEST_ENTITY"  // mention name
  },
  "type" : "TEST_ENTITY",  // mention name again
  "begin" : 11,  // beginning of the mention
  "end" : 19,  // end of the mention
  "inCoref" : false
}
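
As a rough illustration, an entry like the one above could be generated from user feedback with a small Python helper. This is only a sketch under assumptions: `make_mention` and `find_span` are hypothetical names, the fixed property values are simply copied from the example entry, and treating "end" as an exclusive offset is a guess that should be checked against a file exported from your own corpus.

```python
import json


def find_span(text, surface):
    """Return (begin, end) character offsets of the first occurrence of
    `surface` in `text`. ASSUMPTION: `end` is exclusive; verify this
    against an entry exported from Knowledge Studio."""
    begin = text.find(surface)
    if begin == -1:
        raise ValueError(f"{surface!r} not found in text")
    return begin, begin + len(surface)


def make_mention(sentence_idx, mention_idx, entity_type, begin, end):
    """Build one mention entry shaped like the example above.
    Both indices are zero-based; the mention index restarts per sentence."""
    return {
        "id": f"s{sentence_idx}-m{mention_idx}",
        "properties": {
            # Fixed values copied from the exported example entry.
            "SIRE_MENTION_TYPE": "NONE",
            "SIRE_MENTION_CLASS": "SPC",
            "SIRE_ENTITY_SUBTYPE": "NONE",
            "SIRE_MENTION_ROLE": entity_type,
        },
        "type": entity_type,
        "begin": begin,
        "end": end,
        "inCoref": False,
    }


# Reproduce the example entry: first mention in the fourth sentence.
entry = make_mention(3, 0, "TEST_ENTITY", 11, 19)
print(json.dumps(entry, indent=2))
```

Because Python strings loaded with `json.loads` already hold "\n" as a single character, offsets computed this way should not count the "\" separately, which matches the behavior described above.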

If you are tagging a new kind of mention (one not previously included in the type system), you'll have to create it manually in the type system first. After adding these entries to each JSON file, upload the modified corpus to Knowledge Studio and create an annotation set with the uploaded documents. Then create a new task to annotate that set, and you should see that the documents are already annotated with the entries you added manually. After you submit the documents and accept the task, the model is ready to be trained with these new examples. I think the process should be similar for manually annotating relations.

Hope this helps someone else!

Answered 2019-02-23T17:22:56.343