Following the online manual (http://opennlp.apache.org/documentation/1.5.2-incubating/manual/opennlp.html), I am building a 15k-line training data file named en-ner-person.train.
My question is: should the training file contain the full reports, or only the lines that contain a name, such as: <START:person> John Smith <END> ?
So, for example, do I put the entire report in the training data:
<START:person> Pierre Vinken <END> , 61 years old , will join the board as a nonexecutive director Nov. 29 .
A nonexecutive director has many similar responsibilities as an executive director.
However, there are no voting rights with this position.
Mr . <START:person> Vinken <END> is chairman of Elsevier N.V. , the Dutch publishing group .
Or do I include only these two lines in the training file:
<START:person> Pierre Vinken <END> , 61 years old , will join the board as a nonexecutive director Nov. 29 .
Mr . <START:person> Vinken <END> is chairman of Elsevier N.V. , the Dutch publishing group .
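
To make the difference between the two options concrete, here is a minimal Python sketch that splits a report into lines that carry at least one <START:...> ... <END> span and lines that carry none. It is not part of OpenNLP: the helper name split_by_annotation and the regular expression are assumptions made for illustration, and the pattern only covers the simple span syntax shown in the examples above.

```python
import re

# Assumed pattern for one annotated span, e.g. "<START:person> Pierre Vinken <END>".
# It captures the entity type and the name text between the markers.
SPAN = re.compile(r"<START:(\w+)>\s+(.*?)\s+<END>")


def split_by_annotation(lines):
    """Return (annotated, plain): lines with at least one span, and lines with none."""
    annotated, plain = [], []
    for line in lines:
        if SPAN.search(line):
            annotated.append(line)
        else:
            plain.append(line)
    return annotated, plain


# The four-line report from the question.
report = [
    "<START:person> Pierre Vinken <END> , 61 years old , will join the board as a nonexecutive director Nov. 29 .",
    "A nonexecutive director has many similar responsibilities as an executive director.",
    "However, there are no voting rights with this position.",
    "Mr . <START:person> Vinken <END> is chairman of Elsevier N.V. , the Dutch publishing group .",
]

annotated, plain = split_by_annotation(report)
print(len(annotated), "lines with a name,", len(plain), "lines without")
```

The first option corresponds to training on every line of report; the second corresponds to training on annotated alone and discarding plain.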