c++ - failure in reading training data: tagger.cpp (393) CRF++

Question

While running CRF++ on my training data (train.txt), I got the following error:

C:\Users\2012\Desktop\CRF_Software_Package\CRF++-0.58>crf_learn template train.data model
CRF++: Yet Another CRF Tool Kit
Copyright (C) 2005-2013 Taku Kudo, All rights reserved.

reading training data: tagger.cpp(393) [feature_index_->buildFeatures(this)]
0.00 s

My training data contains Unicode characters, and the file was saved using Notepad (encoding = Unicode big endian).

I am not sure if the problem is with the template or with the format of the training data. How can I check the format of the training data?

score 3 · Accepted Answer

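I think this is caused by your template file. Check whether you have included the last column, which holds the gold-standard label, as a training feature. Column indexes start at 0. For example, if your BIO file has 6 columns, the template should not contain anything like %x[0,5].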
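One way to check this is to scan the template for macros whose column index points at the label column. The following is only a minimal sketch: the function name `find_label_column_refs` is hypothetical, and it assumes the label is always the last column of the training file and that template comment lines start with `#`.

```python
import re

# Matches CRF++ template macros such as %x[-1,0]; captures row and column.
MACRO = re.compile(r"%x\[(-?\d+),(\d+)\]")

def find_label_column_refs(template_text, num_columns):
    """Return (line_number, macro) pairs whose column index is the last
    column (assumed to hold the gold label) or lies beyond it."""
    label_col = num_columns - 1  # columns are 0-indexed
    hits = []
    for lineno, line in enumerate(template_text.splitlines(), start=1):
        if line.lstrip().startswith("#"):
            continue  # skip template comment lines
        for m in MACRO.finditer(line):
            if int(m.group(2)) >= label_col:
                hits.append((lineno, m.group(0)))
    return hits

# Illustrative template for a 6-column file (columns 0..4 are features,
# column 5 is the label).
template = """# Unigram
U00:%x[0,0]
U01:%x[0,5]
"""
print(find_label_column_refs(template, 6))  # [(3, '%x[0,5]')]
```

An empty result means no macro in the template refers to the label column or to a column that does not exist.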

score 0 · Accepted Answer

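The problem is in the template file. Check whether your features contain incorrect "syntax", e.g. U10:%x[-1,0]/% [0,0]

Note that the "x" is missing after the second %. The corrected line should look like this: U10:%x[-1,0]/%x[0,0]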
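A typo like this is easy to miss by eye, so a small script can help. The sketch below is a heuristic, not CRF++'s own parser: it removes every well-formed `%x[row,col]` macro from each template line and reports lines where a `%` is still left over. The name `find_malformed_macros` is hypothetical.

```python
import re

# A well-formed macro: %x[row,col], where row may be negative.
WELL_FORMED = re.compile(r"%x\[-?\d+,\d+\]")

def find_malformed_macros(template_text):
    """Return (line_number, line) pairs that contain a '%' which is not
    part of a well-formed %x[row,col] macro."""
    bad = []
    for lineno, line in enumerate(template_text.splitlines(), start=1):
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue  # skip blank lines and comments
        # Remove every valid macro; any '%' left over is suspicious.
        if "%" in WELL_FORMED.sub("", stripped):
            bad.append((lineno, stripped))
    return bad

template = """U10:%x[-1,0]/% [0,0]
U11:%x[-1,0]/%x[0,0]
B
"""
print(find_malformed_macros(template))  # [(1, 'U10:%x[-1,0]/% [0,0]')]
```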

score 0 · Accepted Answer

The problem is not the Unicode encoding but the template file.

See this similar question: The failure in using CRF+0.58 train NE Model

score 0 · Accepted Answer

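I had the same problem. The files were UTF-8, and the template and training files were definitely formatted correctly. The cause was that CRFPP expects at most 1024 columns in the input file. It would be nice if it printed an appropriate error message in that case.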
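To check the training data itself, it may help to count the columns on every line. The sketch below assumes the usual CRF++ layout: one token per line, columns separated by whitespace, a blank line between sentences, and the same number of columns on every token line. The 1024 limit is taken from the answer above and has not been verified against the CRF++ source; the name `check_columns` is hypothetical.

```python
def check_columns(lines, max_columns=1024):
    """Return (line_number, message) pairs for token lines that have too
    many columns or a different column count than the first token line."""
    expected = None
    problems = []
    for lineno, line in enumerate(lines, start=1):
        tokens = line.split()
        if not tokens:
            continue  # a blank line separates sentences
        if len(tokens) > max_columns:
            problems.append(
                (lineno, f"{len(tokens)} columns exceeds {max_columns}"))
        if expected is None:
            expected = len(tokens)  # first token line sets the column count
        elif len(tokens) != expected:
            problems.append(
                (lineno, f"expected {expected} columns, found {len(tokens)}"))
    return problems

sample = ["He PRP B-NP", "reckons VBZ", "", "the DT B-NP"]
print(check_columns(sample))  # [(2, 'expected 3 columns, found 2')]
```

For a real file, the lines could be read with something like `open("train.data", encoding="utf-8")`, where the file name and encoding are placeholders for your own.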
