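I am using AWS Comprehend to train an NLP model. Prediction on the test set ran successfully, but the output file has more lines than the input:

Input: 1000 lines

Output: 2082 lines

The output looks like this: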

predictions.json <...>
{"File": "test.csv", "Line": "0", "Classes": [{"Name": "No", "Score": 0.7022}, {"Name": "Yes", "Score": 0.2892}, {"Name": "tag", "Score": 0.0086}]}
{"File": "test.csv", "Line": "1", "Classes": [{"Name": "No", "Score": 0.6252}, {"Name": "Yes", "Score": 0.3747}, {"Name": "tag", "Score": 0.0001}]}
{"File": "test.csv", "Line": "2", "Classes": [{"Name": "No", "Score": 0.9295}, {"Name": "Yes", "Score": 0.0705}, {"Name": "tag", "Score": 0.0}]}
{"File": "test.csv", "Line": "3", "Classes": [{"Name": "No", "Score": 0.5247}, {"Name": "Yes", "Score": 0.4753}, {"Name": "tag", "Score": 0.0}]}
...
{"File": "test.csv", "Line": "2080", "Classes": [{"Name": "No", "Score": 0.8528}, {"Name": "Yes", "Score": 0.1471}, {"Name": "tag", "Score": 0.0001}]}
{"File": "test.csv", "Line": "2081", "Classes": [{"Name": "No", "Score": 0.5318}, {"Name": "Yes", "Score": 0.4682}, {"Name": "tag", "Score": 0.0}]}

Can anyone help me understand how to use this output?

3 Answers

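I ran into the same problem. In my case, the error occurred because the prediction file (test.csv in your case) was not in the required encoding. AWS Comprehend requires UTF-8 encoding.
AWS documentation link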
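A minimal sketch of how one might check the file and re-encode it before uploading. The helper names and the `latin-1` source encoding are assumptions for illustration; use whatever encoding your file was actually saved in.

```python
from pathlib import Path


def is_valid_utf8(data: bytes) -> bool:
    """Return True if the raw bytes decode cleanly as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


def to_utf8(data: bytes, source_encoding: str) -> bytes:
    """Decode bytes from a known source encoding and re-encode them as UTF-8."""
    return data.decode(source_encoding).encode("utf-8")


def convert_file(src: str, dst: str, source_encoding: str) -> None:
    """Rewrite src as UTF-8 into dst, only if src is not valid UTF-8 already."""
    raw = Path(src).read_bytes()
    if not is_valid_utf8(raw):
        raw = to_utf8(raw, source_encoding)
    Path(dst).write_bytes(raw)


# Small in-memory demonstration (no files needed).
sample = "café".encode("latin-1")      # not valid UTF-8
print(is_valid_utf8(sample))           # False
print(is_valid_utf8(to_utf8(sample, "latin-1")))  # True
```

For a real file, something like `convert_file("test.csv", "test_utf8.csv", "latin-1")` would apply, with the source encoding adjusted to match your data.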

Answered 2020-01-30T14:08:21.823

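One option is to split each sentence into its own file and use the whole folder as the test set, with this setting in the input configuration: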

 "InputFormat": "ONE_DOC_PER_FILE"

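A rough sketch of the splitting step, assuming the documents are already available as a list of strings. The function name `write_one_doc_per_file` and the `doc_00000.txt` naming scheme are made up for illustration; the resulting folder would then be uploaded and used as the job input.

```python
import tempfile
from pathlib import Path


def write_one_doc_per_file(documents, out_dir):
    """Write each document to its own UTF-8 file and return the paths."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    paths = []
    for i, doc in enumerate(documents):
        path = out / f"doc_{i:05d}.txt"
        # write_bytes avoids any platform-specific newline translation
        path.write_bytes(doc.encode("utf-8"))
        paths.append(path)
    return paths


# Demonstration in a temporary directory: the embedded newline in the
# second document no longer matters, because each file is one document.
with tempfile.TemporaryDirectory() as tmp:
    written = write_one_doc_per_file(["first document", "second\ndocument"], tmp)
    print(len(written))  # 2
```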
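The other option is to check how many "\n" characters your dataset contains. Newlines embedded inside a document would be counted as extra lines, so that may be the cause of the error.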
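One way to check this is to count the line terminators in the raw file and compare the total with the number of documents you expect. This is only a sketch: `count_line_breaks` is a hypothetical helper, and exactly which characters Comprehend treats as line separators is not confirmed here.

```python
def count_line_breaks(data: bytes) -> dict:
    """Count CRLF pairs, lone LF, and lone CR bytes in raw file content."""
    crlf = data.count(b"\r\n")
    lf = data.count(b"\n") - crlf   # LF bytes not part of a CRLF pair
    cr = data.count(b"\r") - crlf   # CR bytes not part of a CRLF pair
    return {"crlf": crlf, "lf": lf, "cr": cr}


# In-memory demonstration; for a real file use open("test.csv", "rb").read()
sample = b"a\r\nb\nc\rd\n"
print(count_line_breaks(sample))  # {'crlf': 1, 'lf': 2, 'cr': 1}
```

If the total is well above the expected document count (1000 in the question), some documents probably contain embedded line breaks.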

Answered 2019-09-02T06:39:55.000

In my case, apart from the UTF-8 issue, the text also contained carriage return characters (\r).
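A small sketch of how the carriage returns could be cleaned up before running the job. The function name is hypothetical, and replacing a lone \r with a space (rather than deleting it) is just one reasonable choice, so that adjacent words do not run together.

```python
def strip_carriage_returns(text: str) -> str:
    """Turn Windows line endings into \\n and replace stray \\r with a space."""
    # Handle CRLF first so the \r in a pair is not turned into a space
    text = text.replace("\r\n", "\n")
    return text.replace("\r", " ")


print(repr(strip_carriage_returns("a\r\nb\rc")))  # 'a\nb c'
```

When reading and writing the file in Python, passing `newline=""` to `open` keeps Python from translating line endings on its own, so the cleanup above is the only change made.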

Answered 2021-11-11T17:15:38.023