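Doccano requires text in the JSONL file format.

This doesn't work with json.dumps ... at least not directly. The output either lacks double quotes (which are required) or has some odd formatting that Doccano doesn't accept. The expected format looks like this: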

{"text": "EU rejects German call to boycott British lamb.", "label": [ [0, 2, "ORG"], ... ]}
{"text": "Peter Blackburn", "label": [ [0, 15, "PERSON"] ]}
{"text": "President Obama", "label": [ [10, 15, "PERSON"] ]}

Any tips?

1 Answer

This worked for me:

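import json

# df is assumed to be an existing pandas DataFrame with the
# NOTE_TEXT_CONCATINATED and NOTE_ID columns.
notes = zip(
    df.NOTE_TEXT_CONCATINATED, [[]] * df.shape[0], df.NOTE_ID
)  # [[]] field could be pre-filled with whatever labels you need to show up

fname = "/Users/apwork/Downloads/test_json.jsonl"

a = ["text", "label", "NOTE_ID"]  # "NOTE_ID" is for the Metadata field in Doccano.

# Write one JSON object per line; json.dump always emits double-quoted strings.
with open(fname, "w") as jsonfile:
    for row in notes:  # iterate over the zipped rows built above
        json.dump(dict(zip(a, row)), jsonfile)
        jsonfile.write("\n")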
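
If you don't have a DataFrame at hand, the same idea can be tried on plain dictionaries. This is only a minimal sketch: the helper name write_doccano_jsonl is made up for illustration, the records copy the layout from the question, and whether your Doccano version accepts exactly these keys is an assumption you should check against your project type.

```python
import io
import json


def write_doccano_jsonl(records, fileobj):
    """Write each record as one JSON object per line (JSONL)."""
    for record in records:
        # json.dumps emits double-quoted strings; ensure_ascii=False keeps
        # non-ASCII text readable instead of escaping it.
        fileobj.write(json.dumps(record, ensure_ascii=False) + "\n")


# Sample records in the layout shown in the question (assumed format).
records = [
    {"text": "Peter Blackburn", "label": [[0, 15, "PERSON"]]},
    {"text": "President Obama", "label": [[10, 15, "PERSON"]]},
]

# An in-memory buffer stands in for a real file opened with open(path, "w").
buffer = io.StringIO()
write_doccano_jsonl(records, buffer)
print(buffer.getvalue(), end="")
```

Calling str() on a dict gives single-quoted output, which is not valid JSON; going through json.dumps or json.dump for each row and adding the newline yourself avoids that.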
answered 2021-06-10T00:45:30.217