
I have a Pandas DataFrame like this:

import pandas as pd
sentences = ['First sentence. Second sentence', 'Third sentence. Fourth sentence']
df = pd.DataFrame(sentences, columns=['text_column'])

text_column
'First sentence. Second sentence'
'Third sentence. Fourth sentence'

Next, I write it to a file in JSONL (JSON Lines) format, for doccano, like this:

df.to_json(os.path.join(path, 'test.jsonl'), orient='records', lines=True, force_ascii=False)

JSONL output of df:

{'text': 'First sentence. Second sentence'},
{'text': 'Third sentence. Fourth sentence'}

I want to add a line break between each sentence in the string. I tried something like this:

{'text': 'First sentence.' + "\\n \\n " +  'Second sentence'},
{'text': 'Third sentence.' + "\\n \\n " +  'Fourth sentence'}

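But it doesn't work. Maybe I can format it with Pandas. The goal is to put each sentence in the string on its own line, because each record, such as {'text': 'First sentence. Second sentence'}, is displayed on one page in Doccano.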

Expected output:

First sentence.
Second sentence.

and:

Third sentence.
Fourth sentence.

1 Answer


Try this:

sen = [{'text': 'First sentence. Second sentence'},{'text': 'Third sentence. Fourth sentence'}]

new_sen = []
for s in sen:
    dct = {}
    for k, v in s.items():
        # Split once on the period, then rejoin the first two parts with line breaks
        parts = v.split('.')
        dct[k] = parts[0] + "\n \n" + parts[1]
    new_sen.append(dct)

print(new_sen)

Output:

[{'text': 'First sentence\n \n Second sentence'}, {'text': 'Third sentence\n \n Fourth sentence'}]

To get the expected output, try this:

print(new_sen[0]['text'])
# First sentence
# Second sentence
print(new_sen[1]['text'])
# Third sentence
# Fourth sentence
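
Since the question asks about doing this in Pandas, here is a minimal sketch of the same idea applied to the DataFrame before export. It assumes the column is named 'text' (as in the JSONL output shown in the question) and that sentences are separated by a period followed by whitespace; the regular expression is an illustrative choice, not something from the question. Note that in the JSONL text the line break is stored as the escape sequence \n, and it only becomes a real line break once the line is parsed as JSON. Whether Doccano then renders it on separate lines is not verified here.

```python
import json

import pandas as pd

sentences = ['First sentence. Second sentence', 'Third sentence. Fourth sentence']
# Assumption: the column is called 'text', matching the JSONL shown in the question
df = pd.DataFrame(sentences, columns=['text'])

# Replace the whitespace after each period with a newline (the period is kept)
df['text'] = df['text'].str.replace(r'\.\s+', '.\n', regex=True)

# With no path given, to_json returns the JSONL content as a string
jsonl = df.to_json(orient='records', lines=True, force_ascii=False)

# Parse the lines back to check that the escaped \n decodes to a real line break
records = [json.loads(line) for line in jsonl.splitlines() if line]
print(records[0]['text'])
```

To write the file instead, pass the path as the first argument to to_json, as in the question.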
answered 2021-08-28T10:31:13.203