The JSON file is structured as follows (one JSON object per line):
{"text":"I","meta":{"paper_id":"cadf94cda790ae1bd90c32fbe441bb68a8637d83","title":"title1"}}
{"text":"love","meta":{"paper_id":"cadf94cda790ae1bd90c32fbe441bb68a8637d83","title":"title1"}}
{"text":"Coca-cola.","meta":{"paper_id":"cadf94cda790ae1bd90c32fbe441bb68a8637d83","title":"title1"}}
{"text":"He","meta":{"paper_id":"0f3402fa5b44e121d410ec73dfc21937074e5fa3","title":"title2"}}
{"text":"loves","meta":{"paper_id":"0f3402fa5b44e121d410ec73dfc21937074e5fa3","title":"title2"}}
{"text":"Pepsi.","meta":{"paper_id":"0f3402fa5b44e121d410ec73dfc21937074e5fa3","title":"title2"}}
I want to concatenate the sentences that belong to the same paper (same paper_id), so that I end up with:
{"text":"I love Coca-cola. ","meta":{"paper_id":"cadf94cda790ae1bd90c32fbe441bb68a8637d83","title":"title1"}}
{"text":"He loves Pepsi.","meta":{"paper_id":"0f3402fa5b44e121d410ec73dfc21937074e5fa3","title":"title2"}}
Any ideas on how to solve this? I'm stuck on iterating over those nested dictionaries.
Here is how I load the data into a list and try to iterate over it:
import json

# Parse each line of the file as a separate JSON object
data = [json.loads(line) for line in open('datafile_path', 'r')]
for sentence in data:
    for key, dict_n in sentence.items():
        for key2, value in dict_n.items():
            print(value)
This raises an error: AttributeError: 'str' object has no attribute 'items'
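
One likely cause of the error: the outer loop visits every key of each record, and the first key is "text", whose value is a plain string rather than a dictionary, so calling .items() on it fails. Only "meta" holds a nested dict. Below is a minimal sketch of one way to do the grouping, assuming every line has both a "text" field and a "meta" dict containing "paper_id", and that sentences should be joined with a single space in file order. The function name merge_by_paper is hypothetical, chosen only for illustration.

```python
import json

def merge_by_paper(lines):
    """Join the "text" fields of records that share the same meta["paper_id"]."""
    # paper_id -> {"texts": [...], "meta": {...}}; dicts preserve insertion order,
    # so papers come out in the order they first appear in the input.
    merged = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = json.loads(line)
        paper_id = record["meta"]["paper_id"]
        # Create the entry on first sight, keeping the meta of the first record
        entry = merged.setdefault(paper_id, {"texts": [], "meta": record["meta"]})
        entry["texts"].append(record["text"])
    return [{"text": " ".join(e["texts"]), "meta": e["meta"]} for e in merged.values()]

# Sample lines copied from the question
sample = [
    '{"text":"I","meta":{"paper_id":"cadf94cda790ae1bd90c32fbe441bb68a8637d83","title":"title1"}}',
    '{"text":"love","meta":{"paper_id":"cadf94cda790ae1bd90c32fbe441bb68a8637d83","title":"title1"}}',
    '{"text":"Coca-cola.","meta":{"paper_id":"cadf94cda790ae1bd90c32fbe441bb68a8637d83","title":"title1"}}',
    '{"text":"He","meta":{"paper_id":"0f3402fa5b44e121d410ec73dfc21937074e5fa3","title":"title2"}}',
    '{"text":"loves","meta":{"paper_id":"0f3402fa5b44e121d410ec73dfc21937074e5fa3","title":"title2"}}',
    '{"text":"Pepsi.","meta":{"paper_id":"0f3402fa5b44e121d410ec73dfc21937074e5fa3","title":"title2"}}',
]

result = merge_by_paper(sample)
for item in result:
    # Compact separators reproduce the one-object-per-line layout of the input
    print(json.dumps(item, separators=(",", ":")))
```

To run this on the real file, the open file object can be passed in directly, for example inside `with open('datafile_path', 'r') as f:` call `merge_by_paper(f)`, since iterating over a file yields its lines. Grouping in a dict keyed by paper_id also works if sentences from different papers are interleaved in the file.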