However, I'm having trouble getting the data into the right format.
For reference, the data must be in the following format:
[
{"name":"something 1","size":number 1,"imports":["thing 1","thing 2","thing 3","thing 4","thing 5","thing 6"]},
{"name":"something 2","size":number 2,"imports":["thing 1","thing 2","thing 3","thing 4","thing 5"]}
]
Right now, my data is in the following format (collected using nltk):
[('would', 'MD'), ('said', 'VBD'), ('like', 'IN'), ('man', 'NN')]
Using defaultdict, I was able to convert the data with these lines:
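from collections import defaultdict

pos = [('would', 'MD'), ('said', 'VBD'), ('like', 'IN'), ('man', 'NN')]
d = defaultdict(list)
# Group the words by their POS tag
for a, b in pos:
    d[b].append(a)
# Build one single-key dict per tag
d = [{b: d[b]} for b in d]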
into:
[
{'MD': ['would']},
{'NN': ['man']},
{'IN': ['like']},
{'VBD': ['said']}
]
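I'm not sure how to proceed from here or how to get this into the right format. Any help would be greatly appreciated. Thanks!
Edit: I should have been clearer; my expected output looks like this: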
[
{'name': 'man', 'POS': ['NN']}
]
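For illustration, here is a minimal sketch of how output in that shape might be built from the tagged pairs. The helper name `group_tags_by_word` is hypothetical, and the sketch assumes each word should appear once, together with every tag it was seen with; it does not attempt the "size" or "imports" fields of the final format.

```python
from collections import defaultdict

# Tagged pairs in the (word, tag) shape produced by the tagger
pos = [('would', 'MD'), ('said', 'VBD'), ('like', 'IN'), ('man', 'NN')]


def group_tags_by_word(pairs):
    """Hypothetical helper: collect every POS tag seen for each word."""
    tags = defaultdict(list)
    for word, tag in pairs:
        # Assumption: a repeated (word, tag) pair should be listed only once
        if tag not in tags[word]:
            tags[word].append(tag)
    # Dicts keep insertion order on Python 3.7+, so words stay in first-seen order
    return [{'name': word, 'POS': tag_list} for word, tag_list in tags.items()]


result = group_tags_by_word(pos)
```

Grouping by word instead of by tag is the only real change from the defaultdict approach: the key and the appended value swap places, and each entry gets fixed 'name' and 'POS' keys.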