So I have a column from a jsonl file that holds multiple nested dictionaries, like this:
`df['referenced_tweets'][0]`
produces (output shortened):
```python
'id': '1392893055112400898',
'public_metrics': {'retweet_count': 0,
                   'reply_count': 1,
                   'like_count': 2,
                   'quote_count': 0},
'conversation_id': '1392893055112400898',
'created_at': '2021-05-13T17:22:37.000Z',
'reply_settings': 'everyone',
'entities': {'annotations': [{'start': 65,
                              'end': 77,
                              'probability': 0.9719000000000001,
                              'type': 'Person',
                              'normalized_text': 'Jill McMillan'}],
             'mentions': [{'start': 23,
                           'end': 36,
                           'username': 'usasklibrary',
                           'protected': False,
                           'description': 'The official account of the University Library at USask.',
                           'created_at': '2019-06-04T17:19:12.000Z',
                           'entities': {'url': {'urls': [{'start': 0,
                                                          'end': 23,
                                                          'url': '*removed*',
                                                          'expanded_url': 'http://library.usask.ca',
                                                          'display_url': 'library.usask.ca'}]}},
                           'name': 'University Library',
                           'url': '....',
                           'profile_image_url': 'https://pbs.twimg.com/profile_images/1278828446026629120/G1w7t-HK_normal.jpg',
                           'verified': False,
                           'id': '1135959197902921728',
                           'public_metrics': {'followers_count': 365,
                                              'following_count': 119,
                                              'tweet_count': 556,
                                              'listed_count': 9}}]},
'text': 'Wonderful session with @usasklibrary Graduate Writing Specialist Jill McMillan who is walking SURE students through the process of organizing/analyzing a literature review! So grateful to the library -- our largest SURE: Student Undergraduate Research Experience partner!',
...
```
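Each cell appears to hold plain Python lists and dicts, so nested values can be reached with ordinary indexing, without building an intermediate DataFrame. The following is a minimal sketch on a trimmed, hypothetical copy of the cell above. It assumes the cell is a list with one dict per referenced tweet, which the `for i in x` loop in the function below suggests. Note that the only `'type'` visible in the shortened output sits inside `entities -> annotations`:

```python
# Trimmed, hypothetical copy of one df['referenced_tweets'] cell:
# assumed to be a list holding one dict per referenced tweet.
cell = [
    {
        "id": "1392893055112400898",
        "public_metrics": {"retweet_count": 0, "reply_count": 1,
                           "like_count": 2, "quote_count": 0},
        "entities": {
            "annotations": [
                {"start": 65, "end": 77, "probability": 0.9719000000000001,
                 "type": "Person", "normalized_text": "Jill McMillan"}
            ]
        },
        "text": "Wonderful session with @usasklibrary Graduate Writing Specialist Jill McMillan ...",
    }
]

tweet = cell[0]                                # first referenced tweet (a dict)
text = tweet["text"]                           # top-level key
likes = tweet["public_metrics"]["like_count"]  # one level down
# This 'type' is nested inside a list of annotation dicts
annotation_types = [a["type"] for a in tweet["entities"]["annotations"]]

print(text)
print(likes)             # 2
print(annotation_types)  # ['Person']
```

If the intended `type` is instead a top-level key of each referenced-tweet dict (Twitter API v2 `referenced_tweets` objects carry `type` and `id`), it would be read with `tweet.get("type")`.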
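My goal is to write a function that automatically extracts specific fields (e.g. text, type) for the whole DataFrame, not just one row. So I wrote this function: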
```python
import pandas as pd

# x = df['referenced_tweets']
def extract_TextType(x):
    dic = {}
    for i in x:
        if i != " ":
            new_df = pd.DataFrame.from_dict(i)
            dic['refd_text'] = new_df['text']
            dic['refd_type'] = new_df['type']
        else:
            print('none')
    return dic
```
But running the function:
```python
df['referenced_tweets'].apply(extract_TextType)
```
produces this error:
```
ValueError: Mixing dicts with non-Series may lead to ambiguous ordering.
```
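The message comes from the pandas DataFrame constructor, which `pd.DataFrame.from_dict(i)` invokes with the referenced-tweet dict as column data. Nested dicts are aligned by their keys, while list values are positional, and pandas refuses to guess how to combine the two. In the shortened output every top-level value is a string or a dict, so presumably one of the elided (`...`) keys holds a list. The following minimal reproduction assumes this and uses `context_annotations` as a hypothetical stand-in for that key:

```python
import pandas as pd

# Hypothetical referenced-tweet dict: a scalar, a nested dict and a list.
# 'context_annotations' is an assumed stand-in for a list-valued key
# hidden in the elided part of the real output.
tweet = {
    "id": "1392893055112400898",
    "public_metrics": {"like_count": 2, "reply_count": 1},
    "context_annotations": [{"domain": "x"}, {"domain": "y"}],
}

try:
    # Same call as in extract_TextType: the dict's keys become columns
    pd.DataFrame.from_dict(tweet)
    message = None
except ValueError as err:
    message = str(err)

print(message)
```

Because each `i` is already a dict, `i['text']` or `i.get('type')` reads the value directly, and the DataFrame round-trip is unnecessary.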
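The point is to extract these two nested fields (text and type) from the original `referenced_tweets` column and match them to their original rows.
What am I doing wrong?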
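The following is a minimal sketch of that goal. It rests on three assumptions: each cell is either a list of referenced-tweet dicts or a missing value, `type` is a top-level key of those dicts (as in Twitter API v2 `referenced_tweets` objects), and `extract_text_type` and the sample frame are hypothetical. For each row, the sketch reads the first referenced tweet with plain dict access, builds one output row per input row, and joins on the shared index so the new columns line up with the original rows:

```python
import pandas as pd


def extract_text_type(cell):
    """Hypothetical helper: text and type of the first referenced tweet in a cell.

    Assumes a cell is a list of dicts, or a missing value (NaN/None)
    when the tweet references nothing.
    """
    if not isinstance(cell, list) or not cell:
        return {"refd_text": None, "refd_type": None}
    first = cell[0]
    return {"refd_text": first.get("text"), "refd_type": first.get("type")}


# Small hypothetical frame shaped like the real one (values are made up)
df = pd.DataFrame({
    "id": ["t1", "t2"],
    "referenced_tweets": [
        [{"id": "1392893055112400898", "type": "quoted", "text": "Wonderful session ..."}],
        None,  # tweet without references
    ],
})

# One dict per row, indexed like df so join() matches rows by index
extracted = pd.DataFrame(
    [extract_text_type(cell) for cell in df["referenced_tweets"]],
    index=df.index,
)
df = df.join(extracted)
print(df[["id", "refd_text", "refd_type"]])
```

This keeps only the first reference per tweet. The original loop reassigned `dic` on every pass, so it would have kept the last one.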