- As mentioned in the accepted answer, flatten_json can be a good option, depending on the structure of the JSON and how that structure should be flattened.
- In this case, the OP wants all the values for 1 event on a single row, so flatten_json works.
- If the desired result is a separate row for each position in positions, then pandas.json_normalize is the better option.
- One issue with flatten_json is that, if there are many positions, the number of columns for each event in events can become very large.
- See How to flatten a nested JSON recursively, with flatten_json? if using flatten_json.
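To illustrate the column growth mentioned above, here is a minimal pure-Python sketch of recursive flattening. The flatten helper is hypothetical and written only for this illustration; it is not the flatten_json package, although it produces similarly shaped keys. The event is a trimmed copy of the sample data used in this answer.

```python
def flatten(obj, parent_key="", sep="_"):
    """Recursively flatten nested dicts and lists into a single-level dict.

    Hypothetical helper for illustration; not the flatten_json package.
    """
    items = {}
    if isinstance(obj, dict):
        for key, value in obj.items():
            new_key = f"{parent_key}{sep}{key}" if parent_key else str(key)
            items.update(flatten(value, new_key, sep))
    elif isinstance(obj, list):
        # list elements are keyed by their index, so every extra
        # element adds a new set of columns
        for i, value in enumerate(obj):
            new_key = f"{parent_key}{sep}{i}" if parent_key else str(i)
            items.update(flatten(value, new_key, sep))
    else:
        items[parent_key] = obj
    return items


# trimmed copy of the event from this answer
event = {
    "id": 142896214,
    "positions": [{"x": 51, "y": 49}, {"x": 40, "y": 53}],
    "tags": [{"id": 1801, "tag": {"label": "accurate"}}],
}

flat_event = flatten(event)
print(list(flat_event))
```

Each additional entry in positions adds another pair of positions_N_x / positions_N_y keys, which is why the one-row-per-event layout can become very wide.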
Create 1 row for each dict in events
data = {'events': [{'id': 142896214,
'playerId': 37831,
'teamId': 3157,
'matchId': 2214569,
'matchPeriod': '1H',
'eventSec': 0.8935539999999946,
'eventId': 8,
'eventName': 'Pass',
'subEventId': 85,
'subEventName': 'Simple pass',
'positions': [{'x': 51, 'y': 49}, {'x': 40, 'y': 53}],
'tags': [{'id': 1801, 'tag': {'label': 'accurate'}}]}]}
Create the DataFrame
import pandas as pd

df = pd.DataFrame.from_dict(data)
df = df['events'].apply(pd.Series)
Flatten positions with pd.Series
df_p = df['positions'].apply(pd.Series)
df_p_0 = df_p[0].apply(pd.Series)
df_p_1 = df_p[1].apply(pd.Series)
Rename positions[0] & positions[1]:
df_p_0.columns = ['pos_0_x', 'pos_0_y']
df_p_1.columns = ['pos_1_x', 'pos_1_y']
Flatten tags with pd.Series:
df_t = df.tags.apply(pd.Series)
df_t = df_t[0].apply(pd.Series)
df_t_t = df_t.tag.apply(pd.Series)
Rename id & label:
df_t = df_t.rename(columns={'id': 'tags_id'})
df_t_t.columns = ['tags_tag_label']
Combine them all with pd.concat:
df_new = pd.concat([df, df_p_0, df_p_1, df_t.tags_id, df_t_t], axis=1)
Remove the old columns:
df_new = df_new.drop(['positions', 'tags'], axis=1)
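The steps above hard-code two positions per event. As a hedged sketch of the same one-row-per-event idea, the position columns can instead be built with a dict comprehension, so the number of positions does not have to be known in advance. The names pos_rows, df_pos and df_wide are made up for this illustration, and the event is a trimmed copy of the sample data.

```python
import pandas as pd

# trimmed copy of the event from this answer
data = {'events': [{'id': 142896214,
                    'eventName': 'Pass',
                    'positions': [{'x': 51, 'y': 49}, {'x': 40, 'y': 53}],
                    'tags': [{'id': 1801, 'tag': {'label': 'accurate'}}]}]}

df = pd.DataFrame(data['events'])

# build one flat dict per event: pos_0_x, pos_0_y, pos_1_x, ...
pos_rows = [
    {f'pos_{i}_{k}': v for i, p in enumerate(positions) for k, v in p.items()}
    for positions in df['positions']
]
df_pos = pd.DataFrame(pos_rows, index=df.index)

# attach the flattened position columns and drop the nested originals
df_wide = pd.concat([df.drop(columns=['positions', 'tags']), df_pos], axis=1)
print(df_wide)
```

Events with fewer positions than the longest one would get NaN in the missing pos_N columns, which is worth keeping in mind before relying on integer dtypes.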
Create a separate row for each position in positions
# normalize events
df = pd.json_normalize(data, 'events')
# explode all columns with lists of dicts
df = df.apply(lambda x: x.explode()).reset_index(drop=True)
# list of columns with dicts
cols_to_normalize = ['positions', 'tags']
# if the keys, which will become column names, overlap with existing column names,
# add the current column name as a prefix
normalized = list()
for col in cols_to_normalize:
    d = pd.json_normalize(df[col], sep='_')
    d.columns = [f'{col}_{v}' for v in d.columns]
    normalized.append(d.copy())
# combine df with the normalized columns
df = pd.concat([df] + normalized, axis=1).drop(columns=cols_to_normalize)
# display(df)
          id  playerId  teamId  matchId  matchPeriod  eventSec  eventId  eventName  subEventId  subEventName  positions_x  positions_y  tags_id  tags_tag_label
0  142896214     37831    3157  2214569           1H  0.893554        8       Pass          85   Simple pass           51           49     1801        accurate
1  142896214     37831    3157  2214569           1H  0.893554        8       Pass          85   Simple pass           40           53     1801        accurate
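If only the positions need their own rows, a shorter route may be the record_path and meta parameters of pd.json_normalize. This is a sketch under that assumption, not the method used above: it leaves tags out and carries over only the id and eventName fields, chosen here purely for illustration. The event is a trimmed copy of the sample data.

```python
import pandas as pd

# trimmed copy of the event from this answer
data = {'events': [{'id': 142896214,
                    'eventName': 'Pass',
                    'positions': [{'x': 51, 'y': 49}, {'x': 40, 'y': 53}],
                    'tags': [{'id': 1801, 'tag': {'label': 'accurate'}}]}]}

# one row per dict in 'positions'; 'id' and 'eventName' are repeated on each row
df_pos = pd.json_normalize(
    data['events'],
    record_path='positions',
    meta=['id', 'eventName'],
    record_prefix='positions_',
)
print(df_pos)
```

The record columns come first, followed by the meta columns, and the meta columns are typically stored with object dtype, so a cast may be needed afterwards.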