I have some code that summarizes a DataFrame containing the well-known Titanic dataset, as follows:
titanic['agecat'] = pd.cut(titanic.age, [0, 13, 20, 64, 100],
                           labels=['child', 'adolescent', 'adult', 'senior'])
titanic.groupby(['agecat', 'pclass', 'sex'])['survived'].mean()
Because of the groupby call, this produces the following Series with a MultiIndex:
agecat      pclass  sex
adolescent  1       female    1.000000
                    male      0.200000
            2       female    0.923077
                    male      0.117647
            3       female    0.542857
                    male      0.125000
adult       1       female    0.965517
                    male      0.343284
            2       female    0.868421
                    male      0.078125
            3       female    0.441860
                    male      0.159184
child       1       female    0.000000
                    male      1.000000
            2       female    1.000000
                    male      1.000000
            3       female    0.483871
                    male      0.324324
senior      1       female    1.000000
                    male      0.142857
            2       male      0.000000
            3       male      0.000000
Name: survived, dtype: float64
However, I would like the agecat level of the MultiIndex to be in its natural order rather than alphabetical order, i.e. ['child', 'adolescent', 'adult', 'senior']. But if I try to do this with reindex:
titanic.groupby(['agecat', 'pclass', 'sex'])['survived'].mean().reindex(
    ['child', 'adolescent', 'adult', 'senior'], level='agecat')
it has no effect on the MultiIndex of the result. Should this work, or am I using the wrong approach?
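For reference, here is a minimal sketch of one workaround that seems to give the ordering I am after. It moves the index into columns, makes agecat an ordered categorical, sorts, and rebuilds the index. The `toy` frame is made-up data standing in for `titanic`, and `order`, `means`, `flat`, and `reordered` are names chosen only for this illustration. It sidesteps reindex entirely, so it says nothing about whether `reindex(..., level=...)` ought to work.

```python
import pandas as pd

# Hypothetical stand-in for the real titanic DataFrame (made-up rows).
toy = pd.DataFrame({
    "age": [5, 16, 30, 70, 8, 17, 45, 66],
    "pclass": [1, 1, 2, 3, 3, 2, 1, 1],
    "sex": ["female", "male", "female", "male",
            "male", "female", "male", "female"],
    "survived": [1, 0, 1, 0, 1, 1, 0, 1],
})

order = ["child", "adolescent", "adult", "senior"]
toy["agecat"] = pd.cut(toy.age, [0, 13, 20, 64, 100], labels=order)

# Assumption: cast to plain strings so the grouped index sorts
# alphabetically, mimicking the output shown in the question.
toy["agecat"] = toy["agecat"].astype(str)
means = toy.groupby(["agecat", "pclass", "sex"])["survived"].mean()

# Move the index levels into columns, make agecat an ordered categorical,
# sort by it (categoricals sort in category order), then rebuild the index.
flat = means.reset_index()
flat["agecat"] = pd.Categorical(flat["agecat"], categories=order, ordered=True)
reordered = (
    flat.sort_values(["agecat", "pclass", "sex"])
        .set_index(["agecat", "pclass", "sex"])["survived"]
)
print(reordered)
```

The same steps should apply to the real result by replacing `means` with the Series from the groupby above.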