编辑以显示原始数据帧的示例:
df.head(4)
shop category subcategory season
date
2013-09-04 abc weddings shoes winter
2013-09-04 def jewelry watches summer
2013-09-05 ghi sports sneakers spring
2013-09-05 jkl jewelry necklaces fall
我已经使用 get_dummies() 成功生成了以下数据框:
wedding_seasons = pd.get_dummies(df.loc[df['category']=='weddings',['category','season']],prefix = '', prefix_sep = '' )
wedding_seasons.head(3)
weddings winter summer spring fall
71654 1.0 0.0 1.0 0.0 0.0
72168 1.0 0.0 1.0 0.0 0.0
72080 1.0 0.0 1.0 0.0 0.0
上面的目标是帮助评估跨季节的婚礼频率,所以我曾经corr()生成以下结果:
weddings fall spring summer winter
weddings NaN NaN NaN NaN NaN
fall NaN 1.000000 0.054019 -0.331866 -0.012122
spring NaN 0.054019 1.000000 -0.857205 0.072420
summer NaN -0.331866 -0.857205 1.000000 -0.484578
winter NaN -0.012122 0.072420 -0.484578 1.000000
我不确定为什么婚礼栏会生成 NaN 值,但我的直觉是它源于我最初创建的方式wedding_seasons。任何指导将不胜感激,以便我可以正确评估列相关性。