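I am using Pandas' qcut to prepare my data for a machine learning algorithm. I have products with prices, and I use the following code to discretize the prices into buckets of equal size: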
df['PriceBucket'] = pd.qcut(df['sell_prix'].sort_values(), 10, labels=False)
This line gives me more detail about my labels, namely the price interval behind each bucket:
df['PriceBucketTitle'] = pd.qcut(df['sell_prix'].sort_values(), 10)
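To make the two calls above concrete, here is a minimal sketch on made-up prices (the `prices` Series and the choice of 4 buckets are assumptions for illustration, not my real data). It leaves out `sort_values()`, since qcut does not need sorted input and column assignment aligns on the index anyway.

```python
import pandas as pd

# Made-up prices, only for illustration
prices = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0])

# labels=False returns the integer position of each bucket
codes = pd.qcut(prices, 4, labels=False)

# Without labels, qcut returns a categorical whose categories are intervals
intervals = pd.qcut(prices, 4)

print(codes.tolist())
print(intervals.cat.categories)
```

With 8 evenly spread values and 4 quantiles, each bucket ends up holding 2 rows.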
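As shown below, I now have PriceBucket and PriceBucketTitle, and that part works perfectly! Next, I would like to add the number of elements in each bucket. This code returns NaN values (see below):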
df['products_by_number'] = pd.qcut(df['sell_prix'], 10, labels=False).value_counts()
I know it would probably work if I did a groupby on PriceBucket, but I want to keep my data in its current format. Here is the result:
       sell_prix  PriceBucket  PriceBucketTitle  products_by_number
4668         8.0            2        (6.5, 8.5]                 NaN
4669         8.0            2        (6.5, 8.5]                 NaN
4670         8.0            2        (6.5, 8.5]                 NaN
4671         8.0            2        (6.5, 8.5]                 NaN
4672         8.0            2        (6.5, 8.5]                 NaN
4673         8.0            2        (6.5, 8.5]                 NaN
4674         8.0            2        (6.5, 8.5]                 NaN
4675         8.0            2        (6.5, 8.5]                 NaN
4676         8.0            2        (6.5, 8.5]                 NaN
4677         8.0            2        (6.5, 8.5]                 NaN
11902       15.0            5        (12.9, 15]                 NaN
11903       15.0            5        (12.9, 15]                 NaN
11904       15.0            5        (12.9, 15]                 NaN
11905       15.0            5        (12.9, 15]                 NaN
11906       15.0            5        (12.9, 15]                 NaN
11907       15.0            5        (12.9, 15]                 NaN
11908       15.0            5        (12.9, 15]                 NaN
11909       15.0            5        (12.9, 15]                 NaN
11910       15.0            5        (12.9, 15]                 NaN
11911       15.0            5        (12.9, 15]                 NaN
12065       11.0            4        (10, 12.9]                 NaN
12066       11.0            4        (10, 12.9]                 NaN
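My guess at why the column comes back as NaN, sketched on toy data (the `toy` frame, its index starting at 100, and the 4 buckets are all made up for illustration): value_counts() returns a Series indexed by bucket label, and assigning a Series to a column aligns on the index, so rows whose index is not a bucket label get NaN.

```python
import pandas as pd

# Toy data (made up); the index deliberately starts at 100 so that it
# does not overlap with the bucket labels 0..3
toy = pd.DataFrame(
    {"sell_prix": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]},
    index=range(100, 108),
)

buckets = pd.qcut(toy["sell_prix"], 4, labels=False)
counts = buckets.value_counts()

# counts has one entry per bucket, indexed by the bucket label
print(sorted(counts.index.tolist()))

# Column assignment aligns counts on toy's row index (100..107);
# none of those labels exist in counts, so every row becomes NaN
toy["products_by_number"] = counts
print(toy["products_by_number"].isna().all())
```

If that reading is right, the counts are computed correctly and are only lost at the assignment step.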
For example, this is what I want:
       sell_prix  PriceBucket  PriceBucketTitle  products_by_number
4668         8.0            2        (6.5, 8.5]            984546.0
4669         8.0            2        (6.5, 8.5]            984546.0
4670         8.0            2        (6.5, 8.5]            984546.0
4671         8.0            2        (6.5, 8.5]            984546.0
4672         8.0            2        (6.5, 8.5]            984546.0
4673         8.0            2        (6.5, 8.5]            984546.0
4674         8.0            2        (6.5, 8.5]            984546.0
4675         8.0            2        (6.5, 8.5]            984546.0
4676         8.0            2        (6.5, 8.5]            984546.0
4677         8.0            2        (6.5, 8.5]            984546.0
11902       15.0            5        (12.9, 15]           1028141.0
11903       15.0            5        (12.9, 15]           1028141.0
11904       15.0            5        (12.9, 15]           1028141.0
11905       15.0            5        (12.9, 15]           1028141.0
11906       15.0            5        (12.9, 15]           1028141.0
11907       15.0            5        (12.9, 15]           1028141.0
11908       15.0            5        (12.9, 15]           1028141.0
11909       15.0            5        (12.9, 15]           1028141.0
11910       15.0            5        (12.9, 15]           1028141.0
11911       15.0            5        (12.9, 15]           1028141.0
12065       11.0            4        (10, 12.9]             48998.0
12066       11.0            4        (10, 12.9]             48998.0
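One direction that might give the shape above without collapsing the rows is groupby followed by transform, which returns one value per original row. This is only a sketch on made-up data (the `toy` frame and the 4 buckets are assumptions), not something I have run on my real DataFrame:

```python
import pandas as pd

# Toy data (made up), same layout as my real frame but much smaller
toy = pd.DataFrame(
    {"sell_prix": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]},
    index=range(100, 108),
)
toy["PriceBucket"] = pd.qcut(toy["sell_prix"], 4, labels=False)

# transform("count") broadcasts each group's size back onto its rows,
# so the result has the same index as toy and aligns without NaN
toy["products_by_number"] = (
    toy.groupby("PriceBucket")["sell_prix"].transform("count")
)

# Alternative sketch: look up each row's bucket label in value_counts()
alt = toy["PriceBucket"].map(toy["PriceBucket"].value_counts())

print(toy)
```

Both variants keep one row per product; the map version makes the lookup from bucket label to count explicit.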
Any help? Thanks!