
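I'm trying to write code that uses the Python pandas library to categorize a dataset (loaded from a CSV file) by value ranges. I believe aggregate functions can be used for this, but I'm struggling to get them to work.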

    +-------------+-------------+-------------+-------------+-------------+
    |Name         | Age         |Region       |Telephone    |Address      |
    +-------------+-------------+-------------+-------------+-------------+
    |             |             |             |             |             |

I was able to write the following code:

    import pandas as pd

    data_frame = pd.read_csv('5000 Records.csv')

    # Bin the ages into three labelled ranges
    data_frame['age_range'] = pd.cut(data_frame['Age in Yrs.'],
                                     bins=[-float('inf'), 30, 50, float('inf')],
                                     labels=['above', 'in between', 'below'])

    # Count the rows for each (Region, age_range) pair
    data_frame = data_frame.groupby(['Region', 'age_range']).agg(
        {
            'age_range': "count"
        }
    )

    print(data_frame)
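pd.cut assigns the labels in the same order as the bins, and by default each interval is closed on the right. A minimal sketch on a few made-up ages (not from the real CSV) shows which label each value receives; note that with these bins and labels, ages of 30 and under are labelled 'above'.

```python
import pandas as pd

# Hypothetical sample ages; the real data comes from the CSV column 'Age in Yrs.'
ages = pd.Series([25, 30, 31, 50, 51, 70])

age_range = pd.cut(
    ages,
    bins=[-float('inf'), 30, 50, float('inf')],
    labels=['above', 'in between', 'below'],
)

# Intervals are right-closed by default: (-inf, 30], (30, 50], (50, inf]
print(age_range.tolist())
# ['above', 'above', 'in between', 'in between', 'below', 'below']
```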

But the result looks like this:

                          age_range
    Region    age_range            
    Midwest   above             312
              in between        695
              below             390
    Northeast above             201
              in between        421
              below             219
    South     above             435
              in between        983
              below             452
    West      above             211
              in between        443
              below             238

The requirement, however, is to get output in this form:

    +-------------+-------------+-------------+-------------+
    |Region       |above        |in between   |below        |
    +-------------+-------------+-------------+-------------+
    |             |             |             |             |

Can someone help me do this? Thanks in advance!


2 Answers


Use Series.unstack with a simplified groupby: drop the agg call and use GroupBy.size instead.

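GroupBy.count counts only non-missing values, while GroupBy.size counts all rows. Here both give the same result, because age_range is passed as a key in the by parameter of groupby, so rows with a missing age_range are not placed in any group.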
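To see where size and count can differ, here is a small sketch on a hypothetical frame in which one Age value is missing: size counts every row per group, while count skips the NaN.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame with one missing Age
df = pd.DataFrame({
    'Region': ['West', 'West', 'South'],
    'Age': [25.0, np.nan, 40.0],
})

# size counts every row in each group, including the row where Age is NaN
sizes = df.groupby('Region').size()
# count counts only the non-missing Age values in each group
counts = df.groupby('Region')['Age'].count()

print(sizes.to_dict())   # {'South': 1, 'West': 2}
print(counts.to_dict())  # {'South': 1, 'West': 1}
```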

    df = data_frame.groupby(['Region','age_range']).size().unstack(fill_value=0)
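For a self-contained check of the unstack approach, here is a minimal sketch that replaces the CSV with a small hypothetical DataFrame (the values are made up; only the column names follow the question). observed=False is passed explicitly so every age_range category becomes a column even when a Region has no rows in it; that has long been the default, but recent pandas versions emit a FutureWarning when it is left implicit.

```python
import pandas as pd

# Hypothetical stand-in for the CSV; only the two columns the answer needs
data_frame = pd.DataFrame({
    'Region': ['West', 'West', 'South', 'South', 'South'],
    'Age in Yrs.': [25, 45, 60, 35, 28],
})

data_frame['age_range'] = pd.cut(
    data_frame['Age in Yrs.'],
    bins=[-float('inf'), 30, 50, float('inf')],
    labels=['above', 'in between', 'below'],
)

# One count per (Region, age_range) pair, then move age_range into columns;
# pairs with no rows show up as 0 rather than NaN
df = (
    data_frame.groupby(['Region', 'age_range'], observed=False)
    .size()
    .unstack(fill_value=0)
)
print(df)
```

Here West has no one over 50, so its 'below' cell is 0.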

Or use crosstab:

    df = pd.crosstab(data_frame['Region'], data_frame['age_range'])
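A minimal sketch of the crosstab variant on hypothetical data that has already been binned: crosstab counts how often each Region/age_range pair occurs and fills pairs that never occur with 0.

```python
import pandas as pd

# Hypothetical toy data, already binned into age_range
data_frame = pd.DataFrame({
    'Region': ['West', 'West', 'South', 'South', 'South'],
    'age_range': pd.Categorical(
        ['above', 'in between', 'below', 'in between', 'above'],
        categories=['above', 'in between', 'below'],
    ),
})

# Rows are Regions, columns are age ranges, cells are row counts
ct = pd.crosstab(data_frame['Region'], data_frame['age_range'])
print(ct)
```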
answered 2020-03-10 11:16

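Try the DataFrame.pivot method. Note that pivot only reshapes and does not aggregate, and the raw data has no count column, so the counts have to be computed first:

    counts = data_frame.groupby(['Region', 'age_range']).size().reset_index(name='count')
    counts.pivot(index='Region', columns='age_range', values='count')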

answered 2020-03-10 11:02
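If you prefer a pivot-style call, DataFrame.pivot_table (unlike pivot) aggregates duplicate rows itself. Here is a minimal sketch on hypothetical data using aggfunc='count' on the age column; this is a swapped-in alternative, not the method proposed in the answer above. Plain string labels are used, so the columns come out in alphabetical order.

```python
import pandas as pd

# Hypothetical toy data: raw, unaggregated rows
data_frame = pd.DataFrame({
    'Region': ['West', 'West', 'South', 'South', 'South'],
    'age_range': ['above', 'in between', 'below', 'in between', 'above'],
    'Age in Yrs.': [25, 45, 60, 35, 28],
})

# Count the non-missing ages in each (Region, age_range) cell;
# cells with no rows are filled with 0
table = data_frame.pivot_table(
    index='Region',
    columns='age_range',
    values='Age in Yrs.',
    aggfunc='count',
    fill_value=0,
)
print(table)
```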