
I have a DataFrame that looks like this:

user    time15min             name                  is_purchase
A       2015-08-18 16:45:00   Words With Friends    0
A       2015-08-18 16:45:00   Clash of Clans        0
A       2015-08-18 16:45:00   Words With Friends    0
A       2015-08-18 16:45:00   Clash of Clans        1
A       2015-08-18 17:00:00   Sudoku                0
B       2015-08-18 17:00:00   Angry Birds           0
B       2015-08-18 17:00:00   Candy Crush           0
B       2015-08-18 17:00:00   Candy Crush           0
....

The time15min column contains the 15-minute bucket in which the user played a game on their phone.
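If a raw log records exact play times rather than ready-made buckets, a column like time15min can be derived by rounding each timestamp down to its 15-minute boundary with `Series.dt.floor`. This is a minimal sketch; the raw column name `timestamp` and the times in it are hypothetical and not part of the question's data.

```python
import pandas as pd

# Hypothetical raw log with exact play times (the 'timestamp' column
# and its values are assumptions for illustration only).
raw = pd.DataFrame({
    "user": ["A", "A", "B"],
    "timestamp": pd.to_datetime([
        "2015-08-18 16:47:12",
        "2015-08-18 16:59:59",
        "2015-08-18 17:03:30",
    ]),
    "name": ["Clash of Clans", "Words With Friends", "Candy Crush"],
})

# Round each timestamp down to the start of its 15-minute bucket.
raw["time15min"] = raw["timestamp"].dt.floor("15min")
print(raw[["user", "time15min", "name"]])
```

Flooring (rather than rounding) keeps every play inside the bucket that starts at or before it, so 16:59:59 still lands in the 16:45 bucket.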

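What I need to do is create an aggregated DataFrame, one row per user and per time15min bucket, with a column showing the game played the most in that bucket and a column showing whether any in-app purchase was made during that period.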

So the result would look like:

 user   time15min             name                  purchase_made
  A     2015-08-18 16:45:00   Clash of Clans        1
  A     2015-08-18 17:00:00   Sudoku                0
  B     2015-08-18 17:00:00   Candy Crush           0 

If there is a tie, as in the first bucket for A, we can simply take the alphabetically first of the tied games (in this case Clash of Clans).
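One way to express this tie-breaking rule in pandas is to sort by play count (highest first) and then by name (alphabetical), and keep the first row of each user/bucket. A minimal sketch, assuming the per-game counts have already been computed into a hypothetical `count` column:

```python
import pandas as pd

# Per-game play counts for user A's 16:45 bucket, taken from the example
# (the 'count' column name is an assumption).
counts = pd.DataFrame({
    "user": ["A", "A"],
    "time15min": pd.to_datetime(["2015-08-18 16:45:00"] * 2),
    "name": ["Words With Friends", "Clash of Clans"],
    "count": [2, 2],
})

# Sort by count descending, then name ascending, and keep the first row of
# each (user, time15min) group, so a tie goes to the alphabetically first game.
top = (counts.sort_values(["count", "name"], ascending=[False, True])
             .drop_duplicates(["user", "time15min"]))
print(top)
```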




You can apply the following recipe:

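import pandas as pd

## read in your data from the clipboard; columns are separated by 2+ spaces
## (raw string so that \s is not treated as a string escape)
df = pd.read_clipboard(sep=r'\s{2,}')

## parse the time15min strings into datetimes
df['time15min'] = pd.to_datetime(df['time15min'])

## set the index to time15min, so df2 has a DatetimeIndex
df2 = df.set_index('time15min')

## count the rows per game and total the purchases, per user and 15-minute bucket
## (pd.TimeGrouper has been removed from pandas; pd.Grouper(freq=...) replaces it)
## the 'name' column holds the row count, matching the output shown below
df3 = df2.groupby(['user', pd.Grouper(freq='15min'), 'name']).agg(
    name=('is_purchase', 'size'), is_purchase=('is_purchase', 'sum'))

## for each (user, time15min) group, find the row with the highest count;
## idxmax returns the first maximum, i.e. the alphabetically first game on a tie
mask = df3.groupby(level=[0, 1])['name'].idxmax()
df3_count = df3.loc[mask.tolist()]

df3_count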

This gives the following result:

                                        name  is_purchase
user time15min           name
A    2015-08-18 16:45:00 Clash of Clans    2            1
     2015-08-18 17:00:00 Sudoku            1            0
B    2015-08-18 17:00:00 Candy Crush       2            0
answered 2015-09-02 11:50:33
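The answer's `is_purchase` total is computed per game, so a purchase made in a game other than the most-played one would not appear in the result, while the question asks whether any purchase happened in the period. A sketch of one variation that computes `purchase_made` over the whole user/bucket instead, built from the sample rows only (the elided `....` rows are not included; the names `counts`, `top` and `purchase` are illustrative):

```python
import pandas as pd

# Sample rows from the question.
df = pd.DataFrame({
    "user": ["A"] * 5 + ["B"] * 3,
    "time15min": pd.to_datetime(["2015-08-18 16:45:00"] * 4
                                + ["2015-08-18 17:00:00"] * 4),
    "name": ["Words With Friends", "Clash of Clans", "Words With Friends",
             "Clash of Clans", "Sudoku", "Angry Birds", "Candy Crush",
             "Candy Crush"],
    "is_purchase": [0, 0, 0, 1, 0, 0, 0, 0],
})
keys = ["user", "time15min"]

# Rows per game in each bucket; groupby sorts by user, time15min, then name.
counts = df.groupby(keys + ["name"]).size().reset_index(name="count")

# idxmax returns the first row label holding the maximum, which is the
# alphabetically first game when several games tie.
top = counts.loc[counts.groupby(keys)["count"].idxmax(), keys + ["name"]]

# Any purchase at all in the bucket, regardless of which game it was in.
purchase = (df.groupby(keys, as_index=False)["is_purchase"].max()
              .rename(columns={"is_purchase": "purchase_made"}))

result = top.merge(purchase, on=keys).reset_index(drop=True)
print(result)
```

Taking the maximum of a 0/1 column yields 1 if at least one purchase occurred, which matches the `purchase_made` column in the question's expected output.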