Imagine I have a dataframe of user events:

+---------+------------------+---------------------+
| user_id | event_name       | timestamp           |
+---------+------------------+---------------------+
| 1       | HomeAppear       | 2020-12-13 06:38:14 |
+---------+------------------+---------------------+
| 1       | TariffsAppear    | 2020-12-13 06:40:13 |
+---------+------------------+---------------------+
| 1       | CheckoutPayClick | 2020-12-13 06:50:12 |
+---------+------------------+---------------------+
| 2       | HomeAppear       | 2020-12-13 11:38:33 |
+---------+------------------+---------------------+
| 2       | TariffsAppear    | 2020-12-13 11:39:18 |
+---------+------------------+---------------------+

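For each user, after their last event (by timestamp), I want to add a new row with an "End" event whose timestamp is the same as that of the previous event: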

+---------+------------------+---------------------+
| 1       | End              | 2020-12-13 06:50:12 |
+---------+------------------+---------------------+

I don't know how to do this. In SQL I would use LAG() or LEAD(). But what about pandas?

2 Answers

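Use DataFrame.drop_duplicates to keep the last row for each user_id, set event_name to End with assign, then concat the result with the original and sort by index. Use the stable mergesort so that each End row stays after the original row that shares its index: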

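import pandas as pd

# Sort first if the rows are not already ordered by user and time
df = df.sort_values(['user_id', 'timestamp'], ignore_index=True)

# Last row per user, relabelled as the End event
df2 = df.drop_duplicates('user_id', keep='last').assign(event_name='End')

# Stable sort keeps each End row directly after the row it was copied from
df = pd.concat([df, df2]).sort_index(kind='mergesort').reset_index(drop=True)
print(df)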
   user_id        event_name            timestamp
0        1        HomeAppear  2020-12-13 06:38:14
1        1     TariffsAppear  2020-12-13 06:40:13
2        1  CheckoutPayClick  2020-12-13 06:50:12
3        1               End  2020-12-13 06:50:12
4        2        HomeAppear  2020-12-13 11:38:33
5        2     TariffsAppear  2020-12-13 11:39:18
6        2               End  2020-12-13 11:39:18
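
The approach above can be tried end to end with the following minimal sketch. It rebuilds the sample data from the question and wraps the same steps in a helper. The name append_end_events is hypothetical and used only for illustration, and the sketch assumes the timestamps are parsed as datetimes.

```python
import pandas as pd

# Sample data copied from the question
events = pd.DataFrame({
    "user_id": [1, 1, 1, 2, 2],
    "event_name": ["HomeAppear", "TariffsAppear", "CheckoutPayClick",
                   "HomeAppear", "TariffsAppear"],
    "timestamp": pd.to_datetime([
        "2020-12-13 06:38:14", "2020-12-13 06:40:13", "2020-12-13 06:50:12",
        "2020-12-13 11:38:33", "2020-12-13 11:39:18",
    ]),
})


def append_end_events(df):
    """Hypothetical helper: add one 'End' row after each user's last event."""
    # Order by user and time, and renumber the index 0..n-1
    ordered = df.sort_values(["user_id", "timestamp"], ignore_index=True)
    # The last row per user keeps its index label, so it can be slotted back in
    ends = ordered.drop_duplicates("user_id", keep="last").assign(event_name="End")
    combined = pd.concat([ordered, ends])
    # A stable sort on the index places each End row right after its source row
    return combined.sort_index(kind="mergesort").reset_index(drop=True)


result = append_end_events(events)
print(result)
```

The stable sort matters here because each End row shares an index label with the row it was copied from. An unstable sort would be free to place either of the two first.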
answered 2021-02-11T08:44:38.657

You can do:

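import numpy as np
import pandas as pd
df = df.sort_values(['user_id', 'timestamp'])
df1 = pd.DataFrame({'user_id': np.unique(df['user_id']), 'event_name': 'End', 'timestamp': np.nan})
# A stable sort keeps each placeholder End row after that user's real events
df = pd.concat([df, df1], axis=0).sort_values(by='user_id', kind='mergesort')
# Copy the previous (last real) timestamp into each End row
df['timestamp'] = df['timestamp'].ffill()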
answered 2021-02-11T08:57:10.223
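
For readers coming from the LAG() / LEAD() framing in the question, the closest pandas analogue is probably a grouped shift. The sketch below is an illustration under that assumption, not something either answer proposes. It uses the question's sample data and marks a user's last event as the row whose "next" event is missing.

```python
import pandas as pd

# Sample data copied from the question
events = pd.DataFrame({
    "user_id": [1, 1, 1, 2, 2],
    "event_name": ["HomeAppear", "TariffsAppear", "CheckoutPayClick",
                   "HomeAppear", "TariffsAppear"],
    "timestamp": pd.to_datetime([
        "2020-12-13 06:38:14", "2020-12-13 06:40:13", "2020-12-13 06:50:12",
        "2020-12-13 11:38:33", "2020-12-13 11:39:18",
    ]),
})
events = events.sort_values(["user_id", "timestamp"], ignore_index=True)

by_user = events.groupby("user_id")["event_name"]
# Roughly LAG(event_name) OVER (PARTITION BY user_id ORDER BY timestamp)
prev_event = by_user.shift(1)
# Roughly LEAD(event_name) OVER (PARTITION BY user_id ORDER BY timestamp)
next_event = by_user.shift(-1)

# A user's last event is the row with no following event
is_last = next_event.isna()
ends = events[is_last].assign(event_name="End")
print(ends)
```

The ends frame holds one End row per user with the timestamp of that user's last event. It can be combined with the original rows using concat and a stable index sort.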