I am tracking a large number of events, each with a timestamp attached.
I currently have the following table:
ID Time_Stamp Event
1 2/20/2019 18:21 0
1 2/20/2019 19:46 0
1 2/21/2019 18:35 0
1 2/22/2019 11:39 1
1 2/22/2019 16:46 0
1 2/23/2019 7:40 0
2 6/5/2019 0:10 0
3 7/31/2019 10:18 0
3 8/23/2019 16:33 0
4 6/26/2019 20:49 0
What I want is the following [not sure whether this is possible]:
ID Time_Stamp Conversion Total_Duration_Days Conversion_Duration
1 2/20/2019 18:21 0 2.555 1.721
1 2/20/2019 19:46 0 2.555 1.721
1 2/21/2019 18:35 0 2.555 1.721
1 2/22/2019 11:39 1 2.555 1.721
1 2/22/2019 16:46 1 2.555 1.934
1 2/23/2019 7:40 0 2.555 1.934
2 6/5/2019 0:10 0 1.00 0.000
3 7/31/2019 10:18 0 23.260 0.000
3 8/23/2019 16:33 0 23.260 0.000
4 6/26/2019 20:49 0 1.00 0.000
For #1, Total Duration = Max Date - Min Date
[2.555 days]
For #2, Conversion Duration = Conversion Date - Min Date
[1.721 days] - actions that follow the conversion can keep the calculated duration
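As a quick check of the two definitions above, the arithmetic for ID 1 can be reproduced with plain pandas timestamps. This is only a sketch: the variable names are illustrative, and the three timestamps are copied from the first table.

```python
import pandas as pd

# Timestamps for ID 1, copied from the table above.
first = pd.Timestamp("2019-02-20 18:21")      # earliest event
last = pd.Timestamp("2019-02-23 07:40")       # latest event
converted = pd.Timestamp("2019-02-22 11:39")  # the row where Event == 1

# Dividing a Timedelta by one day gives the duration as a float in days.
one_day = pd.Timedelta(days=1)
total_duration_days = (last - first) / one_day
conversion_duration = (converted - first) / one_day

print(round(total_duration_days, 3))  # 2.555
print(round(conversion_duration, 3))  # 1.721
```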
I have tried the following:
import pandas as pd
df.reset_index(inplace=True)
# Difference between consecutive events within each ID
df.groupby(['ID'])['Time_Stamp'].diff().fillna(pd.Timedelta(0))
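This sort of does what I want, but it shows the difference between each consecutive event rather than the difference between the minimum and maximum timestamp.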
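For comparison, a per-ID span from the earliest to the latest timestamp could be sketched with `groupby(...).transform`, which broadcasts the group minimum and maximum back onto every row. This assumes `Time_Stamp` is already a datetime column and rebuilds the sample table so the snippet is self-contained. Note that IDs with a single event come out as 0.0 here, not the 1.00 shown in the desired table.

```python
import pandas as pd

# Sample data rebuilt from the first table.
df = pd.DataFrame({
    "ID": [1, 1, 1, 1, 1, 1, 2, 3, 3, 4],
    "Time_Stamp": pd.to_datetime([
        "2019-02-20 18:21", "2019-02-20 19:46", "2019-02-21 18:35",
        "2019-02-22 11:39", "2019-02-22 16:46", "2019-02-23 07:40",
        "2019-06-05 00:10", "2019-07-31 10:18", "2019-08-23 16:33",
        "2019-06-26 20:49",
    ]),
    "Event": [0, 0, 0, 1, 0, 0, 0, 0, 0, 0],
})

by_id = df.groupby("ID")["Time_Stamp"]

# transform returns a Series aligned with df, so the subtraction is row-wise.
span = by_id.transform("max") - by_id.transform("min")
df["Total_Duration_Days"] = span / pd.Timedelta(days=1)

print(df[["ID", "Time_Stamp", "Total_Duration_Days"]].round(3))
```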
# reset_index(inplace=True) returns None, so assign the non-inplace result
conv_test = df.reset_index()
# Earliest and latest timestamp per ID
min_df = conv_test.groupby(['ID'])['visitStartTime_aest'].agg('min').to_frame('MinTime')
max_df = conv_test.groupby(['ID'])['visitStartTime_aest'].agg('max').to_frame('MaxTime')
# Attach both back onto every row of the matching ID
conv_test = conv_test.set_index('ID').merge(min_df, left_index=True, right_index=True)
conv_test = conv_test.merge(max_df, left_index=True, right_index=True)
conv_test['Duration'] = conv_test['MaxTime'] - conv_test['MinTime']
This gives me Total_Duration_Days, which is great [feel free to suggest a more elegant solution].
Any ideas on how I can get Conversion_Duration?
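One possible way to approach Conversion_Duration, sketched under assumptions: treat the earliest row with `Event == 1` as the conversion date for each ID, subtract the ID's minimum timestamp, and use 0 for IDs that never convert. This gives every row of ID 1 the value 1.721. It does not reproduce the 1.934 shown for the later rows in the desired table, so that part of the logic would still need to be defined.

```python
import pandas as pd

# Sample rows for IDs 1 and 2, copied from the first table.
df = pd.DataFrame({
    "ID": [1, 1, 1, 1, 1, 1, 2],
    "Time_Stamp": pd.to_datetime([
        "2019-02-20 18:21", "2019-02-20 19:46", "2019-02-21 18:35",
        "2019-02-22 11:39", "2019-02-22 16:46", "2019-02-23 07:40",
        "2019-06-05 00:10",
    ]),
    "Event": [0, 0, 0, 1, 0, 0, 0],
})

# Earliest timestamp per ID, broadcast to every row.
first_seen = df.groupby("ID")["Time_Stamp"].transform("min")

# Keep the timestamp only on conversion rows (NaT elsewhere), then take the
# earliest one per ID and broadcast it back to every row of that ID.
conv_time = (
    df["Time_Stamp"].where(df["Event"] == 1)
    .groupby(df["ID"])
    .transform("min")
)

# IDs without a conversion have NaT above; their duration is filled with 0.
df["Conversion_Duration"] = (
    (conv_time - first_seen) / pd.Timedelta(days=1)
).fillna(0.0)

print(df.round(3))
```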