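- You can create groups with shift() by checking whether start_time is greater than the end_time in the row above. When it is, the slot does not overlap the previous one, so a new group starts.
- We fillna with '24:00:00' because shift() leaves NaN in the first row, which has no row above it. No time of day is later than 24:00:00, so the first comparison evaluates to False and the first group is numbered 0. The times are zero-padded HH:MM:SS strings, so they compare correctly as text.
- This returns a boolean series of True and False (i.e. 1 and 0, respectively), so you can simply take the cumsum.
- This creates a grp object that we can include in the groupby.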
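# Sort so that each row's predecessor is the previous slot on the same court
df = df.sort_values(by=['padel', 'start_time'], ascending=[True, True])
# New group label whenever a slot starts after the previous slot's end
grp = df['start_time'].gt(df['end_time'].shift().fillna('24:00:00')).cumsum()
# Collapse each group to its first start and last end
df = df.groupby([grp, 'padel'], as_index=False).agg({'start_time':'first', 'end_time':'last'})
# Duration of each merged slot in minutes
df['duration'] = ((pd.to_timedelta(df['end_time']) -
                   pd.to_timedelta(df['start_time'])).dt.seconds / 60).astype(int)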
Out[1]:
      padel start_time  end_time  duration
0  Padel 10   08:00:00  09:00:00        60
1  Padel 10   10:00:00  13:00:00       180
2  Padel 10   16:00:00  22:00:00       360
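To make the intermediate steps visible, here is a minimal sketch on a three-row sample. The frame and the names demo, prev_end and starts_new_group are made up for illustration and are not part of the original data; the calls are the same shift(), fillna, gt and cumsum used above.

```python
import pandas as pd

# Hypothetical sample: the first two slots overlap, the third starts after a gap.
demo = pd.DataFrame({
    'start_time': ['10:00:00', '10:30:00', '16:00:00'],
    'end_time':   ['11:30:00', '12:00:00', '17:30:00'],
})

# End time of the row above; the first row has none, so fill it with '24:00:00'.
prev_end = demo['end_time'].shift().fillna('24:00:00')

# True where a slot starts after the previous slot ended (a gap, so a new group).
# Zero-padded HH:MM:SS strings compare correctly as plain text.
starts_new_group = demo['start_time'].gt(prev_end)

# The running total of the True values gives one label per merged block.
grp = starts_new_group.cumsum()

print(prev_end.tolist())          # ['24:00:00', '11:30:00', '12:00:00']
print(starts_new_group.tolist())  # [False, False, True]
print(grp.tolist())               # [0, 0, 1]
```

Rows that share a label in grp end up in the same group, which is why taking the first start_time and the last end_time per group yields the merged slot.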
Full code with the input dataframe:
import pandas as pd

df = pd.DataFrame(pd.DataFrame({'padel': {38: 'Padel 10',
40: 'Padel 10',
42: 'Padel 10',
44: 'Padel 10',
46: 'Padel 10',
49: 'Padel 10',
51: 'Padel 10',
53: 'Padel 10',
55: 'Padel 10',
57: 'Padel 10',
59: 'Padel 10',
61: 'Padel 10',
63: 'Padel 10',
65: 'Padel 10',
67: 'Padel 10'},
'start_time': {38: '08:00:00',
40: '10:00:00',
42: '10:30:00',
44: '11:00:00',
46: '11:30:00',
49: '16:00:00',
51: '16:30:00',
53: '17:00:00',
55: '17:30:00',
57: '18:00:00',
59: '18:30:00',
61: '19:00:00',
63: '19:30:00',
65: '20:00:00',
67: '20:30:00'},
'end_time': {38: '09:00:00',
40: '11:30:00',
42: '12:00:00',
44: '12:30:00',
46: '13:00:00',
49: '17:30:00',
51: '18:00:00',
53: '18:30:00',
55: '19:00:00',
57: '19:30:00',
59: '20:00:00',
61: '20:30:00',
63: '21:00:00',
65: '21:30:00',
67: '22:00:00'},
'duration': {38: 60,
40: 90,
42: 90,
44: 90,
46: 90,
49: 90,
51: 90,
53: 90,
55: 90,
57: 90,
59: 90,
61: 90,
63: 90,
65: 90,
67: 90}}))
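# Sort so that each row's predecessor is the previous slot on the same court
df = df.sort_values(by=['padel', 'start_time'], ascending=[True, True])
# New group label whenever a slot starts after the previous slot's end
grp = df['start_time'].gt(df['end_time'].shift().fillna('24:00:00')).cumsum()
# Collapse each group to its first start and last end
df = df.groupby([grp, 'padel'], as_index=False).agg({'start_time':'first', 'end_time':'last'})
# Duration of each merged slot in minutes
df['duration'] = ((pd.to_timedelta(df['end_time']) -
                   pd.to_timedelta(df['start_time'])).dt.seconds / 60).astype(int)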
df