I have the following DataFrame:

| start_time | end_time | id |
|---------------------|---------------------|-----|
| 2017-03-30 01:00:00 | 2017-03-30 01:15:30 |1 |
| 2017-03-30 02:02:00 | 2017-03-30 03:30:00 |4 |
| 2017-03-30 03:37:00 | 2017-03-30 03:39:00 |7 |
| 2017-03-30 03:41:30 | 2017-03-30 04:50:00 |8 |
| 2017-03-30 07:10:00 | 2017-03-30 07:10:30 |10 |
| 2017-03-30 07:11:00 | 2017-03-30 07:20:00 |13 |
| 2017-03-30 07:22:00 | 2017-03-30 08:00:00 |15 |
| 2017-03-30 10:00:00 | 2017-03-30 10:03:00 |20 |
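I want to group rows under the same id whenever the end_time of row i-1 is at most 900 seconds (15 minutes) before the start_time of row i. Every row in such a run should take the id of the run's first row.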
For the example above, the expected output is:

| start_time | end_time | id |
|---------------------|---------------------|-----|
| 2017-03-30 01:00:00 | 2017-03-30 01:15:30 |1 |
| 2017-03-30 02:02:00 | 2017-03-30 03:30:00 |4 |
| 2017-03-30 03:37:00 | 2017-03-30 03:39:00 |4 |
| 2017-03-30 03:41:30 | 2017-03-30 04:50:00 |4 |
| 2017-03-30 07:10:00 | 2017-03-30 07:10:30 |10 |
| 2017-03-30 07:11:00 | 2017-03-30 07:20:00 |10 |
| 2017-03-30 07:22:00 | 2017-03-30 08:00:00 |10 |
| 2017-03-30 10:00:00 | 2017-03-30 10:03:00 |20 |
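To make the rule concrete, here is a small sketch that computes, for the sample data, the gap in seconds between each row's start_time and the previous row's end_time, and flags the rows that start a new group. It assumes both columns are parsed as datetimes and that the rows are already sorted by start_time.

```python
import pandas as pd

# Sample data from the question (assumed sorted by start_time)
rows = [
    ("2017-03-30 01:00:00", "2017-03-30 01:15:30", 1),
    ("2017-03-30 02:02:00", "2017-03-30 03:30:00", 4),
    ("2017-03-30 03:37:00", "2017-03-30 03:39:00", 7),
    ("2017-03-30 03:41:30", "2017-03-30 04:50:00", 8),
    ("2017-03-30 07:10:00", "2017-03-30 07:10:30", 10),
    ("2017-03-30 07:11:00", "2017-03-30 07:20:00", 13),
    ("2017-03-30 07:22:00", "2017-03-30 08:00:00", 15),
    ("2017-03-30 10:00:00", "2017-03-30 10:03:00", 20),
]
df = pd.DataFrame(rows, columns=["start_time", "end_time", "id"])
df["start_time"] = pd.to_datetime(df["start_time"])
df["end_time"] = pd.to_datetime(df["end_time"])

# Gap between this row's start and the previous row's end (NaT for the first row)
gap = df["start_time"] - df["end_time"].shift(1)
gap_seconds = gap.dt.total_seconds()  # NaN for the first row

# A row starts a new group unless its gap is at most 900 s;
# comparisons with NaT are False, so the first row always starts a group
starts_new = ~(gap <= pd.Timedelta(seconds=900))

print(pd.DataFrame({"id": df["id"], "gap_s": gap_seconds, "starts_new": starts_new}))
```

The rows flagged `True` (ids 1, 4, 10 and 20) are exactly the ids that survive in the expected output; every other row has a gap of 420 s or less.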
I achieved this with the following loop, but I'm sure there is a more elegant (and more efficient) way to do it:
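```python
# Previous row's end_time and id, shifted down by one row
df['endTime_delayed'] = df['end_time'].shift(1)
df['id_delayed'] = df['id'].shift(1)
for i, row in df.iterrows():
    # .total_seconds() rather than .seconds, which drops whole days from the gap
    if (row.start_time - row.endTime_delayed).total_seconds() <= 900:
        df.at[i, 'id'] = int(df.at[i, 'id_delayed'])
        # Hand the inherited id on so the next row can inherit it too
        if i + 1 in df.index:
            df.at[i + 1, 'id_delayed'] = df.at[i, 'id']
```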
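One common loop-free alternative, sketched here under the assumption that the rows are sorted by start_time and both time columns are datetimes: flag each row whose gap to the previous row exceeds 900 s, turn the flags into group numbers with `cumsum()`, and give every row the id of its group's first row with `groupby(...).transform("first")`. The names `MAX_GAP`, `new_group` and `group_no` are illustrative, not part of the original code.

```python
import pandas as pd

# Example data from the question; rows are assumed sorted by start_time
rows = [
    ("2017-03-30 01:00:00", "2017-03-30 01:15:30", 1),
    ("2017-03-30 02:02:00", "2017-03-30 03:30:00", 4),
    ("2017-03-30 03:37:00", "2017-03-30 03:39:00", 7),
    ("2017-03-30 03:41:30", "2017-03-30 04:50:00", 8),
    ("2017-03-30 07:10:00", "2017-03-30 07:10:30", 10),
    ("2017-03-30 07:11:00", "2017-03-30 07:20:00", 13),
    ("2017-03-30 07:22:00", "2017-03-30 08:00:00", 15),
    ("2017-03-30 10:00:00", "2017-03-30 10:03:00", 20),
]
df = pd.DataFrame(rows, columns=["start_time", "end_time", "id"])
df["start_time"] = pd.to_datetime(df["start_time"])
df["end_time"] = pd.to_datetime(df["end_time"])

MAX_GAP = pd.Timedelta(seconds=900)

# True where the row does NOT continue the previous one (first row: NaT gap -> True)
new_group = ~(df["start_time"] - df["end_time"].shift(1) <= MAX_GAP)

# Running count of group starts gives a group number per row: 1,2,2,2,3,3,3,4
group_no = new_group.cumsum()

# Every row takes the id of the first row in its group
df["id"] = df.groupby(group_no)["id"].transform("first")
print(df)
```

Everything here is a column-wise operation, so there is no Python-level loop and no writing into the frame while iterating over it. If the real data is not already ordered, a `df = df.sort_values("start_time")` before computing the gaps would be needed.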