I have a dataframe as shown below:
import pandas as pd

df_yes = pd.DataFrame({
    'subject_id': [1,1,1,1,1,1,1,1,1,1,1,1],
    'time_1': ['2173-04-03 12:35:00','2173-04-03 12:50:00','2173-04-03 12:59:00',
               '2173-04-03 13:14:00','2173-04-03 13:37:00','2173-04-03 13:39:00',
               '2173-04-04 11:30:00','2173-04-05 16:00:00','2173-04-05 22:00:00',
               '2173-04-06 04:00:00','2173-04-06 04:30:00','2173-04-06 08:00:00'],
    'val': [5,5,5,5,1,6,5,5,8,3,4,6]
})
df_yes['time_1']= pd.to_datetime(df_yes['time_1'])
What I want is the count and the cumulative duration (cumduration) of a specific value within a day. I wrote the following code for this:
s=pd.to_timedelta(24,unit='h')-(df_yes.time_1-df_yes.time_1.dt.normalize())
df_yes['tdiff'] = df_yes.groupby(df_yes.time_1.dt.date).time_1.diff().shift(-1).fillna(s)
df_yes['t_d'] = df_yes['tdiff'].dt.total_seconds()/3600
df_yes['hr'] = df_yes['time_1'].dt.hour
df_yes['min'] = df_yes['time_1'].dt.minute
df_yes['date'] = df_yes['time_1'].dt.date
df_yes['day'] = pd.DatetimeIndex(df_yes['time_1']).day
## the below code is where I get the count and cum duration of a specific value in day for each hour
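# named aggregation; the dict form agg({'cumduration': sum, ...}) raises SpecificationError on a SeriesGroupBy in pandas >= 1.0
df_yes.groupby(['date','hr','val'])['t_d'].agg(cumduration='sum', freq='count').reset_index()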
It produces the output shown below.
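As you can see, it messes up the chronological order. The first value that appears in hour 13 of 2173-04-03 is 5, but the output shows 1 first. The time information in my dataframe makes this clear. I did not use the minute information in the group by clause, because then I could no longer group the values by hour. Hope this information helps.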
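The reordering described above appears to come from groupby itself: by default it sorts the group keys, so within hour 13 the value 1 is listed before 5. A minimal illustration using only the three val readings from that hour (the variable name vals is hypothetical); passing sort=False keeps the keys in the order they are first seen:

```python
import pandas as pd

# val readings from hour 13 of 2173-04-03, in time order (13:14, 13:37, 13:39)
vals = pd.Series([5, 1, 6], name="val")

# Default: group keys are sorted, so 1 comes before 5
print(vals.groupby(vals).size().index.tolist())              # [1, 5, 6]

# sort=False: groups appear in the order they are first seen
print(vals.groupby(vals, sort=False).size().index.tolist())  # [5, 1, 6]
```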
I want my output sorted by time. You can see how the rows are arranged by their time component.
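A sketch of one possible way to get that ordering, assuming pandas 1.0 or later (for ignore_index in sort_values) and that the rows are already sorted by time_1, which the question's shift(-1) step also relies on, since that shift runs over the whole series rather than per day. The column name first_time is a hypothetical helper: it records the earliest reading in each (date, hr, val) group so the result can be sorted on it explicitly.

```python
import pandas as pd

# Input data from the question
df_yes = pd.DataFrame({
    "subject_id": [1] * 12,
    "time_1": pd.to_datetime([
        "2173-04-03 12:35:00", "2173-04-03 12:50:00", "2173-04-03 12:59:00",
        "2173-04-03 13:14:00", "2173-04-03 13:37:00", "2173-04-03 13:39:00",
        "2173-04-04 11:30:00", "2173-04-05 16:00:00", "2173-04-05 22:00:00",
        "2173-04-06 04:00:00", "2173-04-06 04:30:00", "2173-04-06 08:00:00",
    ]),
    "val": [5, 5, 5, 5, 1, 6, 5, 5, 8, 3, 4, 6],
})

# Hours until the next reading on the same day, or until midnight for the
# last reading of each day (same logic as the question's tdiff/t_d columns)
until_midnight = pd.Timedelta(hours=24) - (df_yes.time_1 - df_yes.time_1.dt.normalize())
tdiff = df_yes.groupby(df_yes.time_1.dt.date).time_1.diff().shift(-1).fillna(until_midnight)
df_yes["t_d"] = tdiff.dt.total_seconds() / 3600
df_yes["date"] = df_yes.time_1.dt.date
df_yes["hr"] = df_yes.time_1.dt.hour

out = (
    df_yes.groupby(["date", "hr", "val"], sort=False)  # keep first-appearance order
    .agg(
        first_time=("time_1", "min"),  # hypothetical helper column used for sorting
        cumduration=("t_d", "sum"),
        freq=("t_d", "count"),
    )
    .reset_index()
    .sort_values("first_time", ignore_index=True)  # make the time order explicit
)
print(out)
```

With sort=False and time-sorted input the groups already come out in order of their first reading; the sort_values on first_time just states that order explicitly instead of relying on how groupby arranges the keys. For hour 13 of 2173-04-03 the rows should then list val 5, 1, 6, matching the raw readings.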

