I have a DataFrame column, data['time taken']:
02:08:00
02:05:00
02:55:00
03:42:00
01:12:00
01:46:00
03:22:00
03:36:00
How can I get the output in minutes, as shown below?
128
125
175
222
72
106
202
216
Assuming this is a string column, you can use the str.split method:
In [11]: df['time taken'].str.split(':')
Out[11]:
0 [02, 08, 00]
1 [02, 05, 00]
2 [02, 55, 00]
3 [03, 42, 00]
4 [01, 12, 00]
5 [01, 46, 00]
6 [03, 22, 00]
7 [03, 36, 00]
Name: time taken, dtype: object
Then use apply to convert each [hours, minutes, seconds] list to total minutes:
In [12]: df['time taken'].str.split(':').apply(lambda x: int(x[0]) * 60 + int(x[1]))
Out[12]:
0 128
1 125
2 175
3 222
4 72
5 106
6 202
7 216
Name: time taken, dtype: int64
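For reference, the same split-based idea can be written without apply. This is only a minimal sketch: the sample DataFrame is rebuilt from the values in the question, and the names df, parts and minutes are illustrative assumptions rather than anything from the original session.

```python
import pandas as pd

# Sample data copied from the question; the variable names are assumptions.
df = pd.DataFrame({
    "time taken": [
        "02:08:00", "02:05:00", "02:55:00", "03:42:00",
        "01:12:00", "01:46:00", "03:22:00", "03:36:00",
    ]
})

# Split "HH:MM:SS" into three string columns (0, 1, 2), then cast to int.
parts = df["time taken"].str.split(":", expand=True).astype(int)

# hours * 60 + minutes; the seconds column is ignored, as in the apply version.
minutes = parts[0] * 60 + parts[1]
print(minutes.tolist())
```

Using expand=True keeps the arithmetic vectorized, which may matter on large columns; for a handful of rows the apply version is just as good.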
You could try converting it to a DatetimeIndex:
In [58]: time = pd.DatetimeIndex(df['time taken'])
In [59]: time.hour * 60 + time.minute
Out[59]: array([128, 125, 175, 222, 72, 106, 202, 216], dtype=int32)
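A closely related variant, sketched here under the assumption that every value is a well-formed HH:MM:SS string, is to parse with pd.to_datetime and an explicit format and then use the .dt accessor instead of building a DatetimeIndex. The names times, parsed and minutes are illustrative.

```python
import pandas as pd

# Sample values copied from the question; the Series name is an assumption.
times = pd.Series(
    ["02:08:00", "02:05:00", "02:55:00", "03:42:00",
     "01:12:00", "01:46:00", "03:22:00", "03:36:00"],
    name="time taken",
)

# An explicit format avoids any guessing about how the strings are parsed.
# The date part is a placeholder and is not used below.
parsed = pd.to_datetime(times, format="%H:%M:%S")

# Combine the hour and minute components into total minutes.
minutes = parsed.dt.hour * 60 + parsed.dt.minute
print(minutes.tolist())
```

Note that this treats each value as a time of day, so it would not work for durations of 24 hours or more.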
This is a bit hacky, since we don't directly support reading in timedeltas at the moment:
In [9]: df = read_csv(StringIO(data),header=None)
In [10]: df
Out[10]:
0
0 02:08:00
1 02:05:00
2 02:55:00
3 03:42:00
4 01:12:00
5 01:46:00
6 03:22:00
7 03:36:00
Name: time, dtype: datetime64[ns]
In [13]: df['time'] = pd.to_datetime(df['time'])
In [18]: df['delta'] = df['time'] - Timestamp('today').normalize()  # subtract midnight, not the current time
In [19]: df
Out[19]:
time delta
0 2013-07-30 02:08:00 02:08:00
1 2013-07-30 02:05:00 02:05:00
2 2013-07-30 02:55:00 02:55:00
3 2013-07-30 03:42:00 03:42:00
4 2013-07-30 01:12:00 01:12:00
5 2013-07-30 01:46:00 01:46:00
6 2013-07-30 03:22:00 03:22:00
7 2013-07-30 03:36:00 03:36:00
In [20]: df.dtypes
Out[20]:
time datetime64[ns]
delta timedelta64[ns]
dtype: object
In [22]: df['delta'].apply(lambda x: x/np.timedelta64(1,'m'))
Out[22]:
0 128
1 125
2 175
3 222
4 72
5 106
6 202
7 216
Name: delta, dtype: float64
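In more recent pandas versions the detour through today's date should not be needed, because pd.to_timedelta can parse HH:MM:SS strings directly. A minimal sketch, assuming a pandas version that provides pd.to_timedelta and the .dt accessor; the names times, delta and minutes are illustrative.

```python
import pandas as pd

# Sample values copied from the question; the variable names are assumptions.
times = pd.Series(
    ["02:08:00", "02:05:00", "02:55:00", "03:42:00",
     "01:12:00", "01:46:00", "03:22:00", "03:36:00"]
)

# Parse the strings straight into timedelta64[ns] values.
delta = pd.to_timedelta(times)

# total_seconds() returns floats; floor-divide by 60 and cast to whole minutes.
minutes = (delta.dt.total_seconds() // 60).astype(int)
print(minutes.tolist())
```

Unlike the datetime-based approaches, this one also handles durations longer than a day, since the values are treated as elapsed time rather than time of day.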