17

I have a DataFrame column, data['time taken']:

02:08:00
02:05:00
02:55:00
03:42:00
01:12:00
01:46:00
03:22:00
03:36:00

How can I get the output in minutes, as shown below?

128
125
175
222
72
106
202
216

3 Answers

16

Assuming this is a string column, you can use the str.split method:

In [11]: df['time taken'].str.split(':')
Out[11]:
0    [02, 08, 00]
1    [02, 05, 00]
2    [02, 55, 00]
3    [03, 42, 00]
4    [01, 12, 00]
5    [01, 46, 00]
6    [03, 22, 00]
7    [03, 36, 00]
Name: time taken, dtype: object

Then use apply to convert the hours and minutes to integers and combine them (the seconds field is ignored):

In [12]: df['time taken'].str.split(':').apply(lambda x: int(x[0]) * 60 + int(x[1]))
Out[12]:
0    128
1    125
2    175
3    222
4     72
5    106
6    202
7    216
Name: time taken, dtype: int64
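
For anyone who wants to run this outside an interactive session, here is a minimal self-contained sketch of the same idea. It is a variant rather than the exact code above: it assumes the column holds strings in HH:MM:SS form and uses `expand=True` to get separate columns instead of `apply`. The names `parts` and `minutes` are just illustrative.

```python
import pandas as pd

# Sample data copied from the question; the column is assumed to hold strings.
df = pd.DataFrame({'time taken': ['02:08:00', '02:05:00', '02:55:00', '03:42:00',
                                  '01:12:00', '01:46:00', '03:22:00', '03:36:00']})

# Split each value on ':' into separate columns and convert them to integers.
parts = df['time taken'].str.split(':', expand=True).astype(int)

# Column 0 holds hours, column 1 holds minutes; seconds are ignored, as above.
minutes = parts[0] * 60 + parts[1]

print(minutes.tolist())  # [128, 125, 175, 222, 72, 106, 202, 216]
```

Because the arithmetic runs on whole columns, this avoids a Python-level lambda call per row, which may matter on large frames.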
answered 2013-07-30T16:06:20.087
11

You could try converting it to a DatetimeIndex and then combining its hour and minute attributes:

In [58]: time = pd.DatetimeIndex(df['time taken'])

In [59]: time.hour * 60 + time.minute
Out[59]: array([128, 125, 175, 222,  72, 106, 202, 216], dtype=int32)
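
A self-contained sketch of the same approach, assuming string input in HH:MM:SS form. It swaps in `pd.to_datetime` with an explicit format and the `.dt` accessor in place of `pd.DatetimeIndex`, so it is a variant, not the exact code above. Note that this treats each value as a time of day, so durations of 24 hours or more will not parse this way.

```python
import pandas as pd

# Sample data copied from the question.
times = pd.Series(['02:08:00', '02:05:00', '02:55:00', '03:42:00',
                   '01:12:00', '01:46:00', '03:22:00', '03:36:00'],
                  name='time taken')

# Parse as times of day; with no date in the format, the date part
# defaults to 1900-01-01 and is simply not used below.
parsed = pd.to_datetime(times, format='%H:%M:%S')

# Combine the hour and minute components into total minutes.
minutes = parsed.dt.hour * 60 + parsed.dt.minute

print(minutes.tolist())  # [128, 125, 175, 222, 72, 106, 202, 216]
```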
answered 2013-07-30T16:08:29.527
3

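A bit hacky, since we don't directly support reading in timedeltas at the moment. The idea is to parse the strings as datetimes on today's date, subtract today to get timedeltas, and then divide by one minute: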

In [9]: df = read_csv(StringIO(data),header=None)

In [10]: df
Out[10]: 
          0
0  02:08:00
1  02:05:00
2  02:55:00
3  03:42:00
4  01:12:00
5  01:46:00
6  03:22:00
7  03:36:00

In [11]: df.columns = ['time']  # name the single column so the steps below can use df['time']

In [13]: df['time'] = pd.to_datetime(df['time'])

In [18]: df['delta'] = df['time']-Timestamp('today')

In [19]: df
Out[19]: 
                 time    delta
0 2013-07-30 02:08:00 02:08:00
1 2013-07-30 02:05:00 02:05:00
2 2013-07-30 02:55:00 02:55:00
3 2013-07-30 03:42:00 03:42:00
4 2013-07-30 01:12:00 01:12:00
5 2013-07-30 01:46:00 01:46:00
6 2013-07-30 03:22:00 03:22:00
7 2013-07-30 03:36:00 03:36:00

In [20]: df.dtypes
Out[20]: 
time      datetime64[ns]
delta    timedelta64[ns]
dtype: object

In [22]: df['delta'].apply(lambda x: x/np.timedelta64(1,'m'))
Out[22]: 
0    128
1    125
2    175
3    222
4     72
5    106
6    202
7    216
Name: delta, dtype: float64
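
In more recent pandas versions the detour through datetimes should not be needed. The following is a minimal sketch, not part of the session above, that assumes `pd.to_timedelta` is available and that the column holds HH:MM:SS strings; the names `delta` and `minutes` are illustrative.

```python
import pandas as pd

# Sample data copied from the question.
times = pd.Series(['02:08:00', '02:05:00', '02:55:00', '03:42:00',
                   '01:12:00', '01:46:00', '03:22:00', '03:36:00'],
                  name='time taken')

# Parse the strings directly into timedeltas (dtype timedelta64[ns]).
delta = pd.to_timedelta(times)

# Convert to whole minutes; floor division drops any leftover seconds.
minutes = (delta.dt.total_seconds() // 60).astype(int)

print(minutes.tolist())  # [128, 125, 175, 222, 72, 106, 202, 216]
```

Unlike the time-of-day approaches, timedeltas also handle durations longer than 24 hours.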
answered 2013-07-30T16:09:36.677