You should use read_csv to read the csv into a DataFrame:
In [1]: df = pd.read_csv(file_name, sep=r'\s+', header=None, names=['time', 'ip'])
In [2]: df
Out[2]:
    time               ip
0  06:03     65.55.215.62
1  06:04    157.56.92.152
2  06:04    66.249.74.175
3  06:05  173.199.116.171
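If you want to try this without a file on disk, here is a minimal, self-contained sketch. The `raw` string is a hypothetical stand-in for the log file, built from the four rows shown above; `io.StringIO` just lets `read_csv` treat it as a file.

```python
import io

import pandas as pd

# Hypothetical stand-in for the contents of file_name.
raw = """06:03 65.55.215.62
06:04 157.56.92.152
06:04 66.249.74.175
06:05 173.199.116.171
"""

# sep=r"\s+" splits on any run of whitespace; header=None because the
# file has no header row, so the column names are supplied via names=.
df = pd.read_csv(io.StringIO(raw), sep=r"\s+", header=None, names=["time", "ip"])
print(df)
```

At this point the time column still holds plain strings such as '06:03'.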
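Pandas doesn't have any built-in time objects (yet), and it's not that easy to do this in python... You could make the time column a column of datetime.time objects: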
In [3]: df['time'] = df['time'].apply(lambda x: datetime.time(*map(int, x.split(':'))))
In [4]: df
Out[4]:
       time               ip
0  06:03:00     65.55.215.62
1  06:04:00    157.56.92.152
2  06:04:00    66.249.74.175
3  06:05:00  173.199.116.171
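But this isn't much help, especially since you can't do arithmetic on datetime.time objects. Either way, I think you're going to get stuck by not having a year/month/day here. For one thing, how do you deal with midnight?
So let's start again, assuming you have a datetime...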
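To see the arithmetic problem concretely, here is a small sketch using only the standard library. The date `2013-06-12` is an arbitrary choice for illustration; any date would do.

```python
import datetime

t1 = datetime.time(6, 3)
t2 = datetime.time(6, 4)

# datetime.time does not support subtraction, so this raises TypeError.
try:
    t2 - t1
    subtraction_works = True
except TypeError:
    subtraction_works = False

# Attaching a date (arbitrary here) turns the times into datetimes,
# and subtracting those gives a timedelta.
day = datetime.date(2013, 6, 12)
delta = datetime.datetime.combine(day, t2) - datetime.datetime.combine(day, t1)

print(subtraction_works)
print(delta)
```

This is essentially why the rest of the answer switches to full datetimes.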
In [5]: df = pd.read_csv(file_name, sep=r'\s+', header=None, names=['time', 'ip'])
In [6]: df['time'] = pd.to_datetime(df['time']) # no date in the data, so today's date is used
In [7]: df
Out[7]:
                 time               ip
0 2013-06-12 06:03:00     65.55.215.62
1 2013-06-12 06:04:00    157.56.92.152
2 2013-06-12 06:04:00    66.249.74.175
3 2013-06-12 06:05:00  173.199.116.171
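If you know which day the log covers, you may prefer to state the date explicitly rather than rely on today's date being filled in. A minimal sketch, assuming the date is known from elsewhere (`log_date` is a hypothetical variable, not something in the data):

```python
import pandas as pd

df = pd.DataFrame({"time": ["06:03", "06:04", "06:04", "06:05"]})

# Hypothetical: the day the log was written, known from outside the file.
log_date = "2013-06-12"

# Prepend the date to each "HH:MM" string and parse with an explicit format,
# so the result does not depend on the day the code is run.
df["time"] = pd.to_datetime(log_date + " " + df["time"], format="%Y-%m-%d %H:%M")
print(df)
```

That keeps the result reproducible from one run to the next.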
Then you can find the differences using shift:
In [8]: df['time'].shift()
Out[8]:
0                   NaT
1   2013-06-12 06:03:00
2   2013-06-12 06:04:00
3   2013-06-12 06:04:00
Name: time, dtype: datetime64[ns]
In [9]: df['time'] - df['time'].shift()
Out[9]:
0        NaT
1   00:01:00
2   00:00:00
3   00:01:00
Name: time, dtype: timedelta64[ns]
Much easier. :)
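On the midnight question raised earlier, one possible approach when the data really has no date is sketched below. It is not part of the original method and rests on two assumptions: the rows are in chronological order, and consecutive rows are less than 24 hours apart. The sample times are made up to cross midnight.

```python
import pandas as pd

# Made-up sample that crosses midnight.
times = pd.Series(["23:58", "23:59", "00:01", "00:02"])

# Treat each "HH:MM" as an offset from midnight.
offsets = pd.to_timedelta(times + ":00")

# diff() is equivalent to offsets - offsets.shift().
gaps = offsets.diff()

# Assumption: a negative gap means the clock wrapped past midnight,
# so add one day to those rows only. The leading NaT is left untouched.
gaps = gaps.mask(gaps < pd.Timedelta(0), gaps + pd.Timedelta(days=1))
print(gaps)
```

If the log can have gaps longer than a day, this heuristic cannot recover them, and you need real dates in the data.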