I have a set of dates. I want to subtract each date from its forward neighbor to get the delta between them. My code looks like this:
import pandas, numpy, StringIO
txt = '''ID,DATE
002691c9cec109e64558848f1358ac16,2003-08-13 00:00:00
002691c9cec109e64558848f1358ac16,2003-08-13 00:00:00
0088f218a1f00e0fe1b94919dc68ec33,2006-05-07 00:00:00
0088f218a1f00e0fe1b94919dc68ec33,2006-06-03 00:00:00
00d34668025906d55ae2e529615f530a,2006-03-09 00:00:00
00d34668025906d55ae2e529615f530a,2006-03-09 00:00:00
0101d3286dfbd58642a7527ecbddb92e,2007-10-13 00:00:00
0101d3286dfbd58642a7527ecbddb92e,2007-10-27 00:00:00
0103bd73af66e5a44f7867c0bb2203cc,2001-02-01 00:00:00
0103bd73af66e5a44f7867c0bb2203cc,2008-01-20 00:00:00
'''
df = pandas.read_csv(StringIO.StringIO(txt))
df = df.sort('DATE')
df.DATE = pandas.to_datetime(df.DATE)
grouped = df.groupby('ID')
df['X_SEQUENCE_GAP'] = pandas.concat([g['DATE'].sub(g['DATE'].shift(), fill_value=0) for title,g in grouped])
The results I get are rather hard to understand, so I suspect I have a logic error.
The results I get are as follows:
   ID                                DATE                 X_SEQUENCE_GAP
0  002691c9cec109e64558848f1358ac16  2003-08-13 00:00:00  12277 days, 00:00:00
1  002691c9cec109e64558848f1358ac16  2003-08-13 00:00:00  00:00:00
3  0088f218a1f00e0fe1b94919dc68ec33  2006-06-03 00:00:00  27 days, 00:00:00
2  0088f218a1f00e0fe1b94919dc68ec33  2006-05-07 00:00:00  13275 days, 00:00:00
5  00d34668025906d55ae2e529615f530a  2006-03-09 00:00:00  13216 days, 00:00:00
4  00d34668025906d55ae2e529615f530a  2006-03-09 00:00:00  00:00:00
6  0101d3286dfbd58642a7527ecbddb92e  2007-10-13 00:00:00  13799 days, 00:00:00
7  0101d3286dfbd58642a7527ecbddb92e  2007-10-27 00:00:00  14 days, 00:00:00
9  0103bd73af66e5a44f7867c0bb2203cc  2008-01-20 00:00:00  2544 days, 00:00:00
8  0103bd73af66e5a44f7867c0bb2203cc  2001-02-01 00:00:00  11354 days, 00:00:00
I was expecting, for example, rows 0 and 1 to both have a result of 0. Any help is greatly appreciated.
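
For reference, here is a minimal sketch of one way to get a zero gap on the first row of each ID. It assumes a recent pandas version, so it uses `io.StringIO`, `sort_values`, and `groupby(...).diff()` rather than the calls in the code above, and it is not necessarily the only or best fix. The large values such as 12277 days appear to come from `fill_value=0` being treated as the Unix epoch (1970-01-01) for the missing shifted date, which the last line checks.

```python
import io
import pandas as pd

txt = '''ID,DATE
002691c9cec109e64558848f1358ac16,2003-08-13 00:00:00
002691c9cec109e64558848f1358ac16,2003-08-13 00:00:00
0088f218a1f00e0fe1b94919dc68ec33,2006-05-07 00:00:00
0088f218a1f00e0fe1b94919dc68ec33,2006-06-03 00:00:00
00d34668025906d55ae2e529615f530a,2006-03-09 00:00:00
00d34668025906d55ae2e529615f530a,2006-03-09 00:00:00
0101d3286dfbd58642a7527ecbddb92e,2007-10-13 00:00:00
0101d3286dfbd58642a7527ecbddb92e,2007-10-27 00:00:00
0103bd73af66e5a44f7867c0bb2203cc,2001-02-01 00:00:00
0103bd73af66e5a44f7867c0bb2203cc,2008-01-20 00:00:00
'''

# Parse DATE as datetime while reading, so sorting is chronological
df = pd.read_csv(io.StringIO(txt), parse_dates=['DATE'])

# Order rows by ID, then by date, so each row follows its predecessor
df = df.sort_values(['ID', 'DATE'])

# diff() within each ID gives NaT for the first row of a group;
# replace that NaT with a zero-length gap instead of filling with 0
df['X_SEQUENCE_GAP'] = df.groupby('ID')['DATE'].diff().fillna(pd.Timedelta(0))
print(df)

# The 12277-day value in the question matches 2003-08-13 minus the epoch
print((pd.Timestamp('2003-08-13') - pd.Timestamp('1970-01-01')).days)
```

With this approach the first row of every ID gets a gap of 0 and later rows get the difference from the previous date for the same ID, so rows 0 and 1 both come out as 0.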