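I am working with a dataset of timestamps. I need to compute the consecutive differences between observations (timestamps). The timestamps are of type datetime64[ns], and dfnew is a pandas DataFrame.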

    import numpy as np
    import pandas as pd

    dfnew['timestamp'] = dfnew['timestamp'].astype('datetime64[ns]')
    dfnew['dates'] = dfnew['timestamp'].map(pd.Timestamp.date)
    uniqueDates = list(set(dfnew['dates']))  # unique dates as a list
    # build a NumPy array of the timestamps for one particular date
    x = np.array(dfnew[dfnew['dates'] == uniqueDates[0]]['timestamp'])
    y = np.ediff1d(x)  # consecutive differences between timestamps
    print(max(y))
    49573580000000 nanoseconds
    print(min(y))
    -86391523000000 nanoseconds

    print(y[1:20])
    [ 92210000000 388030000000            0 211607000000 249337000000
      19283000000  91407000000 120180000000 240050000000  30406000000
                0 480337000000     13000000 491424000000            0
      80980000000 388103000000  88850000000 120333000000]
    dfnew['timestamp'][0:20]
    0    2013-12-19 09:03:21.223000
    1    2013-12-19 11:34:23.037000
    2    2013-12-19 11:34:23.050000
    3    2013-12-19 11:34:23.067000
    4    2013-12-19 11:34:23.067000
    5    2013-12-19 11:34:23.067000
    6    2013-12-19 11:34:23.067000
    7    2013-12-19 11:34:23.067000
    8    2013-12-19 11:34:23.067000
    9    2013-12-19 11:34:23.080000
    10   2013-12-19 11:34:23.080000
    11   2013-12-19 11:34:23.080000
    12   2013-12-19 11:34:23.080000
    13   2013-12-19 11:34:23.080000
    14   2013-12-19 11:34:23.080000
    15   2013-12-19 11:34:23.097000
    16   2013-12-19 11:34:23.097000
    17   2013-12-19 11:34:23.097000
    18   2013-12-19 11:34:23.097000
    19   2013-12-19 11:34:23.097000
    Name: timestamp 

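Is there a way to get the output in hours instead of nanoseconds? I can convert it with plain division, but I am looking for an alternative. Also, when I save the result to a txt file, the word "nanoseconds" is written as well. How can I strip that unit so that only the numbers are saved? Any help is appreciated.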

1 Answer

Try Series.diff():

    import io

    import pandas as pd

    txt = """0    2013-12-19 09:03:21.223000
    1    2013-12-19 11:34:23.037000
    2    2013-12-19 11:34:23.050000
    3    2013-12-19 11:34:23.067000
    4    2013-12-19 11:34:23.067000
    5    2013-12-19 11:34:23.067000
    6    2013-12-19 11:34:23.067000
    7    2013-12-19 11:34:23.067000
    8    2013-12-19 11:34:23.067000
    9    2013-12-19 11:34:23.080000
    10   2013-12-19 11:34:23.080000
    11   2013-12-19 11:34:23.080000
    12   2013-12-19 11:34:23.080000
    13   2013-12-19 11:34:23.080000
    14   2013-12-19 11:34:23.080000
    15   2013-12-19 11:34:23.097000
    16   2013-12-19 11:34:23.097000
    17   2013-12-19 11:34:23.097000
    18   2013-12-19 11:34:23.097000
    19   2013-12-19 11:34:23.097000
    """

    # read the three whitespace-separated columns: row label, date, time
    df = pd.read_csv(io.StringIO(txt), sep=r"\s+", header=None,
                     names=["idx", "date", "time"], index_col="idx")
    df.index.name = None
    # merge date and time into one datetime Series; "1_2" is the name
    # pandas used to give the merged columns 1 and 2
    s = pd.to_datetime(df["date"] + " " + df["time"]).rename("1_2")

    s.diff()  # consecutive differences as timedelta64[ns]

Result:

    0                NaT
    1    02:31:01.814000
    2    00:00:00.013000
    3    00:00:00.017000
    4           00:00:00
    5           00:00:00
    6           00:00:00
    7           00:00:00
    8           00:00:00
    9    00:00:00.013000
    10          00:00:00
    11          00:00:00
    12          00:00:00
    13          00:00:00
    14          00:00:00
    15   00:00:00.017000
    16          00:00:00
    17          00:00:00
    18          00:00:00
    19          00:00:00
    Name: 1_2, dtype: timedelta64[ns]
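
To get the differences in hours as plain numbers, and to save them without any unit text, something along these lines may work. This is only a sketch: the short `ts` Series is a made-up stand-in for `dfnew['timestamp']`, and the file name `diff_hours.txt` is an assumption. Dividing a timedelta Series by `pd.Timedelta(hours=1)` yields ordinary floats, so nothing like "nanoseconds" ends up in the file.

```python
import numpy as np
import pandas as pd

# Hypothetical sample standing in for dfnew['timestamp']
ts = pd.Series(pd.to_datetime([
    "2013-12-19 09:03:21.223",
    "2013-12-19 11:34:23.037",
    "2013-12-19 11:34:23.050",
    "2013-12-19 11:34:23.067",
]))

diffs = ts.diff()                      # timedelta64[ns]; first entry is NaT
hours = diffs / pd.Timedelta(hours=1)  # float64 hours; first entry is NaN

# Drop the leading NaN and write bare numbers, one per line.
# The output file name is an assumption for this example.
np.savetxt("diff_hours.txt", hours.dropna().to_numpy(), fmt="%.9f")
```

If the timestamps are not already in chronological order, calling `sort_values()` on them before `diff()` should avoid negative differences like the minimum shown in the question.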
answered 2014-01-21T11:58:03.673