
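I am trying to read a file with US/Eastern times into a DataFrame indexed by time. 2008-11-02 is a DST changeover day, so there are two 1 o'clock hours, and their order tells which one is daylight time and which is standard time. When I try to localize, the code fails because these times are ambiguous. The pytz module has a way to indicate whether a date is in DST, which would be useful here, but it is not clear whether pandas exposes it. One solution is to write a date_parser function for read_csv, but is there a way to get a localized DatetimeIndex using other pandas functions? Thanks.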
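The "two 1 o'clock hours" problem can be seen without pandas. This sketch swaps in the standard-library `zoneinfo` module (Python 3.9+) instead of pytz: its `fold` attribute (PEP 495) plays the same role as pytz's `is_dst` flag, where `fold=0` selects the earlier (daylight) reading and `fold=1` the later (standard) one. It assumes the system or the `tzdata` package provides the `America/New_York` zone, the canonical name behind `US/Eastern`.

```python
from datetime import datetime, timedelta, timezone
from zoneinfo import ZoneInfo

# Assumption: tz data for America/New_York (canonical name of US/Eastern) is available.
eastern = ZoneInfo("America/New_York")

# 01:00 on 2008-11-02 occurs twice in this zone.
first = datetime(2008, 11, 2, 1, 0, tzinfo=eastern)           # fold=0: earlier reading (EDT)
second = datetime(2008, 11, 2, 1, 0, fold=1, tzinfo=eastern)  # fold=1: later reading (EST)

for dt in (first, second):
    # Same wall-clock time, but a different UTC offset and a different instant.
    print(dt.tzname(), dt.utcoffset(), dt.astimezone(timezone.utc))

# The two readings are exactly one hour apart in real time.
gap = second.astimezone(timezone.utc) - first.astimezone(timezone.utc)
print(gap == timedelta(hours=1))  # True
```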

from pandas import read_csv, DatetimeIndex
from StringIO import StringIO

test = 'Time,Number\n\
11/02/2008 00:00, 1\n\
11/02/2008 01:00, 2\n\
11/02/2008 01:00, 3\n\
11/02/2008 02:00, 4\n\
11/02/2008 03:00, 5\n\
11/02/2008 04:00, 6\n'

df = read_csv(StringIO(test), parse_dates=[0]) #read in the csv
di = DatetimeIndex(df['Time']) # create a datetime index
di.tz_localize('US/Eastern') # try to localize to the data's time zone; raises AmbiguousTimeError
File "/lib/python2.7/site-packages/pandas/tseries/index.py", line 1463, in tz_localize
new_dates = tslib.tz_localize_to_utc(self.asi8, tz)
File "tslib.pyx", line 1561, in pandas.tslib.tz_localize_to_utc (pandas/tslib.c:24350)
AmbiguousTimeError: 2008-11-02 01:00:00

The desired output is:

<class 'pandas.tseries.index.DatetimeIndex'>
[2008-11-02 00:00:00, ..., 2008-11-02 04:00:00]
Length: 6, Freq: H, Timezone: US/Eastern
di.values
array(['2008-11-02T00:00:00.000000000-0400',
       '2008-11-02T01:00:00.000000000-0400',
       '2008-11-02T01:00:00.000000000-0500',
       '2008-11-02T02:00:00.000000000-0500',
       '2008-11-02T03:00:00.000000000-0500',
       '2008-11-02T04:00:00.000000000-0500'], dtype='datetime64[ns]')
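Newer pandas versions added an `ambiguous` argument to `tz_localize` that targets exactly this case. A sketch assuming a current pandas and Python 3 (hence `io.StringIO`): `ambiguous="infer"` uses the order of the repeated timestamps to decide which 01:00 is daylight time, and a boolean array (True = daylight time) works like a per-row `is_dst` flag. The names `naive`, `flags`, and `di_flags` are illustrative only.

```python
from io import StringIO

import numpy as np
import pandas as pd

# Same sample data as in the question: 01:00 appears twice on 2008-11-02.
test = (
    "Time,Number\n"
    "11/02/2008 00:00, 1\n"
    "11/02/2008 01:00, 2\n"
    "11/02/2008 01:00, 3\n"
    "11/02/2008 02:00, 4\n"
    "11/02/2008 03:00, 5\n"
    "11/02/2008 04:00, 6\n"
)

df = pd.read_csv(StringIO(test), parse_dates=[0])
naive = pd.DatetimeIndex(df["Time"])

# Option 1: infer from ordering (first 01:00 -> EDT, second 01:00 -> EST).
di = naive.tz_localize("US/Eastern", ambiguous="infer")

# Option 2: one flag per row, True = daylight time; the flags are only
# consulted for the ambiguous rows.
flags = np.array([True, True, False, False, False, False])
di_flags = naive.tz_localize("US/Eastern", ambiguous=flags)

print(di)
```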

1 Answer


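Try this. The index starts out not in any time zone, so you first have to tell it "you are 'UTC'" with tz_localize('UTC'); after that, tz_convert can convert it to US/Eastern.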

In [24]: x = pd.DatetimeIndex(df['Time']).tz_localize('UTC').tz_convert('US/Eastern')

In [25]: x
Out[25]: 
<class 'pandas.tseries.index.DatetimeIndex'>
[2008-11-01 20:00:00, ..., 2008-11-02 00:00:00]
Length: 6, Freq: None, Timezone: US/Eastern

In [26]: x.values
Out[26]: 
array(['2008-11-01T20:00:00.000000000-0400',
       '2008-11-01T21:00:00.000000000-0400',
       '2008-11-01T21:00:00.000000000-0400',
       '2008-11-01T22:00:00.000000000-0400',
       '2008-11-01T23:00:00.000000000-0400',
       '2008-11-02T00:00:00.000000000-0400'], dtype='datetime64[ns]')
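A minimal sketch of what the chain above does, assuming a current pandas; `times` is a hypothetical stand-in for `df['Time']`. Converting the result back to UTC and dropping the zone returns the original clock readings, which shows that `tz_localize('UTC')` treats the naive values as UTC instants before `tz_convert` shifts them to Eastern time.

```python
import pandas as pd

# Hypothetical stand-in for df['Time'] from the question (naive timestamps).
times = pd.to_datetime([
    "2008-11-02 00:00", "2008-11-02 01:00", "2008-11-02 01:00",
    "2008-11-02 02:00", "2008-11-02 03:00", "2008-11-02 04:00",
])

# The answer's chain: declare the naive values as UTC, then convert.
x = pd.DatetimeIndex(times).tz_localize("UTC").tz_convert("US/Eastern")

# Back to UTC, then drop the zone: the original readings come back unchanged,
# so the input values were interpreted as UTC rather than Eastern wall time.
roundtrip = x.tz_convert("UTC").tz_localize(None)

print(x[0])
print((roundtrip == times).all())
```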
answered 2013-06-29T14:04:51.910