python - Losing timezone awareness when saving a hierarchical pandas DatetimeIndex to HDF5 in Python

Question

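I am on pandas 0.14.1. Say I need to index data by two timezone-aware timestamps in a hierarchical index. When I save the resulting DataFrame to HDF5, I seem to lose the timezone awareness: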

import pandas as pd
dti1 = pd.DatetimeIndex(start=pd.Timestamp('20000101'), end=pd.Timestamp('20000102'), freq='D', tz='EST5EDT')
dti2 = pd.DatetimeIndex(start=pd.Timestamp('20000102'), end=pd.Timestamp('20000103'), freq='D', tz='EST5EDT')
mux = pd.MultiIndex.from_arrays([dti1, dti2])
df = pd.DataFrame(0, index=mux, columns=['a'])

Here df has timezones:

                                                     a
2000-01-01 00:00:00-05:00 2000-01-02 00:00:00-05:00  0
2000-01-02 00:00:00-05:00 2000-01-03 00:00:00-05:00  0
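
For readers on newer pandas releases: the setup above passes `start`, `end` and `freq` to the `pd.DatetimeIndex` constructor, which later versions no longer accept. A minimal sketch of the same frame built with `pd.date_range`, assuming a recent pandas version (the variable names simply mirror the ones used above):

```python
import pandas as pd

# Same two timezone-aware daily ranges as above, built with pd.date_range
dti1 = pd.date_range(start='2000-01-01', end='2000-01-02', freq='D', tz='EST5EDT')
dti2 = pd.date_range(start='2000-01-02', end='2000-01-03', freq='D', tz='EST5EDT')

# Two-level index of timezone-aware timestamps
mux = pd.MultiIndex.from_arrays([dti1, dti2])
df = pd.DataFrame(0, index=mux, columns=['a'])
```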

After saving to HDF5 and loading it back, the timezone information seems to be gone:

df.to_hdf('/tmp/my.h5', 'data')
pd.read_hdf('/tmp/my.h5', 'data')

The result is:

                                         a
2000-01-01 05:00:00 2000-01-02 05:00:00  0
2000-01-02 05:00:00 2000-01-03 05:00:00  0

I am wondering whether there is a good workaround, and whether this is a known bug.

score 4 · Accepted Answer

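This is not supported by the fixed format when using a MultiIndex. I think it should probably raise a not-implemented error. There is an issue tracking this.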

See the pandas HDF5 interface documentation for the full details.

In [11]: pd.read_hdf('/tmp/my.h5', 'data').index.levels[0]
Out[11]: 
<class 'pandas.tseries.index.DatetimeIndex'>
[2000-01-01 05:00:00, 2000-01-02 05:00:00]
Length: 2, Freq: None, Timezone: None

However, if you specify the table format, it works.

In [13]: df.to_hdf('/tmp/my.h5', 'data2', format='table')

In [14]: pd.read_hdf('/tmp/my.h5', 'data2')
Out[14]: 
                                                     a
2000-01-01 00:00:00-05:00 2000-01-02 00:00:00-05:00  0
2000-01-02 00:00:00-05:00 2000-01-03 00:00:00-05:00  0

In [15]: pd.read_hdf('/tmp/my.h5', 'data2').index.levels[0]
Out[15]: 
<class 'pandas.tseries.index.DatetimeIndex'>
[2000-01-01 00:00:00-05:00, 2000-01-02 00:00:00-05:00]
Length: 2, Freq: None, Timezone: EST5EDT

In [16]: pd.read_hdf('/tmp/my.h5', 'data2').index.levels[1]
Out[16]: 
<class 'pandas.tseries.index.DatetimeIndex'>
[2000-01-02 00:00:00-05:00, 2000-01-03 00:00:00-05:00]
Length: 2, Freq: None, Timezone: EST5EDT
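
If a file has already been written in the fixed format, the timezone can be re-attached after loading. The sketch below is one possible workaround, not part of the pandas API: `restore_tz` is a hypothetical helper, and it assumes the naive values that come back are UTC, which matches the `05:00:00` timestamps in the output above. To stay self-contained it simulates the loaded frame instead of reading from HDF5, and it assumes a recent pandas version.

```python
import pandas as pd

def restore_tz(df, tz):
    """Return a copy of df whose naive DatetimeIndex levels are converted to tz.

    Assumption: the naive timestamps represent UTC, as in the fixed-format
    output shown above. Levels that are not naive datetimes are left alone.
    """
    new_levels = [
        lvl.tz_localize('UTC').tz_convert(tz)
        if isinstance(lvl, pd.DatetimeIndex) and lvl.tz is None
        else lvl
        for lvl in df.index.levels
    ]
    out = df.copy()
    out.index = df.index.set_levels(new_levels)
    return out

# Simulate what read_hdf returned for the fixed format: naive UTC timestamps
naive = pd.MultiIndex.from_arrays([
    pd.to_datetime(['2000-01-01 05:00:00', '2000-01-02 05:00:00']),
    pd.to_datetime(['2000-01-02 05:00:00', '2000-01-03 05:00:00']),
])
loaded = pd.DataFrame(0, index=naive, columns=['a'])

# Both index levels are timezone-aware again after this call
restored = restore_tz(loaded, 'EST5EDT')
```

The timezone name has to be supplied by the caller here, since in this case the fixed format did not bring it back on load.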
