dictionary - Python Pandas Series gives NaN data when passed a dict with large index values

Question

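I am trying to build a Pandas Series by passing it a dictionary of index/data pairs. While doing this, I noticed an interesting quirk: if the index of a pair is a very large integer, the data shows up as NaN. This goes away if I reduce the size of the index values, or if I create the Series from two lists instead of a single dict. My index values are large because I am using timestamps in microseconds since 1970. Am I doing something wrong, or is this a bug?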

Here is an example:

import pandas as pd

test_series_time = [1357230060000000, 1357230180000000, 1357230300000000]
test_series_value = [1, 2, 3]
series = pd.Series(test_series_value, test_series_time, name="this works")

test_series_dict = {1357230060000000: 1, 1357230180000000: 2, 1357230300000000: 3}
series2 = pd.Series(test_series_dict, name="this doesn't")

test_series_dict_smaller_index = {1357230060: 1, 1357230180: 2, 1357230300: 3}
series3 = pd.Series(test_series_dict_smaller_index, name="this does")

print(series)
print(series2)
print(series3)

And the output:

1357230060000000    1
1357230180000000    2
1357230300000000    3
Name: this works

1357230060000000   NaN
1357230180000000   NaN
1357230300000000   NaN
Name: this doesn't

1357230060    1
1357230180    2
1357230300    3
Name: this does

So what is going on here?

score 0 · Accepted Answer

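I bet you are on 32-bit; on 64-bit this works fine. In 0.10.1, creation from a dict uses the default numpy integer type, which depends on the system (e.g. int32 on 32-bit, int64 on 64-bit). Your microsecond values are overflowing that dtype, which causes unpredictable behavior.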

In 0.11 (being released this week!), this will work, because it defaults to int64 regardless of the system.

In [12]: np.iinfo(np.int32).max
Out[12]: 2147483647

In [13]: np.iinfo(np.int64).max
Out[13]: 9223372036854775807
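
As a rough illustration of the overflow described above, the sketch below checks the microsecond keys against both integer ranges and then builds the Series with an explicitly int64 index. The names `timestamps_us`, `fits_int32`, `fits_int64` and `series_int64` are illustrative and not from the original post, and forcing the dtype this way is an assumption about how one might sidestep the platform default, not something this answer prescribes.

```python
import numpy as np
import pandas as pd

test_series_dict = {1357230060000000: 1, 1357230180000000: 2, 1357230300000000: 3}

# Sort the keys so the index order is deterministic.
timestamps_us = sorted(test_series_dict)

# Compare each key against the largest value each integer type can hold.
fits_int32 = [t <= np.iinfo(np.int32).max for t in timestamps_us]
fits_int64 = [t <= np.iinfo(np.int64).max for t in timestamps_us]
print(fits_int32)  # microsecond timestamps do not fit in int32
print(fits_int64)  # but they do fit in int64

# Assumed workaround: state the index dtype explicitly rather than
# relying on the platform's default integer type.
index = np.array(timestamps_us, dtype=np.int64)
values = [test_series_dict[t] for t in timestamps_us]
series_int64 = pd.Series(values, index=index, name="explicit int64 index")
print(series_int64)
```

This mirrors the two-list construction from the question, which already worked there; the only addition is that the index dtype is spelled out.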

Convert your microseconds to Timestamps (multiply by 1000 to get nanoseconds, which is the integer input that Timestamp accepts), and you are good to go:

In [5]: pd.Series(test_series_value, 
        [ pd.Timestamp(k*1000) for k in test_series_time ])
Out[5]: 
2013-01-03 16:21:00    1
2013-01-03 16:23:00    2
2013-01-03 16:25:00    3
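
An alternative sketch of the same conversion, assuming a more recent pandas than the 0.10.1/0.11 releases discussed here: `pd.to_datetime` accepts a `unit` argument, so the microsecond integers can be converted directly without multiplying by 1000. The variable names `index` and `series` are illustrative.

```python
import pandas as pd

test_series_time = [1357230060000000, 1357230180000000, 1357230300000000]
test_series_value = [1, 2, 3]

# unit="us" tells pandas the integers are microseconds since the epoch.
index = pd.to_datetime(test_series_time, unit="us")
series = pd.Series(test_series_value, index=index)
print(series)
```

Either route yields a datetime index, so the values are no longer subject to the platform-dependent integer default.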
