
I am creating a DataFrame in pandas:

import pandas

df_data = dict()

for x in data:
    series = pandas.Series(x['value']['values'], index=x['value']['timestamps'])

    df_data[x['_id']] = series

df = pandas.DataFrame(df_data)

data is a list of dicts, each of the form:

{u'_id': u'770000000049',
 u'value': {u'timestamps': [datetime.datetime(2012, 7, 25, 10, 16, 1, 270000),
                            datetime.datetime(2012, 7, 25, 10, 18, 29, 745000),
                            datetime.datetime(2012, 7, 25, 10, 21, 54, 931000),
                            datetime.datetime(2012, 7, 25, 10, 23, 18, 896000)],
            u'values': [204.0, 16.788, 139.2, 116.004]}}

Printing a sample series gives me:

>>> print df_data['770000000049']
2012-07-25 10:16:01.270000    204.000
2012-07-25 10:18:29.745000     16.788
2012-07-25 10:21:54.931000    139.200
2012-07-25 10:23:18.896000    116.004

This is as expected. However, printing the same column from the resulting DataFrame gives me:

>>> print df['770000000049']
1992-06-05 15:50:11.527680   NaN
2181-10-17 22:55:34.850625   NaN
2215-08-27 21:41:15.306049   NaN
1936-05-22 00:55:45.848401   NaN
1783-06-08 06:38:26.257076   NaN
2017-03-12 18:30:17.469108   NaN
2209-08-06 03:45:09.779652   NaN
1768-02-06 12:00:22.653272   NaN
1916-07-20 06:51:31.628376   NaN
2086-01-25 18:30:58.261336   NaN
1940-08-26 15:13:33.790568   NaN
1712-12-17 22:48:01.743241   NaN
1803-06-16 16:32:58.309017   NaN
1981-11-05 04:38:27.140059   NaN
2246-05-25 09:09:27.875035   NaN
...

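Whoa! The data is all wrong. Both the keys (the timestamps) and the values are completely wrong.

What exactly am I doing wrong?

Edit: printing df gives me: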

DatetimeIndex: 386 entries, 1992-06-05 15:50:11.527680 to 1774-08-13 02:00:15.237103
Data columns:
770000000006    0  non-null values
770000000009    0  non-null values
770000000010    0  non-null values
770000000011    0  non-null values
770000000012    0  non-null values
770000000013    0  non-null values
770000000018    0  non-null values
770000000020    0  non-null values
770000000021    0  non-null values
770000000022    0  non-null values
770000000024    0  non-null values
770000000029    0  non-null values
770000000030    0  non-null values
770000000032    0  non-null values
770000000034    0  non-null values
770000000049    0  non-null values
dtypes: float64(16)

Completely wrong.

Edit 2

I have written a module that reproduces the error for me.

2 Answers

Edit: this is a bug. I (Wes) fixed it here: https://github.com/pydata/pandas/commit/aea7c4522bd7beffd0df80efee818873110609fa


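It turns out this is not a bug:

Although pandas does not force you to have a sorted date index, some of its methods may behave unexpectedly or incorrectly if the dates are unsorted. So be careful.

Sorting the dates at the database level fixed the problem for me.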
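
If sorting at the database level is not an option, the same effect can probably be achieved on the pandas side. The following is a minimal sketch in current Python 3 / pandas, not the code from the question: the record is a hypothetical one in the same shape as the question's data, with its timestamps deliberately shuffled, and each series is sorted with `sort_index()` before the DataFrame is built.

```python
import datetime

import pandas as pd

# Hypothetical record shaped like the question's data, with the
# timestamps deliberately out of order.
data = [
    {
        "_id": "770000000049",
        "value": {
            "timestamps": [
                datetime.datetime(2012, 7, 25, 10, 21, 54, 931000),
                datetime.datetime(2012, 7, 25, 10, 16, 1, 270000),
                datetime.datetime(2012, 7, 25, 10, 23, 18, 896000),
                datetime.datetime(2012, 7, 25, 10, 18, 29, 745000),
            ],
            "values": [139.2, 204.0, 116.004, 16.788],
        },
    }
]

df_data = {}
for x in data:
    series = pd.Series(x["value"]["values"], index=x["value"]["timestamps"])
    # Sort by timestamp so every series has a monotonic increasing index.
    df_data[x["_id"]] = series.sort_index()

df = pd.DataFrame(df_data)

# Quick diagnostic: True when the index is sorted in ascending order.
index_is_sorted = df.index.is_monotonic_increasing
```

Checking `is_monotonic_increasing` on each series index before combining them is also a cheap way to find out whether unsorted dates are involved at all.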

answered 2012-08-10T14:58:29.307

I ran the snippet you pasted and it seems to work fine for me. What versions of pandas and numpy are you using? Could you post all (or more) of the data?

In [26]: paste
{u'_id': u'770000000049',
 u'value': {u'timestamps': [datetime.datetime(2012, 7, 25, 10, 16, 1, 270000),
                            datetime.datetime(2012, 7, 25, 10, 18, 29, 745000),
                            datetime.datetime(2012, 7, 25, 10, 21, 54, 931000),
                            datetime.datetime(2012, 7, 25, 10, 23, 18, 896000)],
            u'values': [204.0, 16.788, 139.2, 116.004]}}
## -- End pasted text --
Out[26]: 
{u'_id': u'770000000049',
 u'value': {u'timestamps': [datetime.datetime(2012, 7, 25, 10, 16, 1, 270000),
   datetime.datetime(2012, 7, 25, 10, 18, 29, 745000),
   datetime.datetime(2012, 7, 25, 10, 21, 54, 931000),
   datetime.datetime(2012, 7, 25, 10, 23, 18, 896000)],
  u'values': [204.0, 16.788, 139.2, 116.004]}}

In [27]: data = [_]

In [28]: paste
df_data = dict()

for x in data:
    series = pandas.Series(x['value']['values'], index=x['value']['timestamps'])

    df_data[x['_id']] = series

df = pandas.DataFrame(df_data)
## -- End pasted text --

In [29]: print df['770000000049']
2012-07-25 10:16:01.270000    204.000
2012-07-25 10:18:29.745000     16.788
2012-07-25 10:21:54.931000    139.200
2012-07-25 10:23:18.896000    116.004
Name: 770000000049
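
For the version question above, one way to collect the installed versions is sketched below. It is written for current Python 3 and relies only on the `__version__` attribute that both libraries expose; the `versions` dict is just an illustrative name.

```python
import numpy as np
import pandas as pd

# Gather the installed versions so they can be pasted into a bug report.
versions = {"pandas": pd.__version__, "numpy": np.__version__}
for name, version in versions.items():
    print(name, version)
```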
answered 2012-08-07T17:10:48.973