python - Pandas duplicate datetime index entries lead to odd exception

Question

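Consider the following contrived example, in which I create a DataFrame and then build a DatetimeIndex from a column that contains duplicate entries. I then place the DataFrame in a Panel and attempt to iterate over the major axis.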

import pandas as pd
import datetime as dt
from collections import OrderedDict  # required by OrderedDict() below
a = [1371215933513120, 1371215933513121, 1371215933513122, 1371215933513122]
b = [1,2,3,4]
df = pd.DataFrame({'a':a, 'b':b, 'c':[dt.datetime.fromtimestamp(t/1000000.) for t in a]})
df.index=pd.DatetimeIndex(df['c'])

d = OrderedDict()
d['x'] = df
p = pd.Panel(d)

for y in p.major_axis:
    print y
    print p.major_xs(y)

This results in the following output:

2013-06-14 15:18:53.513120
                            x
a            1371215933513120
b                           1
c  2013-06-14 15:18:53.513120
2013-06-14 15:18:53.513121
                            x
a            1371215933513121
b                           2
c  2013-06-14 15:18:53.513121
2013-06-14 15:18:53.513122

Followed by a rather cryptic (to me) error:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-35-045aaae5a074> in <module>()
     13 for y in p.major_axis:
     14     print y
---> 15     print p.major_xs(y)

/usr/local/lib/python2.7/dist-packages/pandas-0.11.0-py2.7-linux-x86_64.egg/pandas/core/frame.py in __str__(self)
    667         if py3compat.PY3:
    668             return self.__unicode__()
--> 669         return self.__bytes__()
    670 
    671     def __bytes__(self):

/usr/local/lib/python2.7/dist-packages/pandas-0.11.0-py2.7-linux-x86_64.egg/pandas/core/frame.py in __bytes__(self)
    677         """
    678         encoding = com.get_option("display.encoding")
--> 679         return self.__unicode__().encode(encoding, 'replace')
    680 
    681     def __unicode__(self):

/usr/local/lib/python2.7/dist-packages/pandas-0.11.0-py2.7-linux-x86_64.egg/pandas/core/frame.py in __unicode__(self)
    692             # This needs to compute the entire repr
    693             # so don't do it unless rownum is bounded
--> 694             fits_horizontal = self._repr_fits_horizontal_()
    695 
    696         if fits_vertical and fits_horizontal:

/usr/local/lib/python2.7/dist-packages/pandas-0.11.0-py2.7-linux-x86_64.egg/pandas/core/frame.py in _repr_fits_horizontal_(self)
    652             d=d.iloc[:min(max_rows, height,len(d))]
    653 
--> 654         d.to_string(buf=buf)
    655         value = buf.getvalue()
    656         repr_width = max([len(l) for l in value.split('\n')])

/usr/local/lib/python2.7/dist-packages/pandas-0.11.0-py2.7-linux-x86_64.egg/pandas/core/frame.py in to_string(self, buf, columns, col_space, colSpace, header, index, na_rep, formatters, float_format, sparsify, nanRep, index_names, justify, force_unicode, line_width)
   1489                                            header=header, index=index,
   1490                                            line_width=line_width)
-> 1491         formatter.to_string()
   1492 
   1493         if buf is None:

/usr/local/lib/python2.7/dist-packages/pandas-0.11.0-py2.7-linux-x86_64.egg/pandas/core/format.py in to_string(self, force_unicode)
    312             text = info_line
    313         else:
--> 314             strcols = self._to_str_columns()
    315             if self.line_width is None:
    316                 text = adjoin(1, *strcols)

/usr/local/lib/python2.7/dist-packages/pandas-0.11.0-py2.7-linux-x86_64.egg/pandas/core/format.py in _to_str_columns(self)
    265         for i, c in enumerate(self.columns):
    266             if self.header:
--> 267                 fmt_values = self._format_col(i)
    268                 cheader = str_columns[i]
    269 

/usr/local/lib/python2.7/dist-packages/pandas-0.11.0-py2.7-linux-x86_64.egg/pandas/core/format.py in _format_col(self, i)
    403                             float_format=self.float_format,
    404                             na_rep=self.na_rep,
--> 405                             space=self.col_space)
    406 
    407     def to_html(self, classes=None):

/usr/local/lib/python2.7/dist-packages/pandas-0.11.0-py2.7-linux-x86_64.egg/pandas/core/format.py in format_array(values, formatter, float_format, na_rep, digits, space, justify)
   1319                         justify=justify)
   1320 
-> 1321     return fmt_obj.get_result()
   1322 
   1323 

/usr/local/lib/python2.7/dist-packages/pandas-0.11.0-py2.7-linux-x86_64.egg/pandas/core/format.py in get_result(self)
   1335 
   1336     def get_result(self):
-> 1337         fmt_values = self._format_strings()
   1338         return _make_fixed_width(fmt_values, self.justify)
   1339 

/usr/local/lib/python2.7/dist-packages/pandas-0.11.0-py2.7-linux-x86_64.egg/pandas/core/format.py in _format_strings(self)
   1362 
   1363         print "vals:", vals
-> 1364         is_float = lib.map_infer(vals, com.is_float) & notnull(vals)
   1365         leading_space = is_float.any()
   1366 

ValueError: operands could not be broadcast together with shapes (2) (2,3)

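Now, having explained that I am creating an index with duplicate entries, the source of the error is clear. Without knowing that, however, it would be much harder (again, for a novice like me) to figure out why this exception pops up.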
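
For anyone hitting the same error, here is a minimal sketch of how the duplicate entries could be spotted up front. It assumes a pandas release that provides `Index.is_unique` and `Index.duplicated` (both exist in current versions; I have not checked 0.11.0), and it swaps `fromtimestamp` for `pd.to_datetime(..., unit='us')` so the result does not depend on the local time zone. It leaves `Panel` out entirely and only looks at the index.

```python
import pandas as pd

a = [1371215933513120, 1371215933513121, 1371215933513122, 1371215933513122]
b = [1, 2, 3, 4]

# Build the index from the microsecond timestamps; the last two are identical.
idx = pd.DatetimeIndex(pd.to_datetime(a, unit='us'))
df = pd.DataFrame({'a': a, 'b': b}, index=idx)

# A falsy value here means at least one label occurs more than once.
index_is_unique = df.index.is_unique

# Boolean mask marking every repeat after the first occurrence of a label.
dup_mask = df.index.duplicated()

# Label-based selection: once with a label that occurs once,
# once with the label that occurs twice.
single = df.loc[idx[0]]
multiple = df.loc[idx[2]]

print(index_is_unique)
print(list(dup_mask))
print(type(single).__name__, type(multiple).__name__)
```

The last line is the interesting one: selecting by a label that occurs once yields a one-dimensional result, while selecting by the repeated label yields a two-dimensional one, so code that assumes one row per label can trip over the duplicated timestamp.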

This leads me to a few questions.

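Is this really the expected behavior of pandas? Is it forbidden to create an index with duplicate entries, or is it only forbidden to iterate over one?
If creating such an index is forbidden, shouldn't an exception be raised when it is first created?
If the iteration is somehow incorrect, shouldn't the error be more informative?
Am I doing something wrong?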
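
To make the second and third questions concrete, this is roughly the kind of creation-time check I had in mind. `require_unique` is a hypothetical helper of my own, not a pandas function; it uses only the standard library and would be called on the raw timestamps before the index is built.

```python
from collections import Counter


def require_unique(values):
    """Hypothetical helper: raise a descriptive error if any value repeats."""
    counts = Counter(values)
    # Keep only the values that occur more than once.
    duplicates = sorted(v for v, n in counts.items() if n > 1)
    if duplicates:
        raise ValueError("duplicate index entries: %r" % (duplicates,))
    return list(values)


a = [1371215933513120, 1371215933513121, 1371215933513122, 1371215933513122]

try:
    require_unique(a)
    message = None
except ValueError as exc:
    # The message names the offending value instead of a shape mismatch.
    message = str(exc)

print(message)
```

An error of this shape names the offending timestamp directly, which is what I would have needed to diagnose the problem without already knowing about the duplicates.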
