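My DataFrames range in size from 100k to 2m rows. The one I'm dealing with for this question is this large, but note that I will have to do the same for the other frames: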

>>> len(data)
357451

This frame was created by combining many files, so its index is really odd. All I want to do is reindex it with range(len(data)), but I get this error:

>>> data.reindex(index=range(len(data)))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/pandas/core/frame.py", line 2542, in reindex
    fill_value, limit)
  File "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/pandas/core/frame.py", line 2618, in _reindex_index
    limit=limit)
  File "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/pandas/core/index.py", line 893, in reindex
    limit=limit)
  File "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/pandas/core/index.py", line 812, in get_indexer
    raise Exception('Reindexing only valid with uniquely valued Index '
Exception: Reindexing only valid with uniquely valued Index objects

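This makes no sense to me. Since I'm reindexing with an array containing the numbers 0 through 357450, all of the Index objects are unique! Why is this error being raised?

Extra info: I'm using Python 2.7 and pandas 0.11.0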

1 Answer

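When it complains that "Reindexing only valid with uniquely valued Index objects", it isn't objecting that your new index isn't unique; it's objecting that your old one isn't.

For example: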

>>> import pandas as pd
>>> df = pd.DataFrame(range(5), index = [1,2,3,1,2])
>>> df
   0
1  0
2  1
3  2
1  3
2  4
>>> df.reindex(index=range(len(df)))
Traceback (most recent call last):
[...]
  File "/usr/local/lib/python2.7/dist-packages/pandas-0.12.0.dev_0bd5e77-py2.7-linux-i686.egg/pandas/core/index.py", line 849, in get_indexer
    raise Exception('Reindexing only valid with uniquely valued Index '
Exception: Reindexing only valid with uniquely valued Index objects

>>> df.index = range(len(df))
>>> df
   0
0  0
1  1
2  2
3  3
4  4

Although I think I'd write

df.reset_index(drop=True)

instead.
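As a rough sketch of how one might confirm the diagnosis before fixing it: the snippet below checks `Index.is_unique` on the existing index, shows that `reindex` fails when that index has duplicates, and then rebuilds a clean 0..n-1 index. The frame, the `value` column name, and the `reindex_failed` and `clean` variables are made up for illustration. Note that `reset_index(drop=True)` returns a new DataFrame rather than modifying `df`, so the result has to be assigned.

```python
import pandas as pd

# Hypothetical frame whose index contains duplicates, similar to the example above.
df = pd.DataFrame({"value": list(range(5))}, index=[1, 2, 3, 1, 2])

# reindex() needs the *existing* index to be unique, so inspect that one.
print(df.index.is_unique)

try:
    df.reindex(index=range(len(df)))
    reindex_failed = False
except Exception as exc:  # the exact exception class differs between pandas versions
    reindex_failed = True
    print(exc)

# reset_index returns a new DataFrame; drop=True discards the old index
# instead of keeping it as a column.
clean = df.reset_index(drop=True)
print(clean.index.is_unique)
```

The same check can be run on each of the other frames before deciding whether a plain `reindex` is safe.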

Answered 2013-05-01T22:24:06.910