I just ran into a strange pandas behavior. Say I do:

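import random
import string

import numpy as np
import pandas as pd

m_size = (4, 3)
# Random integers from 0 to 10 inclusive
num_mat = np.random.random_integers(0, 10, m_size)
# One random uppercase letter per column (unused below; the frame uses fixed names)
my_cols = [random.choice(string.ascii_uppercase) for x in range(num_mat.shape[1])]
mydf = pd.DataFrame(num_mat, columns=['A', 'B', 'C'])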

print mydf

   A   B   C
0  6   6   7
1  9  10   4
2  0  10   7
3  1   3  10

If I now do:

mydf.D = 4

I would expect this to create a column D filled with the value 4, but the entries of mydf have not changed:

print mydf

   A   B   C
0  6   6   7
1  9  10   4
2  0  10   7
3  1   3  10

Why? I got no warning or error, so what did mydf.D = 4 actually do?

This is with the latest stable pandas (0.11.0).

1 Answer

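Although pandas lets you read a column with df.Col, this is evidently just a shorthand for df['Col'], and the shorthand does not work for creating new columns. You need to do mydf['D'] = 4.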
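As a minimal sketch of the item-assignment form, the snippet below adds the column both in place and with `assign`. It is written for a current Python 3 and pandas rather than 0.11.0, and the seeded generator, the name `with_e`, and the column `E` are illustrative assumptions, not part of the question.

```python
import numpy as np
import pandas as pd

# Seeded generator so the sketch is repeatable; values are 0..10 inclusive.
rng = np.random.default_rng(0)
mydf = pd.DataFrame(rng.integers(0, 11, size=(4, 3)), columns=["A", "B", "C"])

# Item assignment creates a real column; the scalar is broadcast to every row.
mydf["D"] = 4

# assign() returns a new frame with the extra column and leaves mydf untouched.
with_e = mydf.assign(E=5)

print(mydf)
print(with_e)
```

Item assignment modifies the frame in place, while `assign` is the option to reach for when the original frame should stay as it is.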

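I find this unfortunate, because I often try to do exactly what you did. The insidious part is that it actually creates an ordinary Python attribute called D on the DataFrame object. It is not added as a column. So you have to make sure you delete that attribute, otherwise it will shadow the column even if you later add the column properly: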

>>> import numpy as np
>>> import pandas
>>> d = pandas.DataFrame(np.random.randn(3, 2), columns=["A", "B"])
>>> d
          A         B
0 -0.931675  1.029137
1 -0.363033 -0.227672
2  0.058903 -0.362436
>>> d.Col = 8
>>> d.Col    # Attribute is there
8
>>> d['Col']    # But it is not a column, just a plain attribute
Traceback (most recent call last):
  File "<pyshell#8>", line 1, in <module>
    d['Col']
  File "c:\users\brenbarn\documents\python\extensions\pandas\pandas\core\frame.py", line 1906, in __getitem__
    return self._get_item_cache(key)
  File "c:\users\brenbarn\documents\python\extensions\pandas\pandas\core\generic.py", line 570, in _get_item_cache
    values = self._data.get(item)
  File "c:\users\brenbarn\documents\python\extensions\pandas\pandas\core\internals.py", line 1383, in get
    _, block = self._find_block(item)
  File "c:\users\brenbarn\documents\python\extensions\pandas\pandas\core\internals.py", line 1525, in _find_block
    self._check_have(item)
  File "c:\users\brenbarn\documents\python\extensions\pandas\pandas\core\internals.py", line 1532, in _check_have
    raise KeyError('no item named %s' % com.pprint_thing(item))
KeyError: u'no item named Col'
>>> d['Col'] = 100    # Create a real column
>>> d.Col    # Attribute blocks access to column
8
>>> d['Col']    # Column is available via item access
0    100
1    100
2    100
Name: Col, dtype: int64
>>> del d.Col    # Delete the attribute
>>> d.Col     # Column is now available as an attribute (!)
0    100
1    100
2    100
Name: Col, dtype: int64
>>> d['Col']    # And still as an item
0    100
1    100
2    100
Name: Col, dtype: int64

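It may be a bit surprising to see that d.Col "only works after you delete it". That is, after you do del d.Col, a subsequent read of d.Col actually gives you the column. This is simply a consequence of how Python's __getattr__ works, but it is still somewhat unintuitive in this context.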
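The shadowing can be reproduced without pandas. The sketch below uses a hypothetical `MiniFrame` class (not pandas code, and not how pandas is actually implemented) that keeps its columns in a dict and exposes them through `__getattr__`. Python calls `__getattr__` only when normal attribute lookup fails, so an instance attribute with the same name always wins until it is deleted.

```python
class MiniFrame:
    """Hypothetical stand-in for a DataFrame: columns live in a dict."""

    def __init__(self, columns):
        self._columns = dict(columns)

    def __getitem__(self, key):
        return self._columns[key]

    def __setitem__(self, key, value):
        self._columns[key] = value

    def __getattr__(self, name):
        # Reached only when normal lookup (instance dict, class) finds nothing.
        columns = self.__dict__.get("_columns", {})
        if name in columns:
            return columns[name]
        raise AttributeError(name)


d = MiniFrame({"A": [1, 2, 3]})
d.Col = 8                    # plain instance attribute, not a column
d["Col"] = [100, 100, 100]   # a real "column" with the same name
shadowed = d.Col             # normal lookup finds the attribute first
del d.Col                    # remove the attribute
revealed = d.Col             # lookup now fails, so __getattr__ returns the column

print(shadowed)
print(revealed)
```

Because `MiniFrame` does not override `__setattr__`, the statement `d.Col = 8` goes straight into the instance dict, which is the same effect the answer describes for `mydf.D = 4`.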

answered 2013-05-06T20:07:12.817