In my continuing spree of exotic pandas/HDF5 issues, I ran into the following:

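I have a series of columns whose names are not natural names (N.B.: for good reason, the negative numbers are "system" IDs etc.), which normally does not cause a problem: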

fact_hdf.select('store_0_0', columns=['o', 'a-6', 'm-13'])

However, my select statement does fall over once a where clause references such a column:

>>> fact_hdf.select('store_0_0', columns=['o', 'a-6', 'm-13'], where=[('a-6', '=', [0, 25, 28])])
blablabla
File "/srv/www/li/venv/local/lib/python2.7/site-packages/tables/table.py", line 1251, in _required_expr_vars
    raise NameError("name ``%s`` is not defined" % var)
NameError: name ``a`` is not defined

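Is there a way to work around this? I could rename the negative IDs, e.g. from "a-1" to "a_1", but that means reloading all the data in my system, which is rather a lot! :)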

Suggestions are very welcome!

1 Answer

Here is a test table:

In [1]: df = DataFrame({ 'a-6' : [1,2,3,np.nan] })

In [2]: df
Out[2]: 
   a-6
0    1
1    2
2    3
3  NaN

In [3]: df.to_hdf('test.h5','df',mode='w',table=True)

 In [5]: df.to_hdf('test.h5','df',mode='w',table=True,data_columns=True)
/usr/local/lib/python2.7/site-packages/tables/path.py:99: NaturalNameWarning: object name is not a valid Python identifier: 'a-6'; it does not match the pattern ``^[a-zA-Z_][a-zA-Z0-9_]*$``; you will not be able to use natural naming to access this object; using ``getattr()`` will still work, though
  NaturalNameWarning)
/usr/local/lib/python2.7/site-packages/tables/path.py:99: NaturalNameWarning: object name is not a valid Python identifier: 'a-6_kind'; it does not match the pattern ``^[a-zA-Z_][a-zA-Z0-9_]*$``; you will not be able to use natural naming to access this object; using ``getattr()`` will still work, though
  NaturalNameWarning)
/usr/local/lib/python2.7/site-packages/tables/path.py:99: NaturalNameWarning: object name is not a valid Python identifier: 'a-6_dtype'; it does not match the pattern ``^[a-zA-Z_][a-zA-Z0-9_]*$``; you will not be able to use natural naming to access this object; using ``getattr()`` will still work, though
  NaturalNameWarning)

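There is a way, but it would have to be built into the code itself. You can do a variable substitution for the column names, as shown further down. Here is the existing routine (in master):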

   def select(self):
        """
        generate the selection
        """
        if self.condition is not None:
            return self.table.table.readWhere(self.condition.format(), start=self.start, stop=self.stop)
        elif self.coordinates is not None:
            return self.table.table.readCoordinates(self.coordinates)
        return self.table.table.read(start=self.start, stop=self.stop)

If you instead do this:

(Pdb) self.table.table.readWhere("(x>2.0)",
      condvars={ 'x' : getattr(self.table.table.cols,'a-6')})
array([(2, 3.0)], 
      dtype=[('index', '<i8'), ('a-6', '<f8')])

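i.e. refer to the column through the substitute variable x, which condvars binds to the real column, then you can get the data.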

This substitution could be done automatically when an invalid column name is detected, but it is pretty tricky.
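A minimal sketch of what that detection and substitution might look like, as plain Python. This is not the pandas implementation: `substitute_invalid_names` and the `_c0`-style placeholders are hypothetical names, it only handles simple `(column, op, value)` terms with numeric values, and it assumes no real column is already called `_c0`, `_c1`, and so on.

```python
import keyword
import re

# Same pattern PyTables quotes in its NaturalNameWarning.
_IDENT = re.compile(r"^[a-zA-Z_][a-zA-Z0-9_]*$")


def substitute_invalid_names(terms):
    """Build a condition string and a placeholder -> column-name mapping.

    terms: iterable of (column, op, value) tuples, e.g. ('a-6', '>', 2.0).
    Columns that are not valid Python identifiers are replaced by
    hypothetical placeholders (_c0, _c1, ...); valid names are kept.
    """
    placeholders = {}  # original column name -> placeholder
    parts = []
    for column, op, value in terms:
        name = column
        if not _IDENT.match(column) or keyword.iskeyword(column):
            # Reuse the placeholder if this column was already seen.
            name = placeholders.setdefault(column, "_c%d" % len(placeholders))
        parts.append("(%s %s %r)" % (name, op, value))
    condition = " & ".join(parts)
    # Invert the mapping so it reads placeholder -> original column name.
    return condition, {p: col for col, p in placeholders.items()}


condition, names = substitute_invalid_names(
    [('a-6', '>', 2.0), ('o', '==', 1), ('a-6', '<', 5)]
)
# With an open PyTables table (assumed here, not created by this sketch),
# the result could be passed on much like the pdb call above:
#   condvars = {p: getattr(table.cols, col) for p, col in names.items()}
#   rows = table.readWhere(condition, condvars=condvars)  # read_where in PyTables 3
```

Valid names such as `o` are left in the condition untouched, so PyTables resolves them as usual; only the offending names go through `condvars`.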

Unfortunately, I would suggest renaming your columns.
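If you do go the renaming route, the mapping itself is easy to generate. The following is only a sketch: `to_natural_name` and `rename_map` are hypothetical helpers, and replacing every invalid character with an underscore is an assumption that may not suit your ID scheme (it would make "a-1" and "a_1" collide, which the helper checks for).

```python
import re


def to_natural_name(column):
    """Replace characters that natural names do not allow with '_'."""
    name = re.sub(r"[^a-zA-Z0-9_]", "_", column)
    # Identifiers may not start with a digit.
    if re.match(r"[0-9]", name):
        name = "_" + name
    return name


def rename_map(columns):
    """Return {old_name: new_name}, failing loudly on collisions."""
    mapping = {c: to_natural_name(c) for c in columns}
    if len(set(mapping.values())) != len(mapping):
        raise ValueError("renaming would produce duplicate column names")
    return mapping


mapping = rename_map(['o', 'a-6', 'm-13'])
# mapping == {'o': 'o', 'a-6': 'a_6', 'm-13': 'm_13'}
# With pandas, the mapping could be applied before writing to the store:
#   df = df.rename(columns=mapping)
```

This does not avoid rewriting the existing stores, since the data still has to be read and written back under the new names.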

Answered 2013-10-09T16:31:10.237