python - PyTables problem - different results when iterating over a subset of a table

Question

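I'm new to PyTables and am considering it for processing data generated by an agent-based modeling simulation and stored in HDF5. I'm working with a 39 MB test file and have run into some strange behavior. Here is the layout of the table: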

    /example/agt_coords (Table(2000000,)) ''
  description := {
  "agent": Int32Col(shape=(), dflt=0, pos=0),
  "x": Float64Col(shape=(), dflt=0.0, pos=1),
  "y": Float64Col(shape=(), dflt=0.0, pos=2)}
  byteorder := 'little'
  chunkshape := (20000,)

Here is how I'm accessing it in Python:

>>> from tables import *
>>> h5file = openFile("alternate_hose_test.h5", "a")
>>> h5file.root.example.agt_coords
/example/agt_coords (Table(2000000,)) ''
  description := {
  "agent": Int32Col(shape=(), dflt=0, pos=0),
  "x": Float64Col(shape=(), dflt=0.0, pos=1),
  "y": Float64Col(shape=(), dflt=0.0, pos=2)}
  byteorder := 'little'
  chunkshape := (20000,)
>>> coords = h5file.root.example.agt_coords

Now here is where it gets weird.

>>> [x for x in coords[1:100] if x['agent'] == 1]
[(1, 25.0, 78.0), (1, 25.0, 78.0)]
>>> [x for x in coords if x['agent'] == 1]
[(1000000, 25.0, 78.0), (1000000, 25.0, 78.0)]
>>> [x for x in coords.iterrows() if x['agent'] == 1]
[(1000000, 25.0, 78.0), (1000000, 25.0, 78.0)]
>>> [x['agent'] for x in coords[1:100] if x['agent'] == 1]
[1, 1]
>>> [x['agent'] for x in coords if x['agent'] == 1]
[1, 1]

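I don't understand why the values are messed up when I iterate over the whole table but not when I take a small slice of the rows. I'm sure this is a mistake in how I'm using the library, so any help with this would be greatly appreciated.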

score 7 · Accepted Answer

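This is a very common point of confusion when iterating over Table objects.

When you iterate over a Table, the items you get are not the data in each row but an accessor to the table at the current row. (A slice such as coords[1:100], by contrast, is read into memory as a NumPy structured array, so its items are real records.) So with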

[x for x in coords if x['agent'] == 1]

you create a list of row accessors that all point to the "current" row of the table, which is the last row. But when you do

[x["agent"] for x in coords if x['agent'] == 1]

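the accessor is used while the list is being built, so each value is read while the accessor still points at the matching row.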
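The effect can be mimicked in plain Python without PyTables. The sketch below is only a toy model: `ToyRow` and `ToyTable` are hypothetical names invented for illustration, not PyTables classes, and the data is made up. It assumes nothing more than what is described above, namely that iteration hands back the same accessor object every time and merely moves it forward.

```python
class ToyRow:
    """Hypothetical stand-in for a row accessor: a cursor into shared data."""

    def __init__(self, data):
        self._data = data
        self._pos = -1  # not positioned on any row yet

    def __getitem__(self, name):
        # Always reads from whatever row the cursor points at right now.
        return self._data[self._pos][name]

    def snapshot(self):
        # Copy of the row the cursor currently points at.
        return dict(self._data[self._pos])


class ToyTable:
    """Hypothetical table that reuses one ToyRow for the whole iteration."""

    def __init__(self, data):
        self._data = data

    def __iter__(self):
        row = ToyRow(self._data)
        for pos in range(len(self._data)):
            row._pos = pos  # move the cursor instead of creating a new object
            yield row


table = ToyTable([
    {"agent": 1, "x": 25.0, "y": 78.0},
    {"agent": 2, "x": 3.0, "y": 4.0},
    {"agent": 1, "x": 25.0, "y": 78.0},
    {"agent": 1000000, "x": 9.0, "y": 9.0},
])

# Storing the accessor itself: both list entries are the same object.
accessors = [r for r in table if r["agent"] == 1]
same_object = accessors[0] is accessors[1]

# Inspected after the loop, the cursor sits on the last row.
stale = [r.snapshot() for r in accessors]

# Reading a field during the loop captures the value at the right moment.
values = [r["agent"] for r in table if r["agent"] == 1]
```

Here `stale` holds two copies of the last row (agent 1000000) even though the filter matched agent 1, while `values` holds the expected `[1, 1]`, which is the same pattern as in the question.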

The solution is to use the accessor on each iteration to fetch all the data you need while the list is being built. There are two options:

[x[:] for x in coords if x['agent'] == 1]

or

[x.fetch_all_fields() for x in coords if x['agent'] == 1]

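The former builds a list of tuples. The latter returns NumPy void objects. IIRC, the second is faster, but the former may make more sense for your purposes.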
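As a minimal end-to-end sketch of both options, the following builds a tiny throwaway table and copies the matching rows out of it. It assumes PyTables 3.x, where the function is spelled `open_file` rather than the older `openFile` used in the question. The `AgentCoord` class name, the temporary file path, and the four sample rows are made up for illustration.

```python
import os
import tempfile

import tables


class AgentCoord(tables.IsDescription):
    """Illustrative schema mirroring the table layout in the question."""
    agent = tables.Int32Col(pos=0)
    x = tables.Float64Col(pos=1)
    y = tables.Float64Col(pos=2)


# Throwaway file so the sketch does not touch any real data.
path = os.path.join(tempfile.mkdtemp(), "demo.h5")

with tables.open_file(path, "w") as h5file:
    coords = h5file.create_table(
        "/example", "agt_coords", AgentCoord, createparents=True
    )

    # Fill the table with a few made-up rows.
    row = coords.row
    for agent, x, y in [(1, 25.0, 78.0), (2, 10.0, 20.0),
                        (1, 25.0, 78.0), (3, 5.0, 6.0)]:
        row["agent"] = agent
        row["x"] = x
        row["y"] = y
        row.append()
    coords.flush()

    # Option 1: copy each matching row out as a tuple.
    as_tuples = [r[:] for r in coords if r["agent"] == 1]

    # Option 2: copy each matching row out as a NumPy void record.
    as_voids = [r.fetch_all_fields() for r in coords if r["agent"] == 1]
```

Both lists hold data copied out during the iteration, so they stay valid after the loop ends and after the file is closed.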

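Here is a good explanation from a PyTables developer. In future versions, printing a row accessor object may not simply show the data but state that it is a row accessor object.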
