python - PyTables batch get and update

Question

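I have daily stock data stored in an HDF5 file created with PyTables. I would like to fetch a group of rows, process them as an array, and then write them back to disk (update the rows) using PyTables. I could not figure out a clean way to do this. What is the best way to accomplish it?

My data: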

Symbol, date, price, var1, var2
abcd, 1, 2.5, 12, 12.5
abcd, 2, 2.6, 11, 10.2
abcd, 3, 2.45, 11, 10.3
defg, 1, 12.34, 19.1, 18.1
defg, 2, 11.90, 19.5, 18.2
defg, 3, 11.75, 21, 20.9
defg, 4, 11.74, 22.2, 21.4

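I want to read the rows for each symbol as an array, do some processing, and update the fields var1 and var2. I know all the symbols in advance, so I can loop over them. I tried something like this: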

rows_array = [row.fetch_all_fields() for row in table.where('Symbol == "abcd"')]

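I want to pass rows_array to another function that computes the values of var1 and var2 and updates them for each record. Note that var1 and var2 are like moving averages, so I cannot compute them inside the iterator; I need the whole set of rows as one array.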
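To illustrate why the whole row set is needed, here is a minimal sketch of a moving-average calculation over a NumPy structured array. The function name `trailing_mean`, the window size, and the dtype are assumptions made for illustration; they are not from the question. In PyTables 3.x, `table.read_where(condition)` should return such a structured array directly, which avoids building a Python list of rows.

```python
import numpy as np


def trailing_mean(values, window):
    """Trailing moving average; early entries average over the rows seen so far."""
    values = np.asarray(values, dtype=float)
    csum = np.cumsum(values)
    out = np.empty_like(values)
    for i in range(len(values)):
        start = max(0, i - window + 1)
        # Sum of values[start..i], taken from the cumulative sum.
        total = csum[i] - (csum[start - 1] if start > 0 else 0.0)
        out[i] = total / (i - start + 1)
    return out


# Hypothetical dtype mirroring the columns shown in the question.
quote_dtype = [("Symbol", "S8"), ("date", "i4"), ("price", "f8"),
               ("var1", "f8"), ("var2", "f8")]
rows_array = np.array(
    [(b"defg", 1, 12.34, 19.1, 18.1),
     (b"defg", 2, 11.90, 19.5, 18.2),
     (b"defg", 3, 11.75, 21.0, 20.9),
     (b"defg", 4, 11.74, 22.2, 21.4)],
    dtype=quote_dtype,
)

# Each output value depends on neighbouring rows, so the full array is required.
rows_array["var1"] = trailing_mean(rows_array["price"], window=2)
```

Because every value depends on the preceding rows, the calculation cannot be done one row at a time inside a `where` iterator without keeping extra state.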

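After I have computed whatever I need from rows_array, I am not sure how to write it back to the data, i.e. update the rows with the newly computed values. When updating the entire table, I use this: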

 table.cols.var1[:] = calc_something(rows_array)

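However, when I want to update only part of the table, I am not sure of the best way to do it. I guess I could re-run the "where" condition and then update each row based on my calculations, but rescanning the table seems like a waste of time.

Your suggestions are appreciated...

Thanks, -e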

score 10 · Accepted Answer

If I understand correctly, the following should do what you want:

condition = 'Symbol == "abcd"'
indices = table.get_where_list(condition)  # row indices matching the condition (getWhereList in PyTables 2.x)
rows_array = table[indices]  # read those rows as a structured array
new_rows = compute(rows_array)  # compute new values
table[indices] = new_rows  # write the new values back to the same rows
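
The steps above can be tried end to end with a small throwaway file. This is only a sketch, assuming PyTables 3.x (snake_case method names) and Python 3; the `Quote` schema, the file name, and the body of `compute` are made up for illustration. Since string columns hold bytes on Python 3, the symbol is passed through `condvars` as a bytes value rather than written into the condition string.

```python
import os
import tempfile

import numpy as np
import tables


class Quote(tables.IsDescription):
    # Hypothetical schema mirroring the columns in the question.
    Symbol = tables.StringCol(8, pos=0)
    date = tables.Int32Col(pos=1)
    price = tables.Float64Col(pos=2)
    var1 = tables.Float64Col(pos=3)
    var2 = tables.Float64Col(pos=4)


def compute(rows):
    """Placeholder calculation: set var1 to the running mean of price."""
    out = rows.copy()
    out["var1"] = np.cumsum(out["price"]) / np.arange(1, len(out) + 1)
    return out


path = os.path.join(tempfile.mkdtemp(), "quotes.h5")
with tables.open_file(path, mode="w") as h5:
    table = h5.create_table("/", "quotes", Quote)
    table.append([
        (b"abcd", 1, 2.5, 12.0, 12.5),
        (b"abcd", 2, 2.6, 11.0, 10.2),
        (b"defg", 1, 12.34, 19.1, 18.1),
    ])
    table.flush()

    # One scan to find the matching row numbers, reused for read and write.
    indices = table.get_where_list("Symbol == sym", condvars={"sym": b"abcd"})
    rows_array = table[indices]            # structured array of matching rows
    table[indices] = compute(rows_array)   # update only those rows
    table.flush()

    result = table.read()                  # whole table, for inspection
```

The table is scanned once for the condition; the read and the write both reuse the same list of row indices, so rows for other symbols are left untouched.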

Hope this helps
