I use the H5TB API to store (non-equidistant) time series as tables in an HDF5 file. The format looks like this:

time   channel1   channel2
0.0    x          x
1.0    x          x
2.0    x          x

"Detail data" can also be inserted, like this:

time   channel1   channel2
0.0    x          x
1.0    x          x
1.2    x          x
1.4    x          x
1.6    x          x
1.8    x          x
2.0    x          x

Now I want to store the data in a different data format, so I would like to "query" the HDF5 file like this:

select ch1 where time > 1.6 && time < 3.0

I have thought of several ways to do this query:

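  1. There is a built-in feature called B-Tree Index. Is it possible to use it to index the data?
  2. I do a binary search on the time channel and then read the channel values.
  3. I create an index myself (and update it whenever detail data is inserted). What would be the best algorithm to use here?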

The main motivation for an index is fast query response.

What would you suggest here?

3 Answers


I finally found another (obvious) solution myself. The easiest way is to open the HDF5 file, read only the time channel, and build an in-memory map before reading the data channels. This could be optimized further by reading the time channel with a sparse hyperslab.

Once the row indexes for a particular time range are known, the data channels can be read for just those rows.
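A minimal sketch of that lookup step, not my actual code: it assumes the time column has already been read into a sorted `std::vector<double>` (for example with `H5TBread_fields_name`) and uses a binary search to find the rows matching `time > lo && time < hi`. The function name `rows_in_open_interval` is made up for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Returns the half-open row range [first, last) of rows with lo < time < hi.
// `times` stands in for the time column after it has been read into memory;
// it must be sorted in ascending order.
std::pair<std::size_t, std::size_t>
rows_in_open_interval(const std::vector<double>& times, double lo, double hi)
{
    assert(std::is_sorted(times.begin(), times.end()));

    // First row whose time is strictly greater than lo.
    auto first = std::upper_bound(times.begin(), times.end(), lo);
    // First row at or after `first` whose time is not less than hi.
    auto last = std::lower_bound(first, times.end(), hi);

    return {static_cast<std::size_t>(first - times.begin()),
            static_cast<std::size_t>(last - times.begin())};
}
```

The resulting range can then be turned into the `start` and `nrecords` arguments of a table read such as `H5TBread_fields_name`, so only the matching rows of the data channels are loaded.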

answered 2011-03-28T09:14:41.247

Assuming you're not asking how to parse the data out of an HDF5 file, but only how to use the data once it has been parsed:

Given class channel_data { ... };, a std::map<double, channel_data> should suit your needs, specifically std::map<>::lower_bound() and std::map<>::upper_bound().
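Roughly, the range query from the question could look like the sketch below. The fields of `channel_data` and the helper names `insert_sample` and `select_channel1` are assumptions for illustration only. Because the bounds are strict (`time > lo && time < hi`), the range starts at `upper_bound(lo)` and ends at `lower_bound(hi)`.

```cpp
#include <cassert>
#include <cmath>
#include <map>
#include <vector>

// Hypothetical payload type; the real contents depend on your table layout.
struct channel_data {
    double channel1;
    double channel2;
};

// Inserting "detail data" keeps the map sorted by time automatically.
void insert_sample(std::map<double, channel_data>& series,
                   double time, channel_data value)
{
    // NaN keys would break the ordering the map relies on.
    assert(!std::isnan(time));
    series[time] = value;
}

// Collects channel1 for every sample with lo < time < hi.
std::vector<double> select_channel1(const std::map<double, channel_data>& series,
                                    double lo, double hi)
{
    std::vector<double> out;
    if (!(lo < hi)) return out;            // empty or inverted interval

    auto first = series.upper_bound(lo);   // first key strictly greater than lo
    auto last = series.lower_bound(hi);    // first key not less than hi
    for (auto it = first; it != last; ++it)
        out.push_back(it->second.channel1);
    return out;
}
```

Both bound lookups and each insertion are logarithmic in the number of samples, so the map doubles as a self-maintaining index when detail data is added later.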

answered 2011-03-21T07:10:55.690

A popular approach to this problem appears to be bitmap indexing. There are also papers on doing this, but the authors do not appear to have published any code.
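To give an idea of what that means, here is a toy binned bitmap index. It is my own simplified sketch, not code from any of those papers, and the class name `BinnedBitmapIndex` and its interface are hypothetical. Values are sorted into equal-width bins, each bin keeps one bit per row, and a range query ORs the bitmaps of the bins that overlap the range before checking the candidates against the real values.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Toy binned bitmap index over one column (assumes the values contain no NaN).
class BinnedBitmapIndex {
public:
    BinnedBitmapIndex(const std::vector<double>& values,
                      double min_value, double max_value, std::size_t bins)
        : values_(values), min_(min_value), max_(max_value),
          width_((max_value - min_value) / static_cast<double>(bins)),
          bitmaps_(bins, std::vector<bool>(values.size(), false))
    {
        assert(bins > 0 && max_value > min_value);
        // Set bit `row` in the bitmap of the bin that the value falls into.
        for (std::size_t row = 0; row < values_.size(); ++row)
            bitmaps_[bin_of(values_[row])][row] = true;
    }

    // Row numbers with lo < value < hi, in ascending order.
    std::vector<std::size_t> query(double lo, double hi) const
    {
        std::vector<std::size_t> rows;
        if (!(lo < hi)) return rows;

        // OR together the bitmaps of every bin that overlaps the range.
        std::vector<bool> candidates(values_.size(), false);
        for (std::size_t b = bin_of(lo); b <= bin_of(hi); ++b)
            for (std::size_t row = 0; row < values_.size(); ++row)
                if (bitmaps_[b][row]) candidates[row] = true;

        // Strictly only the two edge bins need this check against the raw
        // values; checking every candidate keeps the sketch short.
        for (std::size_t row = 0; row < values_.size(); ++row)
            if (candidates[row] && values_[row] > lo && values_[row] < hi)
                rows.push_back(row);
        return rows;
    }

private:
    // Maps a value to its bin, clamping out-of-range values to the edge bins.
    std::size_t bin_of(double v) const
    {
        if (v <= min_) return 0;
        if (v >= max_) return bitmaps_.size() - 1;
        return std::min(static_cast<std::size_t>((v - min_) / width_),
                        bitmaps_.size() - 1);
    }

    std::vector<double> values_;
    double min_;
    double max_;
    double width_;
    std::vector<std::vector<bool>> bitmaps_;
};
```

For a single sorted time column a binary search is simpler and faster. Bitmap indexes tend to pay off when queries combine conditions on several unsorted columns, since the per-column bitmaps can be combined with bitwise AND and OR.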

answered 2012-02-10T00:20:54.780