Hello Oracles of StackOverflow,

This is my first question on Stack Overflow, so feel free to throw your cabbages at me (or tell me how I should be asking it).

I have this problem. I'm using HDF5 to store massive quantities of cookie information.

My data is structured as follows:

CookieID -> Event -> Key_value Pair

There are multiple events for each cookie ID, but only one key-value pair per event.

I'd like to know the best way to store this in an HDF5 file.

Currently, I'm storing each cookie as a separate table within a group in the HDF5 file, using the cookie ID as the table name. Unfortunately for me, with 10,000,000 cookies, HDF5 (or, more specifically, PyTables) doesn't approve of this type of storage.

Specifically, it throws this error:

``/CookieData`` is exceeding the recommended maximum number of children (16384)

I'm wondering if you could recommend the best way of storing this information.

Should I create a flat table? Should I keep this method? Is there something else I can do?

Help is appreciated. Thanks for reading.

1 Answer

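After a few hours of research, I've concluded that what I was trying to do is simply not possible.

The following link goes into detail on the possibility of using HDF5 with variable-length nested children.

For the time being I've decided to use a flat file, and I hope it turns out to be more efficient than storing the data in a database. The downside of the flat file is that I have to duplicate values in the file that otherwise wouldn't need to be there.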
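To make the trade-off concrete, here is a minimal sketch of the flat layout in plain Python, independent of any HDF5 library. The sample data, the column order and the helper names (`flatten`, `build_offsets`) are all made up for illustration. The idea is one row per event with the cookie ID repeated on each row, plus a small lookup of row ranges so one cookie's events can still be pulled out without scanning everything. In PyTables this would roughly correspond to a single table with a cookie ID column instead of one table per cookie, but I haven't benchmarked that.

```python
# Hypothetical sample data in the question's shape:
# cookie ID -> list of events, each carrying exactly one key/value pair.
nested = {
    "cookieA": [("click", ("page", "/home")), ("view", ("item", "42"))],
    "cookieB": [("click", ("page", "/cart"))],
}


def flatten(nested):
    """Return one (cookie_id, event, key, value) row per event.

    The cookie ID is repeated on every row, which is the duplication
    cost of a flat layout.
    """
    rows = []
    for cookie_id in sorted(nested):
        for event, (key, value) in nested[cookie_id]:
            rows.append((cookie_id, event, key, value))
    return rows


def build_offsets(rows):
    """Map each cookie ID to the [start, stop) range of its rows.

    Assumes the rows are already grouped by cookie ID, as flatten() produces.
    """
    offsets = {}
    for i, row in enumerate(rows):
        start, _ = offsets.get(row[0], (i, i))
        offsets[row[0]] = (start, i + 1)
    return offsets


rows = flatten(nested)
offsets = build_offsets(rows)

# Fetch all events for one cookie by slicing the flat list.
start, stop = offsets["cookieA"]
cookie_a_events = rows[start:stop]
```

The repeated cookie ID is exactly the duplication mentioned above; what it buys is a single container for all cookies rather than millions of children under one group.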

If anyone else has a better idea, it would be much appreciated.

answered 2012-08-07T13:20:37.943