4

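I am using the PostgreSQL hstore extension and am curious how the data is stored internally. Please point me to where I can view the hstore source code to see the implementation details.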

2 Answers

9

hstore is part of the main PostgreSQL distribution, whose source is available at http://git.postgresql.org/ and on GitHub. In git head, hstore lives under contrib/hstore.

It looks like it's stored as a varlena, which means it's TOASTable like anything else. The downside is that the whole field needs to be read from disk - at least if it's compressed - to extract a key.

This also means that like any other normal field value, updating any part of the field requires that a new copy of the whole tuple (row) must be written to the table and the old one marked for expiry when it's no longer visible to any active transactions (see MVCC in the Pg manual). A big hstore is thus undesirable for data that will change frequently, since you'll need to rewrite the whole thing (and the row that contains it) whenever any part of it changes.
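To make the rewrite cost concrete, here is a toy model in Python. It is not how PostgreSQL lays out tuples; `TupleVersion`, `update_key`, and the transaction ids are made up for illustration. It only shows that changing one key produces a complete new row version while the old one lingers.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class TupleVersion:
    """Hypothetical stand-in for one stored version of a row."""
    row_id: int
    tags: Dict[str, str]         # plays the role of the hstore column
    xmin: int                    # transaction that created this version
    xmax: Optional[int] = None   # transaction that superseded it, if any


def update_key(heap: List[TupleVersion], row_id: int,
               key: str, value: str, xid: int) -> TupleVersion:
    """Change a single key; the whole row is written again as a new version."""
    current = next(t for t in heap if t.row_id == row_id and t.xmax is None)
    new_tags = dict(current.tags)   # every key/value pair is copied
    new_tags[key] = value
    current.xmax = xid              # old version stays until it is cleaned up
    new_version = TupleVersion(row_id, new_tags, xmin=xid)
    heap.append(new_version)
    return new_version


heap = [TupleVersion(1, {"railway": "rail", "frequency": "16.7"}, xmin=100)]
update_key(heap, 1, "frequency", "50", xid=101)
print(len(heap))  # two versions of the same row now exist
```

The bigger `tags` is, the more has to be copied for each small change, which is the reason to keep frequently changing data out of a large hstore.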

The sources don't seem to contain much in the way of comments to provide an overview of how hstore values are structured and stored, and it's a bit of a macro forest to take in quickly.

answered 2012-10-30T06:42:02.440
3

The storage itself is unsurprising.

The interesting part is how it is indexed so that it can efficiently answer queries such as

select osm_id, name, tags from planet_osm_line where 'frequency=>16.7, railway=>"rail"' <@ tags;

(this is from a real-world example), meaning: "find all records whose (hstore) field tags 'contains' the mappings frequency => 16.7 and railway => rail".
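For anyone unsure what "contains" means here, a rough Python equivalent using plain dicts may help. `contained_in` and the sample rows are invented for illustration; hstore keys and values are text, so 16.7 is compared as the string "16.7".

```python
def contained_in(query: dict, tags: dict) -> bool:
    """True if every key/value pair of `query` also appears in `tags`."""
    return all(key in tags and tags[key] == value
               for key, value in query.items())


query = {"frequency": "16.7", "railway": "rail"}

# Hypothetical rows; extra keys in a row do not prevent a match.
row_tags = {"name": "Main line", "railway": "rail", "frequency": "16.7"}
other_tags = {"railway": "rail", "frequency": "50"}

print(contained_in(query, row_tags))    # every pair of the query is present
print(contained_in(query, other_tags))  # frequency differs, so no match
```

Checking this row by row is exactly the sequential scan that an index is supposed to avoid.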

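Caveat: this is from memory only.

There are two components to this:

First, there is the GiST index, which can be seen as a kind of "sloppy B-Tree": it sometimes can't tell you exactly which branch to take, but gives you a few candidate branches instead. PostgreSQL uses it for geometric indexing (where you can, for example, query whether a point lies inside a polygon). The index doesn't give you a perfect hit, but it may reduce the search space considerably.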

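Second, there is an encoding of "hashes" (for you Perlists) / "dictionaries" (for you Pythonists) that takes advantage of GiST: you hash each key and each key/value pair of the hash to a small int (the details are fuzzy, but let's assume 0..255), take a bit field of that size, and poke a hole in your bit field for each hash value you get. (I think Knuth has a nice example of this, with index cards that have open/closed holes along their edges, and knitting needles; yes, here it is.)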
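A minimal sketch of that encoding, following the description above rather than the actual hstore source. The 256-bit width, the use of CRC32, and the names `bit_for` and `signature` are all assumptions for illustration.

```python
import zlib

SIG_BITS = 256  # assumed width, matching the "0..255" guess above


def bit_for(text: str) -> int:
    """Hash a string to a bit position in 0..SIG_BITS-1 (CRC32 is an arbitrary choice)."""
    return zlib.crc32(text.encode("utf-8")) % SIG_BITS


def signature(tags: dict) -> int:
    """Build the bit field: one bit per key and one bit per key/value pair."""
    sig = 0
    for key, value in tags.items():
        sig |= 1 << bit_for(key)                 # hole for the key alone
        sig |= 1 << bit_for(f"{key}=>{value}")   # hole for the key/value pair
    return sig


row = {"railway": "rail", "frequency": "16.7", "name": "Main line"}
query = {"railway": "rail", "frequency": "16.7"}

# Every bit set by the query is also set by a row that contains it.
print(signature(query) & ~signature(row) == 0)
```

Different strings can land on the same bit, so a signature that covers the query only says "possibly contains", never "certainly contains".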

Then you just have to marry those two. AFAIR Oleg Bartunov and Theodor Tsigaev came up with this. My head exploded when I first saw it.
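Here is roughly how the marriage could work, as a sketch under the same from-memory assumptions (256-bit signatures, CRC32, hypothetical `signature` and `search` helpers, and a flat list of "pages" standing in for the tree). A parent entry carries the OR of its children's signatures, so a whole branch can be skipped when a required bit is missing, and the surviving rows are rechecked against the real data.

```python
import zlib

SIG_BITS = 256  # assumed signature width


def signature(tags: dict) -> int:
    """One bit per key and per key/value pair (illustrative hashing scheme)."""
    sig = 0
    for key, value in tags.items():
        for item in (key, f"{key}=>{value}"):
            sig |= 1 << (zlib.crc32(item.encode("utf-8")) % SIG_BITS)
    return sig


def search(pages: list, query: dict) -> list:
    """Return rows containing `query`, pruning with signatures before rechecking."""
    query_sig = signature(query)
    hits = []
    for page in pages:
        # A real index would store these signatures instead of recomputing them.
        page_sig = 0
        for tags in page:
            page_sig |= signature(tags)     # parent = union of its children
        if query_sig & ~page_sig:
            continue                        # a required bit is missing: skip branch
        for tags in page:
            if query_sig & ~signature(tags):
                continue                    # this row cannot match
            if all(tags.get(k) == v for k, v in query.items()):
                hits.append(tags)           # recheck weeds out false positives
    return hits


pages = [
    [{"highway": "residential"}, {"highway": "primary", "name": "A1"}],
    [{"railway": "rail", "frequency": "16.7"}, {"railway": "tram"}],
]
print(search(pages, {"railway": "rail", "frequency": "16.7"}))
```

The signature test may let through rows that do not match, but it never rejects one that does, which is what makes the "sloppy" index safe to use as a pre-filter.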

answered 2013-07-10T07:47:40.400