postgresql - Retrieve multiple values from a large jsonb field faster (postgresql 9.4)

Question

tl;dr

Using PostgreSQL 9.4, is there a way to retrieve multiple values from a jsonb field, such as with this imaginary function:

jsonb_extract_path(x, ARRAY['a_dictionary_key', 'a_second_dictionary_key', 'a_third_dictionary_key'])

I'm hoping to reduce the almost linear growth in time needed to select several values (1 value = 300 ms, 2 values = 450 ms, 3 values = 600 ms).

Background

I have the following table with a jsonb column:

CREATE TABLE "public"."analysis" (
  "date" date NOT NULL,
  "name" character varying (10) NOT NULL,
  "country" character (3) NOT NULL,
  "x" jsonb,
  PRIMARY KEY(date,name)
);

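There are about 100,000 rows, and each row has a jsonb dictionary with 90+ keys and their values. I'm trying to write an SQL query that selects a few (< 10) keys and values reasonably fast (< 500 ms).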

Index and query: 190 ms

I started by adding an index:

CREATE INDEX ON analysis USING GIN (x);

This makes queries that filter on values in the "x" dictionary fast, for example:

SELECT date, name, country FROM analysis where date > '2014-01-01' and date < '2014-05-01' and cast(x#>> '{a_dictionary_key}' as float) > 100;

This takes about 190 ms (acceptable for us).

Retrieving dictionary values

However, as soon as I start adding keys to return in the SELECT list, the execution time grows almost linearly:

1 value: 300 ms

select jsonb_extract_path(x, 'a_dictionary_key') from analysis where date > '2014-01-01' and date < '2014-05-01' and cast(x#>> '{a_dictionary_key}' as float) > 100;

Takes 366 ms (+175 ms)

select x#>'{a_dictionary_key}' as gear_down_altitude from analysis where date > '2014-01-01' and date < '2014-05-01' and cast(x#>> '{a_dictionary_key}' as float) > 100 ;

Takes 300 ms (+110 ms)

3 values: 600 ms

select jsonb_extract_path(x, 'a_dictionary_key'), jsonb_extract_path(x, 'a_second_dictionary_key'), jsonb_extract_path(x, 'a_third_dictionary_key') from analysis where date > '2014-01-01' and date < '2014-05-01' and cast(x#>> '{a_dictionary_key}' as float) > 100;

Takes 600 ms (+410 ms, i.e. roughly +137 ms per selected value)

select x#>'{a_dictionary_key}' as a_dictionary_key, x#>'{a_second_dictionary_key}' as a_second_dictionary_key, x#>'{a_third_dictionary_key}' as a_third_dictionary_key from analysis where date > '2014-01-01' and date < '2014-05-01' and cast(x#>> '{a_dictionary_key}' as float) > 100 ;

Takes 600 ms (+410 ms, i.e. roughly +137 ms per selected value)
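To make the "almost linear" growth concrete, here is a small calculation using only the timings reported above for the #> variants (190 ms with no keys selected, 300 ms with one, 600 ms with three). It is plain arithmetic on the question's numbers, not a new measurement.

```python
# Timings in milliseconds, as reported in the question (#> variants)
base_ms = 190        # filter only, no keys in the SELECT list
one_key_ms = 300     # one x#>'{...}' expression selected
three_keys_ms = 600  # three x#>'{...}' expressions selected

# Total extra time caused by selecting three keys
extra_total = three_keys_ms - base_ms

# Extra time averaged over the three selected keys
avg_per_key = extra_total / 3

# Cost of each key beyond the first (from 1 key to 3 keys)
marginal = (three_keys_ms - one_key_ms) / 2

print(f"extra for 3 keys: {extra_total} ms")
print(f"average per key: {avg_per_key:.0f} ms")
print(f"marginal per extra key: {marginal:.0f} ms")
```

The marginal figure of 150 ms per additional key matches the 300 / 450 / 600 ms series given in the tl;dr.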

Retrieving more values faster

Is there a way to retrieve multiple values from a jsonb field, for example with this imaginary function:

jsonb_extract_path(x, ARRAY['a_dictionary_key', 'a_second_dictionary_key', 'a_third_dictionary_key'])

This might speed up these lookups. It could return the values as columns, as a list/array, or even as a json object.
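The semantics being asked for can be sketched in Python on an already-parsed dictionary. This is only an illustration of the desired behaviour, not a PostgreSQL function: extract_many and extract_subobject are hypothetical names, and the sample row is made up. One variant returns one value per requested key (like an array, with None standing in for NULL), the other returns a smaller json-style object.

```python
import json
from typing import Any, Dict, List, Optional, Sequence


def extract_many(doc: Dict[str, Any], keys: Sequence[str]) -> List[Optional[Any]]:
    """Hypothetical multi-key lookup: one value per key, in request order.

    A missing key yields None, the way a missing key yields NULL in SQL.
    """
    return [doc.get(k) for k in keys]


def extract_subobject(doc: Dict[str, Any], keys: Sequence[str]) -> Dict[str, Any]:
    """Same lookup, returned as a smaller object containing only keys present."""
    return {k: doc[k] for k in keys if k in doc}


# Made-up sample row standing in for one value of column x
row = json.loads(
    '{"a_dictionary_key": 120.5, "a_second_dictionary_key": 3, "other_key": "x"}'
)
wanted = ["a_dictionary_key", "a_second_dictionary_key", "a_third_dictionary_key"]

print(extract_many(row, wanted))       # [120.5, 3, None]
print(extract_subobject(row, wanted))  # {'a_dictionary_key': 120.5, 'a_second_dictionary_key': 3}
```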

Retrieving an array with PL/Python

Just for the sake of it, I wrote a custom function in PL/Python, but it was much slower (5 s+), probably because of json.loads:

CREATE OR REPLACE FUNCTION retrieve_objects(data jsonb, k VARCHAR[])
RETURNS TEXT[] AS $$
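  # In 9.4, PL/Python receives jsonb as a text string, so it must be parsed
  if not data:
    return []

  import simplejson as json
  j = json.loads(data)

  # One value per requested key, in order; a missing key becomes NULL
  # instead of raising KeyError
  l = []
  for i in k:
    l.append(j.get(i))

  return l

$$ LANGUAGE plpython2u;

-- Usage:
-- select retrieve_objects(x, ARRAY['a_dictionary_key', 'a_second_dictionary_key', 'a_third_dictionary_key']) from analysis where date > '2014-01-01' and date < '2014-05-01';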
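The suspicion that json.loads is the bottleneck can be checked outside the database with a rough sketch. It assumes a synthetic 90-key document in place of a real row, uses the standard json module instead of simplejson, and only models the Python side (not PostgreSQL converting jsonb to text for every call). It compares parsing the whole document and then looking up three keys, as the function does per row, against doing the three lookups on an already-parsed dict. Absolute timings depend on the machine.

```python
import json
import timeit

# Synthetic stand-in for one row of column x: 90 numeric keys (assumed shape)
doc_text = json.dumps({f"key_{i}": i * 1.5 for i in range(90)})
wanted = ["key_1", "key_2", "key_3"]


def parse_and_lookup():
    # What retrieve_objects does for each row: parse everything, then pick keys
    j = json.loads(doc_text)
    return [j.get(k) for k in wanted]


parsed = json.loads(doc_text)


def lookup_only():
    # The same three lookups on a dict that was parsed once up front
    return [parsed.get(k) for k in wanted]


n = 2000
t_parse = timeit.timeit(parse_and_lookup, number=n)
t_lookup = timeit.timeit(lookup_only, number=n)

print(f"parse + lookup: {t_parse / n * 1e6:.1f} us per row")
print(f"lookup only:    {t_lookup / n * 1e6:.2f} us per row")
```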

Update 2015-05-21

I reimplemented the table using hstore with a GIN index; performance was almost the same as with jsonb, so it didn't help in my case.

score 0 · Accepted Answer

You are using the #> operator, which looks like it performs a path search. Have you tried a plain -> lookup? Like:

select  x->'a_dictionary_key'
,       x->'a_second_dictionary_key'
from    analysis
where   ... your conditions here ...

It would be interesting to see what happens if you use a temporary table. Like:

create temporary table tmp_doclist (doc jsonb)
;
insert  into tmp_doclist
        (doc)
select  x
from    analysis
where   ... your conditions here ...
;
select  doc->'col1'
,       doc->'col2'
,       doc->'col3'
from    tmp_doclist
;

score 0 · Accepted Answer

This is hard to test without the data.
Create a custom type:

create type my_query_result_type as (
    a_dictionary_key float,
    a_second_dictionary_key float
);

and your query:

select (jsonb_populate_record(null::my_query_result_type, x)).* from analysis;

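You should be able to use a temporary table instead of a type created at run time, which would make your query dynamic.
But first check whether this helps from a performance point of view.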
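The idea behind json_populate_record is to parse each document once and map it onto a fixed set of columns, instead of running one lookup per selected key. A rough Python analogue follows. MyQueryResultType and populate_record are hypothetical names mirroring the custom type above, and the sample row is made up. Like the SQL function, it takes only the fields the type declares, ignores other keys, and leaves missing fields empty (None for NULL); unlike PostgreSQL, it does not coerce values to the declared column types.

```python
import json
from typing import NamedTuple, Optional


class MyQueryResultType(NamedTuple):
    # Mirrors the columns of my_query_result_type
    a_dictionary_key: Optional[float]
    a_second_dictionary_key: Optional[float]


def populate_record(record_type, doc_text: str):
    """Rough analogue of json_populate_record(null::record_type, doc).

    Parses the document once, keeps only the fields the record type
    declares, ignores extra keys, and uses None for missing fields.
    """
    doc = json.loads(doc_text)
    return record_type(**{f: doc.get(f) for f in record_type._fields})


row = populate_record(
    MyQueryResultType, '{"a_dictionary_key": 150.0, "unrelated_key": "x"}'
)
print(row)  # MyQueryResultType(a_dictionary_key=150.0, a_second_dictionary_key=None)
```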
