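I am running a simple query that retrieves data from a single table. If I select only the non-JSON columns, the query takes 16 ms. If I include a column that references a field inside the JSONB data, the time increases 62-fold. If I select two different JSONB fields, it doubles again.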
--EXPLAIN (ANALYZE,buffers)
SELECT
segment as segment_no,
begin_time,
segment_data::json->'summary'->'begin_mileage' as begin_mileage,
segment_data::json->'summary'->'end_mileage' as end_mileage
FROM
segments_table
WHERE
vehicle=644 AND trip=3
ORDER BY
begin_time;
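With two JSON fields in the SELECT clause, the query takes 2.0 seconds. Omitting one brings it down to 1.0 second, and omitting both JSON fields brings it down to 16 ms.
The table itself has about 700 records. The query returns 83 records. Running different queries, I noticed that when selecting 2 JSON fields, the more records are returned, the longer the query takes (roughly 0.0066 * X^1.32 ms, where X is the number of records returned).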
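I tried adding indexes for the vehicle and trip lookups, but that made little difference (as expected). The time appears to be spent in the actual retrieval of the data, that is, in looking up fields inside the JSONB column. If the JSON fields were needed in the WHERE clause, this degradation would be easier to understand, but that is not the case here.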
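One way to test that hypothesis is to check how large the stored JSONB values are for the rows in question. The following is a minimal sketch using the built-in `pg_column_size` and `octet_length` functions; the column aliases are made up for illustration, and the sizes it would report are not known from the measurements in this post.

```sql
-- Report the stored size and the serialized text size of each
-- JSONB value returned by the slow query.
SELECT
    segment AS segment_no,
    pg_column_size(segment_data)     AS stored_bytes, -- bytes as stored, after any compression
    octet_length(segment_data::text) AS text_bytes    -- bytes of the JSON rendered as text
FROM
    segments_table
WHERE
    vehicle = 644 AND trip = 3
ORDER BY
    begin_time;
```

PostgreSQL normally moves values larger than roughly 2 kB out of line into a TOAST table and may compress them, so reading a single key can require fetching and decompressing the whole document. If the values turn out to be large, that would be consistent with the much higher shared buffer hit counts in the plans that select JSON fields.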
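A simple solution is of course to pull each field out of the JSON blob and create separate columns for them in the table itself. But before I go down that route, is there anything else that can address this performance problem?
Here is the output of EXPLAIN ANALYZE: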
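For reference, that fallback could look roughly like the sketch below. It assumes the mileage values are JSON numbers and that the keys are spelled `begin_mileage` and `end_mileage`; the new column names and the `numeric` type are illustrative choices, not part of the existing schema.

```sql
-- Assumption: mileage values are JSON numbers; adjust the type if not.
ALTER TABLE segments_table
    ADD COLUMN begin_mileage numeric,
    ADD COLUMN end_mileage   numeric;

-- One-time backfill from the JSONB document.
-- ->> returns text, which is then cast to numeric.
UPDATE segments_table
SET
    begin_mileage = (segment_data->'summary'->>'begin_mileage')::numeric,
    end_mileage   = (segment_data->'summary'->>'end_mileage')::numeric;

-- The query then reads plain columns only.
SELECT segment AS segment_no, begin_time, begin_mileage, end_mileage
FROM segments_table
WHERE vehicle = 644 AND trip = 3
ORDER BY begin_time;
```

With this approach, whatever writes to the table would also need to populate the new columns for new rows, otherwise they drift out of sync with the JSON.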
Sort (cost=13.25..13.27 rows=10 width=28) (actual time=1999.899..1999.901 rows=71 loops=1)
Sort Key: begin_time
Sort Method: quicksort Memory: 35kB
Buffers: shared hit=5663
-> Bitmap Heap Scan on segments_table (cost=4.38..13.08 rows=10 width=28) (actual time=1.332..1999.730 rows=71 loops=1)
Recheck Cond: ((vehicle = 644) AND (trip = 3))
Heap Blocks: exact=3
Buffers: shared hit=5663
-> Bitmap Index Scan on segments_table_vehicle_64df5bc5_uniq (cost=0.00..4.38 rows=10 width=0) (actual time=0.052..0.052 rows=71 loops=1)
Index Cond: ((vehicle = 644) AND (trip = 3))
Buffers: shared hit=2
Planning time: 0.368 ms
Execution time: 2000.000 ms
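Another interesting observation: when I run the same query several times, I see no improvement from caching, which I would have expected for subsequent identical queries.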
The only changes I made to the stock postgres server configuration were increasing shared_buffers from 128MB to 256MB and setting effective_cache_size = 1GB. I also reduced max_connections from 100 to 20.
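For anyone reproducing this setup, the same settings can be expressed in SQL. This is only a sketch and assumes PostgreSQL 9.4 or later, where `ALTER SYSTEM` is available; the settings described above may equally well have been made by editing postgresql.conf directly.

```sql
-- ALTER SYSTEM writes to postgresql.auto.conf.
-- shared_buffers and max_connections only take effect after a server
-- restart; effective_cache_size needs only a configuration reload.
ALTER SYSTEM SET shared_buffers = '256MB';
ALTER SYSTEM SET effective_cache_size = '1GB';
ALTER SYSTEM SET max_connections = 20;

-- After restarting, confirm the active values.
SHOW shared_buffers;
SHOW effective_cache_size;
SHOW max_connections;
```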
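The results above were obtained on Win7 with an 8-core i7 processor. I also ran the same test under Ubuntu on a dual-core CPU, and the query time was about the same: 2.2 seconds (with two JSONB fields in the SELECT clause).
Update:
Single JSON field in the SELECT clause: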
EXPLAIN (ANALYZE,buffers)
SELECT
segment as segment_no,
begin_time,
segment_data::json->'summary'->'end_mileage' as end_mileage
FROM
segments_table
WHERE
vehicle=644 AND trip=3
ORDER BY
begin_time;
Result:
Sort (cost=13.15..13.17 rows=10 width=28) (actual time=999.695..999.696 rows=71 loops=1)
Sort Key: begin_time
Sort Method: quicksort Memory: 26kB
Buffers: shared hit=2834
-> Bitmap Heap Scan on segments_table (cost=4.38..12.98 rows=10 width=28) (actual time=0.781..999.554 rows=71 loops=1)
Recheck Cond: ((vehicle = 644) AND (trip = 3))
Heap Blocks: exact=3
Buffers: shared hit=2834
-> Bitmap Index Scan on segments_table_vehicle_64df5bc5_uniq (cost=0.00..4.38 rows=10 width=0) (actual time=0.052..0.052 rows=71 loops=1)
Index Cond: ((vehicle = 644) AND (trip = 3))
Buffers: shared hit=2
Planning time: 0.353 ms
Execution time: 999.777 ms
No JSON fields in the SELECT clause:
EXPLAIN (ANALYZE,buffers)
SELECT
segment as segment_no,
begin_time
FROM
segments_table
WHERE
vehicle=644 AND trip=3
ORDER BY
begin_time;
Result:
Sort (cost=13.05..13.07 rows=10 width=10) (actual time=0.194..0.205 rows=71 loops=1)
Sort Key: begin_time
Sort Method: quicksort Memory: 19kB
Buffers: shared hit=5
-> Bitmap Heap Scan on segments_table (cost=4.38..12.88 rows=10 width=10) (actual time=0.088..0.122 rows=71 loops=1)
Recheck Cond: ((vehicle = 644) AND (trip = 3))
Heap Blocks: exact=3
Buffers: shared hit=5
-> Bitmap Index Scan on segments_table_vehicle_64df5bc5_uniq (cost=0.00..4.38 rows=10 width=0) (actual time=0.048..0.048 rows=71 loops=1)
Index Cond: ((vehicle = 644) AND (trip = 3))
Buffers: shared hit=2
Planning time: 0.590 ms
Execution time: 0.280 ms
Table definition:
CREATE TABLE public.segments_table
(
segment_id integer NOT NULL DEFAULT nextval('segments_table_segment_id_seq'::regclass),
vehicle smallint NOT NULL,
trip smallint NOT NULL,
segment smallint NOT NULL,
begin_time timestamp without time zone NOT NULL,
segment_data jsonb,
CONSTRAINT segments_table_pkey PRIMARY KEY (segment_id),
CONSTRAINT segments_table_vehicle_64df5bc5_uniq UNIQUE (vehicle, trip, segment, begin_time)
)
WITH (
OIDS=FALSE
);
CREATE INDEX segments
ON public.segments_table
USING btree
(segment);
CREATE INDEX vehicles
ON public.segments_table
USING btree
(vehicle);
CREATE INDEX trips
ON public.segments_table
USING btree
(trip);
Update #2:
Fixing the cast as @Mark_M pointed out, that is, changing json to jsonb, reduced the query time from 2 seconds to 300 ms:
EXPLAIN (ANALYZE,buffers)
SELECT
segment as segment_no,
begin_time,
segment_data::jsonb->'summary'->'begin_mileage' as begin_mileage,
segment_data::jsonb->'summary'->'end_mileage' as end_mileage
FROM
segments_table
WHERE
vehicle=644 AND trip=3
ORDER BY
begin_time;
Sort (cost=13.15..13.17 rows=10 width=28) (actual time=296.339..296.342 rows=71 loops=1)
Sort Key: begin_time
Sort Method: quicksort Memory: 35kB
Buffers: shared hit=5663
-> Bitmap Heap Scan on segments_table (cost=4.38..12.98 rows=10 width=28) (actual time=0.275..296.229 rows=71 loops=1)
Recheck Cond: ((vehicle = 644) AND (trip = 3))
Heap Blocks: exact=3
Buffers: shared hit=5663
-> Bitmap Index Scan on segments_table_vehicle_64df5bc5_uniq (cost=0.00..4.38 rows=10 width=0) (actual time=0.045..0.045 rows=71 loops=1)
Index Cond: ((vehicle = 644) AND (trip = 3))
Buffers: shared hit=2
Planning time: 0.352 ms
Execution time: 296.473 ms
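That is a big improvement, but it is still 18 times slower than selecting only the non-JSON fields. Is this a reasonable performance penalty for using JSONB fields?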
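Since segment_data is already declared as jsonb in the table definition, the ::jsonb cast should be a no-op and can be dropped. A minimal sketch of the same query without the cast, using the jsonb path operator `#>` as an equivalent way to reach the nested keys, is shown here; nothing measured so far indicates whether it changes the timing.

```sql
EXPLAIN (ANALYZE, BUFFERS)
SELECT
    segment AS segment_no,
    begin_time,
    -- #> takes a text[] path and returns the jsonb value at that path
    segment_data #> '{summary,begin_mileage}' AS begin_mileage,
    segment_data #> '{summary,end_mileage}'   AS end_mileage
FROM segments_table
WHERE vehicle = 644 AND trip = 3
ORDER BY begin_time;
```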