I have the following query:
SELECT translation.id
FROM "TRANSLATION" translation
INNER JOIN "UNIT" unit
ON translation.fk_id_translation_unit = unit.id
INNER JOIN "DOCUMENT" document
ON unit.fk_id_document = document.id
WHERE document.fk_id_job = 3665
ORDER BY translation.id asc
LIMIT 50
It takes a horrible 110 seconds to run.
Table sizes:
+----------------+-------------+
| Table          | Records     |
+----------------+-------------+
| TRANSLATION    |   6,906,679 |
| UNIT           |   6,906,679 |
| DOCUMENT       |      42,321 |
+----------------+-------------+
However, when I change the LIMIT parameter from 50 to 1000, the query finishes in about 2 seconds.
Here is the plan for the slow query:
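The plans include actual times, so they were presumably captured with EXPLAIN ANALYZE. A sketch of that invocation, assuming the LIMIT value is the only thing changed between the two runs and using the column name fk_id_translation_unit that appears in the table definitions and plans:

```sql
-- Slow run: LIMIT 50. For the fast run, change only the LIMIT to 1000.
EXPLAIN ANALYZE
SELECT translation.id
FROM "TRANSLATION" translation
INNER JOIN "UNIT" unit
        ON translation.fk_id_translation_unit = unit.id
INNER JOIN "DOCUMENT" document
        ON unit.fk_id_document = document.id
WHERE document.fk_id_job = 3665
ORDER BY translation.id ASC
LIMIT 50;
```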
Limit (cost=0.00..146071.52 rows=50 width=8) (actual time=111916.180..111917.626 rows=50 loops=1)
  -> Nested Loop (cost=0.00..50748166.14 rows=17371 width=8) (actual time=111916.179..111917.624 rows=50 loops=1)
        Join Filter: (unit.fk_id_document = document.id)
        -> Nested Loop (cost=0.00..39720545.91 rows=5655119 width=16) (actual time=0.051..15292.943 rows=5624514 loops=1)
              -> Index Scan using "TRANSLATION_pkey" on "TRANSLATION" translation (cost=0.00..7052806.78 rows=5655119 width=16) (actual time=0.039..1887.757 rows=5624514 loops=1)
              -> Index Scan using "UNIT_pkey" on "UNIT" unit (cost=0.00..5.76 rows=1 width=16) (actual time=0.002..0.002 rows=1 loops=5624514)
                    Index Cond: (unit.id = translation.fk_id_translation_unit)
        -> Materialize (cost=0.00..138.51 rows=130 width=8) (actual time=0.000..0.006 rows=119 loops=5624514)
              -> Index Scan using "DOCUMENT_idx_job" on "DOCUMENT" document (cost=0.00..137.86 rows=130 width=8) (actual time=0.025..0.184 rows=119 loops=1)
                    Index Cond: (fk_id_job = 3665)
And here is the plan for the fast one:
Limit (cost=523198.17..523200.67 rows=1000 width=8) (actual time=2274.830..2274.988 rows=1000 loops=1)
  -> Sort (cost=523198.17..523241.60 rows=17371 width=8) (actual time=2274.829..2274.895 rows=1000 loops=1)
        Sort Key: translation.id
        Sort Method: top-N heapsort Memory: 95kB
        -> Nested Loop (cost=139.48..522245.74 rows=17371 width=8) (actual time=0.095..2252.710 rows=97915 loops=1)
              -> Hash Join (cost=139.48..420861.93 rows=17551 width=8) (actual time=0.079..2005.238 rows=97915 loops=1)
                    Hash Cond: (unit.fk_id_document = document.id)
                    -> Seq Scan on "UNIT" unit (cost=0.00..399120.41 rows=5713741 width=16) (actual time=0.008..1200.547 rows=6908070 loops=1)
                    -> Hash (cost=137.86..137.86 rows=130 width=8) (actual time=0.065..0.065 rows=119 loops=1)
                          Buckets: 1024 Batches: 1 Memory Usage: 5kB
                          -> Index Scan using "DOCUMENT_idx_job" on "DOCUMENT" document (cost=0.00..137.86 rows=130 width=8) (actual time=0.009..0.041 rows=119 loops=1)
                                Index Cond: (fk_id_job = 3665)
              -> Index Scan using "TRANSLATION_idx_unit" on "TRANSLATION" translation (cost=0.00..5.76 rows=1 width=16) (actual time=0.002..0.002 rows=1 loops=97915)
                    Index Cond: (translation.fk_id_translation_unit = unit.id)
Clearly the two execution plans are very different, and the second one makes the query about 50 times faster.
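The cost figures on the two Limit nodes can be reproduced from their child nodes if one assumes the Limit is costed by linear interpolation: the child's startup cost S plus the fraction N/R of its remaining cost, where N is the LIMIT value and R is the child's estimated row count. A sketch of that arithmetic, using only numbers taken from the two plans:

```latex
\begin{aligned}
\text{cost}_{\text{Limit}} &= S + (T - S)\cdot\frac{N}{R} \\[4pt]
\text{Slow plan } (N = 50):\quad & 0 + (50748166.14 - 0)\cdot\frac{50}{17371} \approx 146071.52 \\
\text{Same plan, } N = 1000:\quad & 50748166.14\cdot\frac{1000}{17371} \approx 2921430 \\
\text{Fast plan } (N = 1000):\quad & 523198.17 + (523241.60 - 523198.17)\cdot\frac{1000}{17371} \approx 523200.67
\end{aligned}
```

Read this way, the planner expects the nested loop ordered by "TRANSLATION_pkey" to stop after only 50/17371 of its total work, which makes it look cheaper than the sort-based plan (whose input alone is estimated at 522245.74) for LIMIT 50, but far more expensive for LIMIT 1000. In the actual run, the loop had to walk 5,624,514 TRANSLATION rows before it found 50 matches.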
I have indexes on all the fields involved in the query, and I ran ANALYZE on all tables before running the query.
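For reference, refreshing the statistics on the three tables amounts to the statements below. The pg_stats query is an extra check that is not part of the original post; it is a sketch for seeing what the planner knows about the join and filter columns, which is relevant because both plans estimate 17,371 result rows while the fast run actually produced 97,915.

```sql
-- Refresh planner statistics on every table in the join
ANALYZE "TRANSLATION";
ANALYZE "UNIT";
ANALYZE "DOCUMENT";

-- Assumed follow-up check: per-column statistics gathered by ANALYZE
SELECT tablename, attname, n_distinct, correlation
FROM pg_stats
WHERE schemaname = 'public'
  AND tablename IN ('TRANSLATION', 'UNIT', 'DOCUMENT')
  AND attname IN ('id', 'fk_id_translation_unit', 'fk_id_document', 'fk_id_job')
ORDER BY tablename, attname;
```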
Can anyone see what is wrong with the first query?
Update: table definitions
CREATE TABLE "public"."TRANSLATION" (
"id" BIGINT NOT NULL,
"fk_id_translation_unit" BIGINT NOT NULL,
"translation" TEXT NOT NULL,
"fk_id_language" INTEGER NOT NULL,
"relevance" INTEGER,
CONSTRAINT "TRANSLATION_pkey" PRIMARY KEY("id"),
CONSTRAINT "TRANSLATION_fk" FOREIGN KEY ("fk_id_translation_unit")
REFERENCES "public"."UNIT"("id")
ON DELETE CASCADE
ON UPDATE NO ACTION
DEFERRABLE
INITIALLY DEFERRED,
CONSTRAINT "TRANSLATION_fk1" FOREIGN KEY ("fk_id_language")
REFERENCES "public"."LANGUAGE"("id")
ON DELETE NO ACTION
ON UPDATE NO ACTION
NOT DEFERRABLE
) WITHOUT OIDS;
CREATE INDEX "TRANSLATION_idx_unit" ON "public"."TRANSLATION"
USING btree ("fk_id_translation_unit");
CREATE INDEX "TRANSLATION_language_idx" ON "public"."TRANSLATION"
USING hash ("translation");
CREATE TABLE "public"."UNIT" (
"id" BIGINT NOT NULL,
"text" TEXT NOT NULL,
"fk_id_language" INTEGER NOT NULL,
"fk_id_document" BIGINT NOT NULL,
"word_count" INTEGER DEFAULT 0,
CONSTRAINT "UNIT_pkey" PRIMARY KEY("id"),
CONSTRAINT "UNIT_fk" FOREIGN KEY ("fk_id_document")
REFERENCES "public"."DOCUMENT"("id")
ON DELETE CASCADE
ON UPDATE NO ACTION
NOT DEFERRABLE,
CONSTRAINT "UNIT_fk1" FOREIGN KEY ("fk_id_language")
REFERENCES "public"."LANGUAGE"("id")
ON DELETE NO ACTION
ON UPDATE NO ACTION
NOT DEFERRABLE
) WITHOUT OIDS;
CREATE INDEX "UNIT_idx_document" ON "public"."UNIT"
USING btree ("fk_id_document");
CREATE INDEX "UNIT_text_idx" ON "public"."UNIT"
USING hash ("text");
CREATE TABLE "public"."DOCUMENT" (
"id" BIGINT NOT NULL,
"fk_id_job" BIGINT,
CONSTRAINT "DOCUMENT_pkey" PRIMARY KEY("id"),
CONSTRAINT "DOCUMENT_fk" FOREIGN KEY ("fk_id_job")
REFERENCES "public"."JOB"("id")
ON DELETE SET NULL
ON UPDATE NO ACTION
NOT DEFERRABLE
) WITHOUT OIDS;
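Both plans use an index named "DOCUMENT_idx_job" with Index Cond (fk_id_job = 3665), but its definition is not included above. A plausible definition, assuming a btree index like the other foreign-key indexes in this schema (the index type is a guess, not confirmed by the plans):

```sql
-- Assumed definition, inferred from the index name and Index Cond in the plans;
-- btree is an assumption matching "UNIT_idx_document" and "TRANSLATION_idx_unit".
CREATE INDEX "DOCUMENT_idx_job" ON "public"."DOCUMENT"
USING btree ("fk_id_job");
```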
Update: database parameters
shared_buffers = 2048MB
effective_cache_size = 4096MB
work_mem = 32MB
Total memory: 32GB
CPU: Intel Xeon X3470 @ 2.93 GHz, 8MB cache