I have a table users with the fields id and email. id is the primary key, and email is also indexed.
database> \d users
+-----------------------------+-----------------------------+-----------------------------------------------------+
| Column | Type | Modifiers |
|-----------------------------+-----------------------------+-----------------------------------------------------|
| id | integer | not null default nextval('users_id_seq'::regclass) |
| email | character varying | |
+-----------------------------+-----------------------------+-----------------------------------------------------+
Indexes:
"users_pkey" PRIMARY KEY, btree (id)
"index_users_on_email" UNIQUE, btree (email)
If I query the table with a distinct on (email) clause in a subquery, I take a significant performance hit.
database> explain (analyze, buffers)
select
id
from (
select distinct on (email)
id
from
users
) as t
where id = 123
+-----------------------------------------------------------------------------------------------------------------------------+
| QUERY PLAN |
|-----------------------------------------------------------------------------------------------------------------------------|
| Subquery Scan on t (cost=8898.69..10077.84 rows=337 width=4) (actual time=221.133..250.782 rows=1 loops=1) |
| Filter: (t.id = 123) |
| Rows Removed by Filter: 67379 |
| Buffers: shared hit=2824, temp read=288 written=289 |
| -> Unique (cost=8898.69..9235.59 rows=67380 width=24) (actual time=221.121..247.582 rows=67380 loops=1) |
| Buffers: shared hit=2824, temp read=288 written=289 |
| -> Sort (cost=8898.69..9067.14 rows=67380 width=24) (actual time=221.120..239.573 rows=67380 loops=1) |
| Sort Key: users.email |
| Sort Method: external merge Disk: 2304kB |
| Buffers: shared hit=2824, temp read=288 written=289 |
| -> Seq Scan on users (cost=0.00..3494.80 rows=67380 width=24) (actual time=0.009..9.714 rows=67380 loops=1) |
| Buffers: shared hit=2821 |
| Planning Time: 0.243 ms |
| Execution Time: 251.258 ms |
+-----------------------------------------------------------------------------------------------------------------------------+
Compare that with distinct on (id), which costs less than one thousandth of the previous query (a total cost of 8.31 versus 10077.84).
database> explain (analyze, buffers)
select
id
from (
select distinct on (id)
id
from
users
) as t
where id = 123
+-----------------------------------------------------------------------------------------------------------------------------+
| QUERY PLAN |
|-----------------------------------------------------------------------------------------------------------------------------|
| Unique (cost=0.29..8.31 rows=1 width=4) (actual time=0.021..0.022 rows=1 loops=1) |
| Buffers: shared hit=3 |
| -> Index Only Scan using users_pkey on users (cost=0.29..8.31 rows=1 width=4) (actual time=0.020..0.020 rows=1 loops=1) |
| Index Cond: (id = 123) |
| Heap Fetches: 1 |
| Buffers: shared hit=3 |
| Planning Time: 0.090 ms |
| Execution Time: 0.034 ms |
+-----------------------------------------------------------------------------------------------------------------------------+
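Why is this?
The real problem I'm running into is that I'm trying to create a view that performs distinct on over an indexed column that is not unique, and the performance is very bad.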
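To make the difference between the two cases concrete, here is a minimal sketch on made-up data. The demo_users table and its two rows are hypothetical and not taken from my real schema. It shows that, when the distinct on column is not unique, filtering on id after the distinct on and filtering before it can return different rows. I assume this is why the filter is not applied below the Unique node in the first plan, whereas with distinct on (id) the filter is on the distinct on column itself and can safely become an index condition.

```sql
-- Hypothetical toy data (not my real table): two rows share one email.
create temp table demo_users (id integer primary key, email varchar);
insert into demo_users values (1, 'a@example.com'), (2, 'a@example.com');

-- Filter applied AFTER distinct on (email): the subquery keeps only id = 1
-- for the shared email, so the outer filter on id = 2 returns no rows.
select id
from (
  select distinct on (email) id
  from demo_users
  order by email, id
) as t
where id = 2;

-- Filter applied BEFORE distinct on (email): the row with id = 2 survives,
-- so this query returns one row.
select distinct on (email) id
from demo_users
where id = 2
order by email, id;
```

Because the two queries are not equivalent in general, moving the filter inside by hand is only an option when the changed result is acceptable, or when the distinct on column is known to be unique.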