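I'm confused about what Redshift is doing when I run two seemingly similar queries. Neither should return a result (both query for a profile that doesn't exist). Specifically: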
SELECT * FROM profile WHERE id = 'id_that_doesnt_exist' and project_id = 1;
Execution time: 36.75s
vs.
SELECT COUNT(*) FROM profile WHERE id = 'id_that_doesnt_exist' and project_id = 1;
Execution time: 0.2s
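Given that the table is sorted by project_id and then id, I would have thought this is just a key lookup. The SELECT COUNT(*) ... query returns 0 results in 0.2 seconds, which is about what I would expect. The SELECT * ... query returns 0 results in 37.75 seconds. That is a huge difference for the same result, and I don't understand why.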
In case it helps, the schema is as follows:
CREATE TABLE profile (
project_id integer not null,
id varchar(256) not null,
created timestamp not null,
/* ... approx 50 other columns here */
)
DISTKEY(id)
SORTKEY(project_id, id);
EXPLAIN output for SELECT COUNT(*) ...:
XN Aggregate (cost=435.70..435.70 rows=1 width=0)
-> XN Seq Scan on profile (cost=0.00..435.70 rows=1 width=0)
Filter: (((id)::text = 'id_that_doesnt_exist'::text) AND (project_id = 1))
EXPLAIN output for SELECT * ...:
XN Seq Scan on profile (cost=0.00..435.70 rows=1 width=7356)
Filter: (((id)::text = 'id_that_doesnt_exist'::text) AND (project_id = 1))
Why is the non-count query so much slower? Surely Redshift knows the row doesn't exist?