I have a table named feeds_up in a PostgreSQL database. It looks like this:
| feed_url | isup | hasproblems | observed timestamp with tz | id (pk)|
|----------|------|-------------|-------------------------------|--------|
| http://b.| t | f | 2013-02-27 16:34:46.327401+11 | 15235 |
| http://f.| f | t | 2013-02-27 16:31:25.415126+11 | 15236 |
It has about 300k rows and grows by about 20 rows every five minutes. I have a query that runs very often (on every page load):
select distinct on (feed_url) feed_url, isUp, hasProblems
from feeds_up
where observed <= '2013-02-27T05:38:00.000Z'
order by feed_url, observed desc;
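I've put a sample time in there; in the real query that time is a parameter. The EXPLAIN ANALYZE output is on explain.depesz.com. The query takes about 8 s. Crazy!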
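For anyone who wants to reproduce the plan, it can be captured by prefixing the query with EXPLAIN ANALYZE, roughly as follows. The timestamp is the same sample value as above, not a special one.

```sql
-- Runs the query for real and prints the chosen plan with actual timings.
EXPLAIN ANALYZE
select distinct on (feed_url) feed_url, isUp, hasProblems
from feeds_up
where observed <= '2013-02-27T05:38:00.000Z'
order by feed_url, observed desc;
```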
There are only about 20 distinct values of feed_url, so this seems very inefficient. I thought I'd do something dumb and try a FOR loop in a function.
CREATE OR REPLACE FUNCTION feedStatusAtDate(theTime timestamp with time zone) RETURNS SETOF feeds_up AS
$BODY$
DECLARE
    url feeds_list%rowtype;
BEGIN
    FOR url IN SELECT * FROM feeds_list
    LOOP
        RETURN QUERY SELECT * FROM feeds_up
            WHERE observed <= theTime
            AND feed_url = url.feed_url
            ORDER BY observed DESC LIMIT 1;
    END LOOP;
END;
$BODY$ language plpgsql;

select * from feedStatusAtDate('2013-02-27T05:38:00.000Z');
This takes only 307 ms!
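Using a FOR loop in SQL feels wrong to me. How can I write a single query, like the first one, that is just as efficient? Is that possible? Or is this the kind of thing where a FOR loop really is the best option?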
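To make concrete what I mean by a single query: the loop does one "latest row at or before the given time" lookup per row of feeds_list, and the same idea can be written with a correlated scalar subquery. This is only a sketch that I have not benchmarked. It assumes feeds_list has one row per feed_url and relies on id being the primary key of feeds_up.

```sql
-- For each feed in feeds_list, pick the id of its most recent
-- observation at or before the sample time, then fetch those rows.
SELECT f.*
FROM (
    SELECT (SELECT u.id
            FROM feeds_up u
            WHERE u.feed_url = l.feed_url
              AND u.observed <= '2013-02-27T05:38:00.000Z'
            ORDER BY u.observed DESC
            LIMIT 1) AS id
    FROM feeds_list l
) latest
JOIN feeds_up f ON f.id = latest.id;
```

Feeds with no observation before the sample time produce a NULL id and drop out of the join, which matches what the loop returns for them.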
ETA (edited to add):
Postgres version: PostgreSQL 9.1.5 on i686-pc-linux-gnu, compiled by gcc (SUSE Linux) 4.3.4 [gcc-4_3-branch revision 152973], 32-bit
Indexes on feeds_up:
CREATE INDEX feeds_up_url
ON feeds_up
USING btree
(feed_url COLLATE pg_catalog."default");
CREATE INDEX feeds_up_url_observed
ON feeds_up
USING btree
(feed_url COLLATE pg_catalog."default", observed DESC);
CREATE INDEX feeds_up_observed
ON public.feeds_up
USING btree
(observed DESC);