Problem
72 child tables, each with a year index and a station index, defined as follows:
CREATE TABLE climate.measurement_12_013
(
-- Inherited from table climate.measurement_12_013: id bigint NOT NULL DEFAULT nextval('climate.measurement_id_seq'::regclass),
-- Inherited from table climate.measurement_12_013: station_id integer NOT NULL,
-- Inherited from table climate.measurement_12_013: taken date NOT NULL,
-- Inherited from table climate.measurement_12_013: amount numeric(8,2) NOT NULL,
-- Inherited from table climate.measurement_12_013: category_id smallint NOT NULL,
-- Inherited from table climate.measurement_12_013: flag character varying(1) NOT NULL DEFAULT ' '::character varying,
CONSTRAINT measurement_12_013_category_id_check CHECK (category_id = 7),
CONSTRAINT measurement_12_013_taken_check CHECK (date_part('month'::text, taken)::integer = 12)
)
INHERITS (climate.measurement);
CREATE INDEX measurement_12_013_s_idx
ON climate.measurement_12_013
USING btree
(station_id);
CREATE INDEX measurement_12_013_y_idx
ON climate.measurement_12_013
USING btree
(date_part('year'::text, taken));
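Indexes created on an inheritance parent are not propagated to its children, so each child has to carry its own. To confirm that every one of the 72 children actually has both indexes, a catalog query along these lines could be used. This is a sketch that assumes all children inherit directly from climate.measurement:

```sql
-- One row per direct child of climate.measurement, with its index count.
-- LEFT JOIN keeps children that have no indexes at all (count = 0).
SELECT
  c.relname AS child_table,
  count(ix.indexrelid) AS index_count
FROM pg_inherits i
JOIN pg_class c ON c.oid = i.inhrelid
LEFT JOIN pg_index ix ON ix.indrelid = c.oid
WHERE i.inhparent = 'climate.measurement'::regclass
GROUP BY c.relname
ORDER BY c.relname;
```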
(Foreign key constraints will be added later.)
The following query runs very slowly because of a full table scan:
SELECT
count(1) AS measurements,
avg(m.amount) AS amount
FROM
climate.measurement m
WHERE
m.station_id IN (
SELECT
s.id
FROM
climate.station s,
climate.city c
WHERE
/* For one city... */
c.id = 5182 AND
/* Where stations are within an elevation range... */
s.elevation BETWEEN 0 AND 3000 AND
/* and within a specific radius... */
6371.009 * SQRT(
POW(RADIANS(c.latitude_decimal - s.latitude_decimal), 2) +
(COS(RADIANS(c.latitude_decimal + s.latitude_decimal) / 2) *
POW(RADIANS(c.longitude_decimal - s.longitude_decimal), 2))
) <= 50
) AND
/* Data before 1900 is shaky; insufficient after 2009. */
extract( YEAR FROM m.taken ) BETWEEN 1900 AND 2009 AND
/* Whittled down by category... */
m.category_id = 1 AND
/* Between the selected days and years... */
m.taken BETWEEN
/* Start date. */
(extract( YEAR FROM m.taken )||'-01-01')::date AND
/* End date. Calculated by checking to see if the end date wraps
into the next year. If it does, then add 1 to the current year.
*/
(cast(extract( YEAR FROM m.taken ) + greatest( -1 *
sign(
(extract( YEAR FROM m.taken )||'-12-31')::date -
(extract( YEAR FROM m.taken )||'-01-01')::date ), 0
) AS text)||'-12-31')::date
GROUP BY
extract( YEAR FROM m.taken );
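For comparison, the year filter can be written as a plain range on the bare taken column, which a btree index on taken could serve (such an index is hypothetical here; it is not among the indexes defined above). This is only a sketch: a range covering nearly every year with data still selects most rows, in which case the planner may reasonably prefer a sequential scan anyway.

```sql
-- Same year filter as extract(YEAR FROM m.taken) BETWEEN 1900 AND 2009,
-- expressed as a half-open range on the bare taken column.
-- The station subquery from the original query is omitted for brevity.
SELECT
  count(1) AS measurements,
  avg(m.amount) AS amount
FROM
  climate.measurement m
WHERE
  m.taken >= date '1900-01-01' AND
  m.taken <  date '2010-01-01' AND
  m.category_id = 1
GROUP BY
  extract( YEAR FROM m.taken );
```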
The slowness comes from this part of the query:
m.taken BETWEEN
/* Start date. */
(extract( YEAR FROM m.taken )||'-01-01')::date AND
/* End date. Calculated by checking to see if the end date wraps
into the next year. If it does, then add 1 to the current year.
*/
(cast(extract( YEAR FROM m.taken ) + greatest( -1 *
sign(
(extract( YEAR FROM m.taken )||'-12-31')::date -
(extract( YEAR FROM m.taken )||'-01-01')::date ), 0
) AS text)||'-12-31')::date
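Substituting a fixed date for m.taken shows what the two bounds currently evaluate to. Because both bounds are derived from the row's own taken value, every row falls between January 1 and December 31 of its own year, so the condition is true for every row and gives the planner no range to look up in an index. A sketch, using a made-up sample date:

```sql
-- A one-row stand-in for climate.measurement; the date is made up.
WITH m(taken) AS (
  VALUES (date '1950-06-15')
)
SELECT
  -- Start bound, exactly as written in the original query.
  (extract( YEAR FROM m.taken )||'-01-01')::date AS start_date,
  -- End bound: Dec 31 minus Jan 1 is positive, so sign() is 1,
  -- greatest(-1, 0) is 0, and the year is never incremented.
  (cast(extract( YEAR FROM m.taken ) + greatest( -1 *
    sign(
      (extract( YEAR FROM m.taken )||'-12-31')::date -
      (extract( YEAR FROM m.taken )||'-01-01')::date ), 0
  ) AS text)||'-12-31')::date AS end_date
FROM m;
```

For the sample date the bounds come out as 1950-01-01 and 1950-12-31.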
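This part of the query matches the selected days. For example, if the user wants to see data between June 1 and July 1 for every year that has data, the clause above matches only those days. If the user wants to see data between December 22 and March 22, again for every year that has data, the clause works out that March 22 falls in the year after December 22 and matches the dates accordingly.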
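One way to express the wrap-around match without deriving the bounds from each row's year is to compare a month-and-day key against fixed bounds. This is a sketch under assumptions: the encoding month * 100 + day, the bound values, and the sample dates are all hypothetical, and the real query would take the bounds as parameters.

```sql
-- Hypothetical bounds, encoded as month * 100 + day:
-- 1222 = December 22 and 322 = March 22, a range that wraps the year end.
WITH bounds(start_md, end_md) AS (
  VALUES (1222, 322)
),
sample_dates(taken) AS (
  -- Made-up dates used only to exercise the predicate.
  VALUES (date '1950-12-25'), (date '1951-02-10'), (date '1951-06-01')
),
keyed AS (
  SELECT
    taken,
    (extract(MONTH FROM taken) * 100 + extract(DAY FROM taken))::integer AS md
  FROM sample_dates
)
SELECT
  k.taken,
  CASE
    -- Range stays inside one calendar year (e.g. June 1 to July 1).
    WHEN b.start_md <= b.end_md THEN k.md BETWEEN b.start_md AND b.end_md
    -- Range wraps into the next year (e.g. December 22 to March 22).
    ELSE k.md >= b.start_md OR k.md <= b.end_md
  END AS in_range
FROM keyed k
CROSS JOIN bounds b
ORDER BY k.taken;
```

With these bounds, December 25 (key 1225) and February 10 (key 210) match, while June 1 (key 601) does not.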
For now the dates are fixed at January 1 to December 31, but they will be parameterized as described above.
The HashAggregate in the plan shows a cost of 10006220141.11, which I suspect is astronomically high.
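A top-level cost just above 10,000,000,000 can be a sign that a plan node type has been disabled in the session: in many PostgreSQL versions the planner adds a fixed penalty of 1.0e10 to node types switched off by settings such as enable_seqscan, instead of forbidding them outright. A quick check, assuming it is run in the same session as the slow query:

```sql
-- "off" here could account for the 1.0e10 penalty in the plan's cost.
SHOW enable_seqscan;

-- Restores the default for the current session only;
-- re-run EXPLAIN on the query afterwards and compare the costs.
SET enable_seqscan = on;
```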
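A full table scan is being performed on the measurement table (which itself holds neither data nor indexes). Through its child tables, that table covers 273 million rows.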
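With inheritance, the parent table appears in every plan even though it holds no rows, so scanning the parent itself is normally cheap; what matters is which children are scanned. Whether children are skipped based on their CHECK constraints depends on the constraint_exclusion setting. A sketch for checking this, reusing the category_id filter from the query:

```sql
-- "partition" (the default since PostgreSQL 8.4) applies constraint
-- exclusion to inheritance children; "on" applies it to all queries.
SET constraint_exclusion = partition;

-- Children whose CHECK constraints contradict category_id = 1, such as
-- measurement_12_013 (CHECK (category_id = 7)), should not appear under
-- the Append node of this plan.
EXPLAIN
SELECT count(1)
FROM climate.measurement m
WHERE m.category_id = 1;
```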
Question
What is the correct way to index the dates so as to avoid the full table scan?
Options I have considered:
- GIN
- GiST
- Rewrite the WHERE clause
- Split the date into separate year_taken, month_taken, and day_taken columns in the tables
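For the last option, a similar effect might be had without adding columns by indexing an expression over taken, in the same way the existing year index uses date_part('year'::text, taken). This is a sketch, assuming a month * 100 + day key and a hypothetical index name; it would have to be repeated on each child table, and the query must repeat the indexed expression exactly for the planner to consider the index.

```sql
-- Hypothetical expression index on a month-and-day key for one child table.
CREATE INDEX measurement_12_013_md_idx
  ON climate.measurement_12_013
  USING btree
  (((date_part('month', taken) * 100 + date_part('day', taken))::integer));

-- A query written against the same expression (here December 22 to 31,
-- which fits this child's CHECK constraint of month = 12).
SELECT count(1)
FROM climate.measurement_12_013 m
WHERE (date_part('month', m.taken) * 100 + date_part('day', m.taken))::integer
      BETWEEN 1222 AND 1231;
```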
What do you think? Do you have any ideas?
Thanks!