postgresql - Postgres 分区修剪

Question

我在 Postgres 中有一张大桌子。

表名是bigtable，列是：

integer    |timestamp   |xxx |xxx |...|xxx
category_id|capture_time|col1|col2|...|colN

我已经对 category_id 的模 10 和 capture_time 列的日期部分的表进行了分区。

分区表如下所示：

CREATE TABLE myschema.bigtable_d000h0(
    CHECK ( category_id%10=0 AND capture_time >= DATE '2012-01-01' AND capture_time < DATE '2012-01-02')
) INHERITS (myschema.bigtable);

CREATE TABLE myschema.bigtable_d000h1(
    CHECK ( category_id%10=1 AND capture_time >= DATE '2012-01-01' AND capture_time < DATE '2012-01-02')
) INHERITS (myschema.bigtable);

当我在 where 子句中使用 category_id 和 capture_time 运行查询时，分区没有按预期修剪。

explain select * from bigtable where capture_time >= '2012-01-01' and  capture_time < '2012-01-02' and category_id=100;

"Result  (cost=0.00..9476.87 rows=1933 width=216)"
"  ->  Append  (cost=0.00..9476.87 rows=1933 width=216)"
"        ->  Seq Scan on bigtable  (cost=0.00..0.00 rows=1 width=210)"
"              Filter: ((capture_time >= '2012-01-01 00:00:00'::timestamp without time zone) AND (capture_time < '2012-01-02 00:00:00'::timestamp without time zone) AND (category_id = 100))"
"        ->  Seq Scan on bigtable_d000h0 bigtable  (cost=0.00..1921.63 rows=1923 width=216)"
"              Filter: ((capture_time >= '2012-01-01 00:00:00'::timestamp without time zone) AND (capture_time < '2012-01-02 00:00:00'::timestamp without time zone) AND (category_id = 100))"
"        ->  Seq Scan on bigtable_d000h1 bigtable  (cost=0.00..776.93 rows=1 width=218)"
"              Filter: ((capture_time >= '2012-01-01 00:00:00'::timestamp without time zone) AND (capture_time < '2012-01-02 00:00:00'::timestamp without time zone) AND (category_id = 100))"
"        ->  Seq Scan on bigtable_d000h2 bigtable  (cost=0.00..974.47 rows=1 width=216)"
"              Filter: ((capture_time >= '2012-01-01 00:00:00'::timestamp without time zone) AND (capture_time < '2012-01-02 00:00:00'::timestamp without time zone) AND (category_id = 100))"
"        ->  Seq Scan on bigtable_d000h3 bigtable  (cost=0.00..1351.92 rows=1 width=214)"
"              Filter: ((capture_time >= '2012-01-01 00:00:00'::timestamp without time zone) AND (capture_time < '2012-01-02 00:00:00'::timestamp without time zone) AND (category_id = 100))"
"        ->  Seq Scan on bigtable_d000h4 bigtable  (cost=0.00..577.04 rows=1 width=217)"
"              Filter: ((capture_time >= '2012-01-01 00:00:00'::timestamp without time zone) AND (capture_time < '2012-01-02 00:00:00'::timestamp without time zone) AND (category_id = 100))"
"        ->  Seq Scan on bigtable_d000h5 bigtable  (cost=0.00..360.67 rows=1 width=219)"
"              Filter: ((capture_time >= '2012-01-01 00:00:00'::timestamp without time zone) AND (capture_time < '2012-01-02 00:00:00'::timestamp without time zone) AND (category_id = 100))"
"        ->  Seq Scan on bigtable_d000h6 bigtable  (cost=0.00..1778.18 rows=1 width=214)"
"              Filter: ((capture_time >= '2012-01-01 00:00:00'::timestamp without time zone) AND (capture_time < '2012-01-02 00:00:00'::timestamp without time zone) AND (category_id = 100))"
"        ->  Seq Scan on bigtable_d000h7 bigtable  (cost=0.00..315.82 rows=1 width=216)"
"              Filter: ((capture_time >= '2012-01-01 00:00:00'::timestamp without time zone) AND (capture_time < '2012-01-02 00:00:00'::timestamp without time zone) AND (category_id = 100))"
"        ->  Seq Scan on bigtable_d000h8 bigtable  (cost=0.00..372.06 rows=1 width=219)"
"              Filter: ((capture_time >= '2012-01-01 00:00:00'::timestamp without time zone) AND (capture_time < '2012-01-02 00:00:00'::timestamp without time zone) AND (category_id = 100))"
"        ->  Seq Scan on bigtable_d000h9 bigtable  (cost=0.00..1048.16 rows=1 width=215)"
"              Filter: ((capture_time >= '2012-01-01 00:00:00'::timestamp without time zone) AND (capture_time < '2012-01-02 00:00:00'::timestamp without time zone) AND (category_id = 100))"

但是，如果我在 where 子句中添加精确的模标准 ( category_id%10=0)，它会完美运行

explain select * from bigtable where capture_time >= '2012-01-01' and  capture_time < '2012-01-02' and category_id=100 and category_id%10=0;

"Result  (cost=0.00..2154.09 rows=11 width=215)"
"  ->  Append  (cost=0.00..2154.09 rows=11 width=215)"
"        ->  Seq Scan on bigtable  (cost=0.00..0.00 rows=1 width=210)"
"              Filter: ((capture_time >= '2012-01-01 00:00:00'::timestamp without time zone) AND (capture_time < '2012-01-02 00:00:00'::timestamp without time zone) AND (category_id = 100) AND ((category_id % 10) = 0))"
"        ->  Seq Scan on bigtable_d000h0 bigtable  (cost=0.00..2154.09 rows=10 width=216)"
"              Filter: ((capture_time >= '2012-01-01 00:00:00'::timestamp without time zone) AND (capture_time < '2012-01-02 00:00:00'::timestamp without time zone) AND (category_id = 100) AND ((category_id % 10) = 0))"

有什么方法可以使分区修剪正常工作，而不必在每个查询中添加模条件？

score 4 · Accepted Answer

事情是：对于排除约束，PostgreSQL将创建一个隐式索引。在您的情况下，此索引将是部分索引，因为您在列上使用了表达式，而不仅仅是它的值。并在文档中说明（查找 11-2 示例）：

PostgreSQL 没有一个复杂的定理证明器可以识别以不同形式编写的数学等价表达式。（不仅这样的一般定理证明器极难创建，而且可能太慢而无法真正使用。）系统可以识别简单的不等式含义，例如“x < 1”意味着“x < 2”；否则谓词条件必须与查询的 WHERE 条件的一部分完全匹配，否则索引将不会被识别为可用。匹配发生在查询计划时，而不是运行时。

因此，您的结果 - 您应该具有与创建 CHECK 约束时使用的完全相同的表达式。

对于基于 HASH 的分区，我更喜欢 2 种方法：

添加一个可以采用一组有限值（在您的情况下为 10 个）的字段，最好是这样设计的；
指定哈希范围的方式与指定时间戳范围的方式相同：MINVALUE <= category_id < MAXVALUE

此外，还可以创建 2 级分区：

在第一级，您根据 category_id HASH 创建 10 个分区；
在第二级，您根据您的日期范围创建必要数量的分区。

虽然我总是尝试只使用 1 列进行分区，但更易于管理。

score 1 · Accepted Answer

对于任何有同样问题的人：我得出的结论是，最简单的方法是更改查询以包含模条件category_id%10=0

postgresql - Postgres 分区修剪

2 回答 2

Related

Reference