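I am trying to create a new table that is the aggregate sum of 6 other tables with matching primary keys. The query stalls whenever I use more than 3 input tables: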

 CREATE TABLE table_name AS SELECT table1.timestamp, table1.value + table2.value + table3.value + table4.value AS value FROM table1, table2, table3, table4 WHERE table1.timestamp=table2.timestamp AND table2.timestamp=table3.timestamp AND table3.timestamp=table4.timestamp;

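Problem: the script runs fairly quickly (<5 seconds) with 2-3 tables, but stalls with more. I have not let it run for longer than 5 minutes, but either way that is too slow for my purposes.

Table description: every table has the same format, with 6 columns (2 of which are relevant). The primary key is the integer "timestamp", and "value" is a real. Table sizes vary, but each has roughly 100k rows. The tables mostly share the same primary keys, but each table is missing a few data points, so it is essential that those data points are omitted from the new table.

Is there something I am doing wrong, and what should I do to make this run fast?

Edit:

PS: here is the actual output of a complete "EXPLAIN ANALYZE" query: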

eldb=# EXPLAIN ANALYZE CREATE TABLE test_table AS
SELECT count1.timestamp, count1.year, count1.month, count1.day, count1.period,
       count1.the_value + count2.the_value + count3.the_value + count4.the_value + count5.the_value + count6.the_value AS the_value
FROM "table_name-1" AS count1, "table_name-2" AS count2, "table_name-3" AS count3, "table_name-4" AS count4, "table_name-5" AS count5, "table_name-6" AS count6
WHERE count1.timestamp=count2.timestamp AND count2.timestamp=count3.timestamp AND count3.timestamp=count4.timestamp AND count4.timestamp=count5.timestamp AND count5.timestamp=count6.timestamp
AND count1.timestamp>2012020000 AND count2.timestamp>2012020000 AND count3.timestamp>2012020000 AND count4.timestamp>2012020000 and count5.timestamp>2012020000 AND count6.timestamp>2012020000;
                                                                          QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------------------
 Merge Join  (cost=20323.61..153806457715456.50 rows=5592655588099248 width=44) (actual time=84.524..3310.692 rows=3410 loops=1)
   Merge Cond: (count1."timestamp" = count4."timestamp")
   ->  Nested Loop  (cost=10161.80..4417379579.26 rows=1057606343 width=40) (actual time=44.597..1616.585 rows=3410 loops=1)
         Join Filter: (count2."timestamp" = count1."timestamp")
         ->  Merge Join  (cost=10161.80..101480.96 rows=6070522 width=16) (actual time=43.648..48.950 rows=3410 loops=1)
               Merge Cond: (count2."timestamp" = count3."timestamp")
               ->  Sort  (cost=5080.90..5168.01 rows=34844 width=8) (actual time=25.608..25.804 rows=3410 loops=1)
                     Sort Key: count2."timestamp"
                     Sort Method: quicksort  Memory: 256kB
                     ->  Seq Scan on "table_name-2" count2  (cost=0.00..1972.66 rows=34844 width=8) (actual time=0.064..23.297 rows=3410 loops=1)
                           Filter: ("timestamp" > 2012020000)
               ->  Materialize  (cost=5080.90..5255.12 rows=34844 width=8) (actual time=18.030..19.847 rows=3410 loops=1)
                     ->  Sort  (cost=5080.90..5168.01 rows=34844 width=8) (actual time=18.023..18.416 rows=3410 loops=1)
                           Sort Key: count3."timestamp"
                           Sort Method: quicksort  Memory: 256kB
                           ->  Seq Scan on "table_name-3" count3  (cost=0.00..1972.66 rows=34844 width=8) (actual time=0.023..16.294 rows=3410 loops=1)
                                 Filter: ("timestamp" > 2012020000)
         ->  Materialize  (cost=0.00..2351.88 rows=34844 width=24) (actual time=0.000..0.147 rows=3410 loops=3410)
               ->  Seq Scan on "table_name-1" count1  (cost=0.00..1972.66 rows=34844 width=24) (actual time=0.020..16.853 rows=3410 loops=1)
                     Filter: ("timestamp" > 2012020000)
   ->  Materialize  (cost=10161.80..4007228099.11 rows=1057606343 width=24) (actual time=39.917..1687.402 rows=3410 loops=1)
         ->  Nested Loop  (cost=10161.80..4004584083.26 rows=1057606343 width=24) (actual time=39.915..1685.956 rows=3410 loops=1)
               Join Filter: (count4."timestamp" = count6."timestamp")
               ->  Merge Join  (cost=10161.80..101480.96 rows=6070522 width=16) (actual time=38.689..44.309 rows=3410 loops=1)
                     Merge Cond: (count4."timestamp" = count5."timestamp")
                     ->  Sort  (cost=5080.90..5168.01 rows=34844 width=8) (actual time=18.960..19.156 rows=3410 loops=1)
                           Sort Key: count4."timestamp"
                           Sort Method: quicksort  Memory: 256kB
                           ->  Seq Scan on "table_name-4" count4  (cost=0.00..1972.66 rows=34844 width=8) (actual time=0.059..17.271 rows=3410 loops=1)
                                 Filter: ("timestamp" > 2012020000)
                     ->  Materialize  (cost=5080.90..5255.12 rows=34844 width=8) (actual time=19.717..21.826 rows=3410 loops=1)
                           ->  Sort  (cost=5080.90..5168.01 rows=34844 width=8) (actual time=19.708..20.266 rows=3410 loops=1)
                                 Sort Key: count5."timestamp"
                                 Sort Method: quicksort  Memory: 256kB
                                 ->  Seq Scan on "table_name-5" count5  (cost=0.00..1972.66 rows=34844 width=8) (actual time=0.034..18.001 rows=3410 loops=1)
                                       Filter: ("timestamp" > 2012020000)
               ->  Materialize  (cost=0.00..2283.88 rows=34844 width=8) (actual time=0.000..0.148 rows=3410 loops=3410)
                     ->  Seq Scan on "table_name-6" count6  (cost=0.00..1972.66 rows=34844 width=8) (actual time=0.036..17.785 rows=3410 loops=1)
                           Filter: ("timestamp" > 2012020000)
 Total runtime: 3330.933 ms
(40 rows)

This is the table structure (identical for all tables):

CREATE TABLE "table_name-6"
(
"timestamp" integer NOT NULL,
year integer NOT NULL,
month integer NOT NULL,
day integer NOT NULL,
period integer NOT NULL,
the_value real,
CONSTRAINT "table_name-6_pkey" PRIMARY KEY ("timestamp" )
)

Note: the actual table names and values have been renamed. Also, this output covers only a small fraction of the actual table size.

1 Answer

DROP SCHEMA IF EXISTS tmp CASCADE;
CREATE SCHEMA tmp ;

set search_path='tmp';

SET random_page_cost=1;

CREATE TABLE table_name1
        ( ztimestamp integer NOT NULL
        , year integer NOT NULL
        , month integer NOT NULL
        , day integer NOT NULL
        , period integer NOT NULL
        , the_value real
        , CONSTRAINT table_name1_pkey PRIMARY KEY (ztimestamp )
        ) ;

CREATE TABLE table_name2
        ( ztimestamp integer NOT NULL
        , year integer NOT NULL
        , month integer NOT NULL
        , day integer NOT NULL
        , period integer NOT NULL
        , the_value real
        , CONSTRAINT table_name2_pkey PRIMARY KEY (ztimestamp )
        ) ;


-- ... similar for 3,4,5,6 ...


INSERT INTO table_name1(ztimestamp,year,month,day,period,the_value)
SELECT generate_series(1,2000), 0,0,0,0, 1.0;
INSERT INTO table_name2 SELECT * FROM table_name1;
INSERT INTO table_name3 SELECT * FROM table_name1;
INSERT INTO table_name4 SELECT * FROM table_name1;
INSERT INTO table_name5 SELECT * FROM table_name1;
INSERT INTO table_name6 SELECT * FROM table_name1;

EXPLAIN ANALYZE
CREATE TABLE test_table AS
SELECT c1.ztimestamp, c1.year, c1.month, c1.day, c1.period
        , c1.the_value + c2.the_value + c3.the_value + c4.the_value
        + c5.the_value + c6.the_value AS the_value
FROM table_name1 AS c1
        , table_name2 AS c2
        , table_name3 AS c3
        , table_name4 AS c4
        , table_name5 AS c5
        , table_name6 AS c6
WHERE c1.ztimestamp=c2.ztimestamp
AND c2.ztimestamp=c3.ztimestamp
AND c3.ztimestamp=c4.ztimestamp
AND c4.ztimestamp=c5.ztimestamp
AND c5.ztimestamp=c6.ztimestamp
    ;

Result and plan:

INSERT 0 2000
INSERT 0 2000
INSERT 0 2000
INSERT 0 2000
INSERT 0 2000
INSERT 0 2000
                                                                              QUERY PLAN                                                                               
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Merge Join  (cost=0.00..475.93 rows=1963 width=44) (actual time=0.066..11.840 rows=2000 loops=1)
   Merge Cond: (c1.ztimestamp = c6.ztimestamp)
   ->  Merge Join  (cost=0.00..371.26 rows=1963 width=56) (actual time=0.052..8.706 rows=2000 loops=1)
         Merge Cond: (c1.ztimestamp = c5.ztimestamp)
         ->  Merge Join  (cost=0.00..291.12 rows=1963 width=48) (actual time=0.042..6.752 rows=2000 loops=1)
               Merge Cond: (c1.ztimestamp = c4.ztimestamp)
               ->  Merge Join  (cost=0.00..210.98 rows=1963 width=40) (actual time=0.033..4.751 rows=2000 loops=1)
                     Merge Cond: (c1.ztimestamp = c3.ztimestamp)
                     ->  Merge Join  (cost=0.00..130.84 rows=1963 width=32) (actual time=0.022..2.903 rows=2000 loops=1)
                           Merge Cond: (c1.ztimestamp = c2.ztimestamp)
                           ->  Index Scan using table_name1_pkey on table_name1 c1  (cost=0.00..50.70 rows=1963 width=24) (actual time=0.009..0.609 rows=2000 loops=1)
                           ->  Index Scan using table_name2_pkey on table_name2 c2  (cost=0.00..50.70 rows=1963 width=8) (actual time=0.010..0.756 rows=2000 loops=1)
                     ->  Index Scan using table_name3_pkey on table_name3 c3  (cost=0.00..50.70 rows=1963 width=8) (actual time=0.010..0.718 rows=2000 loops=1)
               ->  Index Scan using table_name4_pkey on table_name4 c4  (cost=0.00..50.70 rows=1963 width=8) (actual time=0.009..0.758 rows=2000 loops=1)
         ->  Index Scan using table_name5_pkey on table_name5 c5  (cost=0.00..50.70 rows=1963 width=8) (actual time=0.010..0.696 rows=2000 loops=1)
   ->  Index Scan using table_name6_pkey on table_name6 c6  (cost=0.00..50.70 rows=1963 width=8) (actual time=0.008..1.044 rows=2000 loops=1)
 Total runtime: 70.201 ms
(17 rows)

Update: most people prefer the JOIN syntax over the WHERE ... syntax:

EXPLAIN ANALYZE
CREATE TABLE test_table AS
SELECT c1.ztimestamp, c1.year, c1.month, c1.day, c1.period
        , c1.the_value + c2.the_value + c3.the_value + c4.the_value
        + c5.the_value + c6.the_value AS the_value
FROM table_name1 AS c1
JOIN table_name2 AS c2 ON c1.ztimestamp=c2.ztimestamp
JOIN table_name3 AS c3 ON c2.ztimestamp=c3.ztimestamp
JOIN table_name4 AS c4 ON c3.ztimestamp=c4.ztimestamp
JOIN table_name5 AS c5 ON c4.ztimestamp=c5.ztimestamp
JOIN table_name6 AS c6 ON c5.ztimestamp=c6.ztimestamp
        ;
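
The question requires that timestamps missing from any input table be left out of the new table. The inner joins in the query above already do this. Below is a minimal sketch for checking that on toy data. It makes two assumptions that are not part of the original setup: it uses an in-memory SQLite database from Python as a stand-in for PostgreSQL, and it uses three hypothetical tables t1..t3 holding only the two relevant columns.

```python
import sqlite3

# Toy stand-in for the real tables: in-memory SQLite instead of PostgreSQL,
# hypothetical table names t1..t3, only the key and value columns.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

rows = {
    "t1": [(1, 1.0), (2, 1.0), (3, 1.0), (4, 1.0)],
    "t2": [(1, 2.0), (2, 2.0), (4, 2.0)],  # timestamp 3 is missing
    "t3": [(1, 3.0), (3, 3.0), (4, 3.0)],  # timestamp 2 is missing
}
for name, data in rows.items():
    cur.execute(
        f"CREATE TABLE {name} (ztimestamp INTEGER PRIMARY KEY, the_value REAL)"
    )
    cur.executemany(f"INSERT INTO {name} VALUES (?, ?)", data)

# Same shape as the JOIN query above, reduced to three tables:
# inner joins chained on the shared key.
cur.execute("""
    CREATE TABLE test_table AS
    SELECT c1.ztimestamp,
           c1.the_value + c2.the_value + c3.the_value AS the_value
    FROM t1 AS c1
    JOIN t2 AS c2 ON c1.ztimestamp = c2.ztimestamp
    JOIN t3 AS c3 ON c2.ztimestamp = c3.ztimestamp
""")

result = cur.execute(
    "SELECT ztimestamp, the_value FROM test_table ORDER BY ztimestamp"
).fetchall()
print(result)
```

Only timestamps 1 and 4 appear in every toy table, so only those two rows end up in test_table. Timestamps 2 and 3 are dropped because each is missing from one input.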
Answered 2012-05-27T15:34:33.843