11

I have a simple query and two tables:

drilldown

CREATE SEQUENCE drilldown_id_seq;

CREATE TABLE drilldown (
    transactionid bigint NOT NULL DEFAULT nextval('drilldown_id_seq'),
    userid bigint NOT NULL default 0 REFERENCES users(id),
    pathid bigint NOT NULL default 0,
    reqms bigint NOT NULL default 0,
    quems bigint NOT NULL default 0,
    clicktime timestamp default current_timestamp,
    PRIMARY KEY(transactionid)
);

ALTER SEQUENCE drilldown_id_seq OWNED BY drilldown.transactionid;

CREATE INDEX drilldown_idx1 ON drilldown (clicktime);

querystats

CREATE SEQUENCE querystats_id_seq;

CREATE TABLE querystats (
    id bigint NOT NULL DEFAULT nextval('querystats_id_seq'),
    transactionid bigint NOT NULL default 0 REFERENCES drilldown(transactionid),
    querynameid bigint NOT NULL default 0 REFERENCES queryname(id),
    queryms bigint NOT NULL default 0,
    PRIMARY KEY(id)
);

ALTER SEQUENCE querystats_id_seq OWNED BY querystats.id;

CREATE INDEX querystats_idx1 ON querystats (transactionid);
CREATE INDEX querystats_idx2 ON querystats (querynameid);

drilldown has 1.5 million records and querystats has 10 million records; the problem shows up when I join the two.

Query

explain analyse
select avg(qs.queryms)
  from querystats qs
  join drilldown d on (qs.transactionid=d.transactionid)
  where querynameid=1;

Query plan

 Aggregate  (cost=528596.96..528596.97 rows=1 width=8) (actual time=5213.154..5213.154 rows=1 loops=1)
   ->  Hash Join  (cost=274072.53..518367.59 rows=4091746 width=8) (actual time=844.087..3528.788 rows=4117717 loops=1)
         Hash Cond: (qs.transactionid = d.transactionid)
         ->  Bitmap Heap Scan on querystats qs  (cost=88732.62..210990.44 rows=4091746 width=16) (actual time=309.502..1321.029 rows=4117717 loops=1)
               Recheck Cond: (querynameid = 1)
               ->  Bitmap Index Scan on querystats_idx2  (cost=0.00..87709.68 rows=4091746 width=0) (actual time=307.916..307.916 rows=4117718 loops=1)
                     Index Cond: (querynameid = 1)
         ->  Hash  (cost=162842.29..162842.29 rows=1371250 width=8) (actual time=534.065..534.065 rows=1372574 loops=1)
               Buckets: 4096  Batches: 64  Memory Usage: 850kB
               ->  Index Scan using drilldown_pkey on drilldown d  (cost=0.00..162842.29 rows=1371250 width=8) (actual time=0.015..364.657 rows=1372574 loops=1)
 Total runtime: 5213.205 ms
(11 rows)

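I know there are tuning parameters I could adjust for PostgreSQL, but what I want to know is whether the query I am running is the most efficient way to join the two tables.

Or should it be some kind of INNER JOIN? I'm just not sure.

Any pointers are appreciated!

EDIT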

database=# \d drilldown
                                       Table "public.drilldown"
    Column     |            Type             |                       Modifiers                        
---------------+-----------------------------+--------------------------------------------------------
 transactionid | bigint                      | not null default nextval('drilldown_id_seq'::regclass)
 userid        | bigint                      | not null default 0
 pathid        | bigint                      | not null default 0
 reqms         | bigint                      | not null default 0
 quems         | bigint                      | not null default 0
 clicktime     | timestamp without time zone | default now()
Indexes:
    "drilldown_pkey" PRIMARY KEY, btree (transactionid)
    "drilldown_idx1" btree (clicktime)
Foreign-key constraints:
    "drilldown_userid_fkey" FOREIGN KEY (userid) REFERENCES users(id)
Referenced by:
    TABLE "querystats" CONSTRAINT "querystats_transactionid_fkey" FOREIGN KEY (transactionid) REFERENCES drilldown(transactionid)

database=# \d querystats
                            Table "public.querystats"
    Column     |  Type  |                        Modifiers                        
---------------+--------+---------------------------------------------------------
 id            | bigint | not null default nextval('querystats_id_seq'::regclass)
 transactionid | bigint | not null default 0
 querynameid   | bigint | not null default 0
 queryms       | bigint | not null default 0
Indexes:
    "querystats_pkey" PRIMARY KEY, btree (id)
    "querystats_idx1" btree (transactionid)
    "querystats_idx2" btree (querynameid)
Foreign-key constraints:
    "querystats_querynameid_fkey" FOREIGN KEY (querynameid) REFERENCES queryname(id)
    "querystats_transactionid_fkey" FOREIGN KEY (transactionid) REFERENCES drilldown(transactionid)

So here are the two tables and the version, as requested:

PostgreSQL 9.1.7 on x86_64-unknown-linux-gnu, compiled by gcc (Ubuntu/Linaro 4.6.3-1ubuntu5) 4.6.3, 64-bit

What this query does is compute the average of the queryms values across all rows for each query type (querynameid).

            name            |         current_setting          |        source        
----------------------------+----------------------------------+----------------------
 application_name           | psql                             | client
 client_encoding            | UTF8                             | client
 DateStyle                  | ISO, MDY                         | configuration file
 default_text_search_config | pg_catalog.english               | configuration file
 enable_seqscan             | off                              | session
 external_pid_file          | /var/run/postgresql/9.1-main.pid | configuration file
 lc_messages                | en_US.UTF-8                      | configuration file
 lc_monetary                | en_US.UTF-8                      | configuration file
 lc_numeric                 | en_US.UTF-8                      | configuration file
 lc_time                    | en_US.UTF-8                      | configuration file
 log_line_prefix            | %t                               | configuration file
 log_timezone               | localtime                        | environment variable
 max_connections            | 100                              | configuration file
 max_stack_depth            | 2MB                              | environment variable
 port                       | 5432                             | configuration file
 shared_buffers             | 24MB                             | configuration file
 ssl                        | on                               | configuration file
 TimeZone                   | localtime                        | environment variable
 unix_socket_directory      | /var/run/postgresql              | configuration file
(19 rows)

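I see that enable_seqscan=off, I have not touched any settings, this is a completely default install.

UPDATE

I made some changes based on the comments below; here are the results.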

explain analyse SELECT (SELECT avg(queryms) AS total FROM querystats WHERE querynameid=3) as total FROM querystats qs JOIN drilldown d ON (qs.transactionid=d.transactionid) WHERE qs.querynameid=3 limit 1;
                                                                       QUERY PLAN                                                                        
---------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=196775.99..196776.37 rows=1 width=0) (actual time=2320.876..2320.876 rows=1 loops=1)
   InitPlan 1 (returns $0)
     ->  Aggregate  (cost=196775.94..196775.99 rows=1 width=8) (actual time=2320.815..2320.815 rows=1 loops=1)
           ->  Bitmap Heap Scan on querystats  (cost=24354.25..189291.69 rows=2993698 width=8) (actual time=226.516..1144.690 rows=2999798 loops=1)
                 Recheck Cond: (querynameid = 3)
                 ->  Bitmap Index Scan on querystats_idx  (cost=0.00..23605.83 rows=2993698 width=0) (actual time=225.119..225.119 rows=2999798 loops=1)
                       Index Cond: (querynameid = 3)
   ->  Nested Loop  (cost=0.00..1127817.12 rows=2993698 width=0) (actual time=2320.876..2320.876 rows=1 loops=1)
         ->  Seq Scan on drilldown d  (cost=0.00..76745.10 rows=1498798 width=8) (actual time=0.009..0.009 rows=1 loops=1)
         ->  Index Scan using querystats_idx on querystats qs  (cost=0.00..0.60 rows=2 width=8) (actual time=0.045..0.045 rows=1 loops=1)
               Index Cond: ((querynameid = 3) AND (transactionid = d.transactionid))
 Total runtime: 2320.940 ms
(12 rows)
4

5 Answers

10

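It is behaving as though you have set enable_seqscan = off, because it is using an index scan to populate a hash table. Never turn off any of the planner options except as a diagnostic step, and if you are showing a plan, please show any options that were in effect. You can run this to show a lot of useful information: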

SELECT version();
SELECT name, current_setting(name), source
  FROM pg_settings
  WHERE source NOT IN ('default', 'override');

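It also helps if you tell us about the runtime environment, especially the amount of RAM on the machine, what your storage system looks like, and the size of the database (or, even better, the size of the active data set that is frequently referenced).

As a rough breakdown, the 5.2 seconds splits into:

  1. 1.3 seconds to find the 4,117,717 querystats rows that match your selection criterion.
  2. 2.3 seconds to randomly match those against drilldown records.
  3. 1.6 seconds to pass over the 4,117,717 rows and calculate the average.

So, even though you seem to have crippled its ability to use the fastest plan, it takes only 1.26 microseconds (millionths of a second) to locate each row, join it to another row, and work it into the average. That is not too bad on an absolute basis, but you can almost certainly get a slightly faster plan.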
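The per-row figure can be reproduced from the numbers in the plan. This is only a sanity check, dividing the total runtime reported by EXPLAIN ANALYZE by the number of rows that matched; it comes out at roughly 1.27 microseconds, in line with the figure quoted above.

```sql
-- total runtime in ms divided by matching rows, converted to microseconds
SELECT round(5213.205 / 4117717 * 1000, 2) AS microseconds_per_row;
```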

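First off, if you are using 9.2.x where x is less than 3, upgrade to 9.2.3 immediately. A performance regression affecting some types of plans was fixed in the most recent release, and it might affect this query. In general, try to stay up to date on minor releases (where the version number changes past the second dot).

You can test different plans in a single session by setting planning factors on just that connection and running your query (or an EXPLAIN on it). Try something like this: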

SET seq_page_cost = 0.1;
SET random_page_cost = 0.1;
SET cpu_tuple_cost = 0.05;
SET effective_cache_size = '3GB'; -- actually use shared_buffers plus OS cache

Make sure that all enable_ settings are on.
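One way to run that experiment without leaving the values behind in the session is to wrap it in a transaction and use SET LOCAL. The following is a sketch only: it reuses the query from the question, and the '3GB' value is a placeholder, not a recommendation for this machine.

```sql
-- List the planner switches; every row should show "on"
SELECT name, setting FROM pg_settings WHERE name ~ '^enable_';

BEGIN;
-- SET LOCAL only lasts until the end of this transaction
SET LOCAL enable_seqscan = on;
SET LOCAL seq_page_cost = 0.1;
SET LOCAL random_page_cost = 0.1;
SET LOCAL cpu_tuple_cost = 0.05;
SET LOCAL effective_cache_size = '3GB';  -- placeholder: shared_buffers plus OS cache

EXPLAIN ANALYZE
SELECT avg(qs.queryms)
  FROM querystats qs
  JOIN drilldown d ON (qs.transactionid = d.transactionid)
 WHERE qs.querynameid = 1;

ROLLBACK;  -- the SET LOCAL values are discarded here
```

If one combination gives a clearly better plan, the corresponding cost settings can then be considered for postgresql.conf.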

answered 2013-02-10T15:58:33.560
3

You claim in your question:

I see that enable_seqscan=off, I have not touched any settings, this is a completely default install.

In contrast, the output from pg_settings tells us:

enable_seqscan | off | session

Meaning, that you set enable_seqscan = off in your session. Something is not adding up here.

Run

SET enable_seqscan = on;

or

RESET enable_seqscan;

Assert:

SHOW enable_seqscan;

Also, your setting for shared_buffers is way too low for a db with millions of records. 24MB seems to be the conservative setting of Ubuntu out-of-the-box. You need to edit your configuration files for serious use! I quote the manual:

If you have a dedicated database server with 1GB or more of RAM, a reasonable starting value for shared_buffers is 25% of the memory in your system.

So edit your postgresql.conf file to increase the value and restart the server (a reload is not enough, because shared_buffers can only be set at server start).
Then try your query again and find out how enable_seqscan was turned off.
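To track down where a value comes from, pg_settings can be queried directly. This is a sketch, not a complete diagnosis: sourcefile and sourceline are only filled in for superusers and only when the value comes from a configuration file, and SHOW config_file also needs superuser rights. A source of "session" normally means a SET command was issued on the current connection; a client startup file such as ~/.psqlrc is one possible place to look.

```sql
-- Current value and origin of the two settings discussed here
SELECT name, setting, source, sourcefile, sourceline
  FROM pg_settings
 WHERE name IN ('enable_seqscan', 'shared_buffers');

-- Per-role or per-database overrides, if any exist
SELECT setdatabase, setrole, setconfig
  FROM pg_db_role_setting;

-- Which postgresql.conf the server is actually reading
SHOW config_file;
```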

answered 2013-02-12T12:26:02.120
1

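To me, the querystats table looks like a junction table. In that case: omit the surrogate key, use the natural (composite) key instead (neither component may be NULL), and add a reversed composite index. The separate single-column indexes then become redundant, because each column is the leading column of one of the two composite indexes. (Note that PostgreSQL does not create an index on the referencing columns of an FK constraint automatically; only the primary key and unique constraints get one.)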

-- CREATE SEQUENCE querystats_id_seq;

CREATE TABLE querystats (
    -- id bigint NOT NULL DEFAULT nextval('querystats_id_seq'),
    transactionid bigint NOT NULL default 0 REFERENCES drilldown(transactionid),
    querynameid bigint NOT NULL default 0 REFERENCES queryname(id),
    queryms bigint NOT NULL default 0,
    PRIMARY KEY(transactionid,querynameid )
);

-- ALTER SEQUENCE querystats_id_seq OWNED BY querystats.id;

--CREATE INDEX querystats_idx1 ON querystats (transactionid);
-- CREATE INDEX querystats_idx2 ON querystats (querynameid);
CREATE UNIQUE INDEX querystats_alt ON querystats (querynameid, transactionid);
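
The composite primary key only works if each (transactionid, querynameid) pair occurs at most once, which the question does not confirm. A quick check against the existing table, as a sketch:

```sql
-- Any row returned here would make PRIMARY KEY (transactionid, querynameid) fail
SELECT transactionid, querynameid, count(*) AS copies
  FROM querystats
 GROUP BY transactionid, querynameid
HAVING count(*) > 1
 LIMIT 10;
```

If this returns rows, the same query can run more than once per transaction and the surrogate key has to stay.
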
answered 2013-02-10T15:42:48.860
1

In this query

select avg(qs.queryms) 
from querystats qs 
join drilldown d 
  on (qs.transactionid=d.transactionid) 
where querynameid=1;

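you are not using any of the columns from "drilldown". Since the foreign key constraint guarantees that there is a row in "drilldown" for every "transactionid" in "querystats", I don't think the join does anything useful. Unless I have missed something, your query is equivalent to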

select avg(qs.queryms) 
from querystats qs 
where querynameid=1;

with no join at all. As long as there is an index on "querynameid", you should get decent performance.
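
If the goal is the average for every query type, as the question's edit describes, the same reasoning allows a single pass over querystats without the join. A minimal sketch; the column aliases are made up for illustration:

```sql
-- One row per query type, computed in a single scan of querystats
SELECT querynameid,
       avg(queryms) AS avg_queryms,
       count(*)     AS samples
  FROM querystats
 GROUP BY querynameid
 ORDER BY querynameid;
```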

answered 2013-02-10T01:45:59.427
1

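Without the join, avg(qs.queryms) executes once.

When you do the join, you are executing avg(qs.queryms) as many times as there are rows generated by the join.

If you are always interested in a single querynameid, try putting avg(qs.queryms) in a subselect: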

SELECT 
    (SELECT avg(queryms) FROM querystats WHERE querynameid=1) 
FROM querystats qs 
JOIN drilldown d ON (qs.transactionid=d.transactionid) 
WHERE qs.querynameid=1;
answered 2013-02-10T04:48:47.537