database - Postgresql: inner join takes 70 seconds

Question

I have two tables:

Table A : 1MM rows, AsOfDate, Id, BId (foreign key to table B)

Table B : 50k rows, Id, Flag, ValidFrom, ValidTo

Table A contains multiple records per day between 2011/01/01 and 2011/12/31 across 100 BIds. Table B contains multiple non-overlapping (between ValidFrom and ValidTo) records for the same 100 BIds.

The join should return the flag that was active for the BId on the given AsOfDate.

select 
    a.AsOfDate, b.Flag 
from 
    A a inner Join B b on 
        a.BId = b.BId and b.ValidFrom <= a.AsOfDate and b.ValidTo >= a.AsOfDate
where
    a.AsOfDate >= 20110101 and a.AsOfDate <= 20111231

This query takes ~70 seconds on a very high-end server (3+ GHz) with 64 GB of memory.

I have tried indexes on every combination of fields while testing this, to no avail.

Indexes on A: a.AsOfDate, a.AsOfDate+a.bId, a.bid

Indexes on B: b.bid, b.bid+b.validfrom

I also tried the range queries suggested below (62 seconds).

This same query on the free version of SQL Server running in a VM takes ~1 second to complete.

Any ideas?

Postgres 9.2

Query Plan

QUERY PLAN                                       
---------------------------------------------------------------------------------------
Aggregate  (cost=8274298.83..8274298.84 rows=1 width=0)
->  Hash Join  (cost=1692.25..8137039.36 rows=54903787 width=0)
    Hash Cond: (a.bid = b.bid)
     Join Filter: ((b.validfrom <= a.asofdate) AND (b.validto >= a.asofdate))
     ->  Seq Scan on "A" a  (cost=0.00..37727.00 rows=986467 width=12)
           Filter: ((asofdate > 20110101) AND (asofdate < 20111231))
     ->  Hash  (cost=821.00..821.00 rows=50100 width=12)
           ->  Seq Scan on "B" b  (cost=0.00..821.00 rows=50100 width=12)

See http://explain.depesz.com/s/1c5 for the EXPLAIN ANALYZE output.

Here is the query plan from SQL Server for the same query.

score 0 · Accepted Answer

Consider using the range types available in PostgreSQL 9.2:

create index on a using gist(int4range(asofdate, asofdate, '[]'));
create index on b using gist(int4range(validfrom, validto, '[]'));

You can query for dates within a matching range like this:

select * from a
where int4range(asofdate,asofdate,'[]') && int4range(20110101, 20111231, '[]');

And for rows in b that overlap a record in a, like this:

select *
from b
    join a on int4range(b.validfrom,b.validto,'[]') @> a.asofdate
where a.id = 1;

(&& means "overlaps", @> means "contains", and '[]' creates a range that includes both endpoints.)
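
Putting the pieces together, the join from the question could be written with the range containment operator roughly as follows. This is only a sketch: it assumes asofdate, validfrom and validto are integer columns holding yyyymmdd values (as the literals in the question suggest) and that b has a bid column, as in the posted query plan. Whether the planner actually uses the GiST index on b depends on its statistics and cost estimates, so it is worth checking with EXPLAIN ANALYZE.

```sql
-- Sketch only: table and column names are taken from the question;
-- integer yyyymmdd dates are an assumption.
select a.asofdate, b.flag
from a
    join b on b.bid = a.bid
          -- the closed range [validfrom, validto] must contain asofdate
          and int4range(b.validfrom, b.validto, '[]') @> a.asofdate
where a.asofdate between 20110101 and 20111231;
```

The containment test is equivalent to b.validfrom <= a.asofdate and b.validto >= a.asofdate in the original query; the range form simply lets the expression index on b be considered.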

score 0 · Accepted Answer

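The problem was with the indexes. For some reason that is not clear to me, the query planner was not using the indexes on the tables correctly. I dropped them all and added them back (exactly the same, via a script), and the query now takes ~303 ms.

Thanks for all the help with this very frustrating problem.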
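
For anyone wanting to try the same thing, a drop-and-recreate script might look like the sketch below. The index names are hypothetical (the actual script was not posted), and only two of the indexes listed in the question are shown. The ANALYZE calls are an addition, not part of the fix described above: they refresh the planner statistics, and it is not known whether stale statistics played any role here.

```sql
-- Sketch only: index names are made up for illustration.
drop index if exists a_asofdate_bid_idx;
create index a_asofdate_bid_idx on a (asofdate, bid);

drop index if exists b_bid_validfrom_idx;
create index b_bid_validfrom_idx on b (bid, validfrom);

-- Assumption: refresh planner statistics after rebuilding.
analyze a;
analyze b;

-- Alternative that rebuilds every index on a table in place:
-- reindex table a;
```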
