There are two tables:

tmp_stat:
date, site_id, ip, block_id, count
Primary Key (date, site_id, ip, block_id)

main_stat:
date, site_id, ip, block_id, count
Primary Key (date, site_id, ip, block_id)

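I need to insert rows from tmp_stat into main_stat when no row with the same (date, site_id, ip, block_id) exists yet, and update the count when the row already exists, as quickly as possible.

tmp_stat contains about 500,000 rows, and main_stat contains millions.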

3 Answers

Does the following work?

WITH upd AS (
    UPDATE main_stat t
       SET counter = s.counter
      FROM tmp_stat s
     WHERE t.date = s.date
            AND t.site_id = s.site_id
            AND t.ip = s.ip
            AND t.block_id = s.block_id
 RETURNING s.date, s.site_id, s.ip, s.block_id, s.counter
)
INSERT INTO main_stat
     SELECT s.date, s.site_id, s.ip, s.block_id, s.counter
       FROM tmp_stat s 
       LEFT JOIN upd ON (upd.date = s.date and  upd.site_id = s.site_id and upd.ip = s.ip and upd.block_id = s.block_id)
      WHERE upd.date IS NULL
;

Update:

It looks like this only works on PostgreSQL 9.1 or later.

Using just-somebody's suggestion, WHERE (t.date, t.site_id, t.ip, t.block_id) = (s.date, s.site_id, s.ip, s.block_id), seems to give better performance.

WITH upd AS (
    UPDATE main_stat t
       SET counter = s.counter
      FROM tmp_stat s
     WHERE ( t.date, t.site_id, t.ip, t.block_id ) = ( s.date, s.site_id, s.ip, s.block_id )
 RETURNING s.date, s.site_id, s.ip, s.block_id
)
INSERT INTO main_stat
     SELECT s.date, s.site_id, s.ip, s.block_id, s.counter
       FROM tmp_stat s 
       LEFT JOIN upd 
            ON ( upd.date = s.date 
                AND upd.site_id = s.site_id 
                AND upd.ip = s.ip 
                AND upd.block_id = s.block_id )
      WHERE upd.date IS NULL
;

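What happens here is that the UPDATE runs inside a CTE, and the CTE returns the identifying columns of the rows it updated.

The INSERT then filters tmp_stat against that list of updated rows, so that only the rows that were not updated get inserted.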
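
To make that flow concrete, here is a minimal sketch with two sample rows. The column types, the sample values, and the use of temporary tables are my assumptions for illustration; none of them come from the question. The column is named counter to match the statement above.

```sql
-- Assumed schema: the types are illustrative, not taken from the question.
CREATE TEMP TABLE main_stat (
    date     date    NOT NULL,
    site_id  integer NOT NULL,
    ip       inet    NOT NULL,
    block_id integer NOT NULL,
    counter  integer NOT NULL,
    PRIMARY KEY (date, site_id, ip, block_id)
);
-- Same columns and primary key as main_stat.
CREATE TEMP TABLE tmp_stat (LIKE main_stat INCLUDING ALL);

INSERT INTO main_stat VALUES ('2013-03-20', 1, '10.0.0.1', 7, 5);
INSERT INTO tmp_stat VALUES
    ('2013-03-20', 1, '10.0.0.1', 7, 9),   -- key already present in main_stat
    ('2013-03-20', 1, '10.0.0.2', 7, 3);   -- key missing from main_stat

WITH upd AS (
    -- Overwrite the counter of every row whose key exists in both tables
    -- and report which keys were touched.
    UPDATE main_stat t
       SET counter = s.counter
      FROM tmp_stat s
     WHERE (t.date, t.site_id, t.ip, t.block_id) = (s.date, s.site_id, s.ip, s.block_id)
 RETURNING s.date, s.site_id, s.ip, s.block_id
)
-- Insert only the tmp_stat rows whose key was not reported by the UPDATE.
INSERT INTO main_stat
     SELECT s.date, s.site_id, s.ip, s.block_id, s.counter
       FROM tmp_stat s
       LEFT JOIN upd
            ON (upd.date, upd.site_id, upd.ip, upd.block_id) = (s.date, s.site_id, s.ip, s.block_id)
      WHERE upd.date IS NULL;

SELECT * FROM main_stat ORDER BY ip;
```

With this sample data the row for 10.0.0.1 ends up with counter 9 (overwritten, not added), and the row for 10.0.0.2 is inserted with counter 3.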

Dimitri Fontaine covers some concurrency caveats in this blog entry.
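
For readers on PostgreSQL 9.5 or later (an assumption on my part; the statement above only needs 9.1), INSERT ... ON CONFLICT DO UPDATE is an alternative worth considering: it expresses the same upsert in a single statement, and the server itself resolves the case where another session inserts the same key concurrently. A sketch, keeping the counter column name used above:

```sql
-- Requires PostgreSQL 9.5+. The conflict target must match a unique
-- constraint or index; here it is main_stat's primary key.
INSERT INTO main_stat (date, site_id, ip, block_id, counter)
SELECT s.date, s.site_id, s.ip, s.block_id, s.counter
  FROM tmp_stat s
ON CONFLICT (date, site_id, ip, block_id)
DO UPDATE SET counter = EXCLUDED.counter;  -- EXCLUDED is the row that failed to insert
```

To add the incoming value instead of overwriting, the SET clause would become counter = main_stat.counter + EXCLUDED.counter. Note that ON CONFLICT DO UPDATE raises an error if the same key appears twice in the source rows; the primary key on tmp_stat rules that out here.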

More information on CTEs can be found in the PostgreSQL documentation.

answered 2013-03-20T21:42:45.263

This looks like a simple EXISTS query... if the columns are indexed, it should be fast enough.

Example:

-- Run both statements in one transaction.
-- Update first: otherwise the rows inserted below would be counted twice.
-- update count for existing rows
UPDATE main_stat main 
SET count = main.count + (SELECT tmp.count FROM tmp_stat tmp
                           WHERE tmp.date     = main.date 
                           AND   tmp.site_id  = main.site_id 
                           AND   tmp.ip       = main.ip
                           AND   tmp.block_id = main.block_id
                           LIMIT 1)
WHERE EXISTS (SELECT 1 FROM tmp_stat tmp
                           WHERE tmp.date     = main.date 
                           AND   tmp.site_id  = main.site_id 
                           AND   tmp.ip       = main.ip
                           AND   tmp.block_id = main.block_id
             );
-- insert missing rows, including their count
INSERT INTO main_stat (date, site_id, ip, block_id, count)
SELECT tmp.date, tmp.site_id, tmp.ip, tmp.block_id, tmp.count FROM tmp_stat tmp
WHERE NOT EXISTS (SELECT 1 FROM main_stat main 
                           WHERE tmp.date     = main.date 
                           AND   tmp.site_id  = main.site_id 
                           AND   tmp.ip       = main.ip
                           AND   tmp.block_id = main.block_id
                 );
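
On the indexing point: both tables declare a primary key on (date, site_id, ip, block_id), so PostgreSQL already maintains a unique index on exactly those columns and no extra index should be needed for these lookups. One way to see what the planner actually does is EXPLAIN. Running ANALYZE on the freshly loaded table first is an optional step that I am assuming is worthwhile here, so the planner has current row estimates:

```sql
-- Refresh planner statistics for the freshly loaded staging table.
ANALYZE tmp_stat;

-- Show the plan for the anti-join without executing it.
EXPLAIN
SELECT tmp.date, tmp.site_id, tmp.ip, tmp.block_id
  FROM tmp_stat tmp
 WHERE NOT EXISTS (SELECT 1 FROM main_stat main
                    WHERE main.date     = tmp.date
                      AND main.site_id  = tmp.site_id
                      AND main.ip       = tmp.ip
                      AND main.block_id = tmp.block_id);
```

The plan differs with table sizes and settings, so treat it as a diagnostic rather than a guarantee of speed.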
answered 2013-03-28T16:35:28.737

As I understand the question, building on gsimes's answer:

with agg_tmp_stat as (
    select date, site_id, ip, block_id, sum(counter)::integer counter
    from tmp_stat
    group by 1, 2, 3, 4
), upd as (
    update main_stat t
    set counter = t.counter + s.counter
    from agg_tmp_stat s
    where
        (t.date, t.site_id, t.ip, t.block_id)
        = (s.date, s.site_id, s.ip, s.block_id)
    returning s.date, s.site_id, s.ip, s.block_id
)
insert into main_stat
select s.date, s.site_id, s.ip, s.block_id, s.counter
from
    agg_tmp_stat s 
    left join
    upd on
        upd.date = s.date 
        and upd.site_id = s.site_id 
        and upd.ip = s.ip 
        and upd.block_id = s.block_id
where upd.date is null;

Basically, it aggregates the temp table and adds the resulting counter to the one that already exists.
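
As an illustration of what the aggregation step contributes, here is a small sketch on a hypothetical staging table, raw_stat, that has no primary key and can therefore hold the same key more than once. The table name, column types, and values are assumptions for the example only. With the primary key declared on tmp_stat in the question, each key appears at most once and the aggregation changes nothing.

```sql
-- Hypothetical staging table without a primary key, so keys may repeat.
CREATE TEMP TABLE raw_stat (
    date     date,
    site_id  integer,
    ip       inet,
    block_id integer,
    counter  integer
);

INSERT INTO raw_stat VALUES
    ('2013-03-25', 1, '10.0.0.1', 7, 2),
    ('2013-03-25', 1, '10.0.0.1', 7, 3);  -- same key as the row above

-- Collapse repeated keys into one row per key before upserting.
-- sum() over integer yields bigint, hence the cast back to integer.
SELECT date, site_id, ip, block_id, sum(counter)::integer AS counter
  FROM raw_stat
 GROUP BY date, site_id, ip, block_id;
```

The two sample rows collapse into a single row with counter 5, which is what the update and insert branches then work from.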

answered 2013-03-25T18:52:19.413