
In PostgreSQL, I have a query like the following which will delete 250k rows from a 1M-row table:

DELETE FROM table WHERE key = 'needle';

The query takes more than an hour to execute, and during that time the affected rows are locked for writing. That is bad, because it means many update queries have to wait for the big delete to finish (after which they will fail because the rows disappeared out from under them, but that's OK). I need a way to split this big query into smaller pieces so that they interfere with the update queries as little as possible. For example, if the delete could be split into chunks of 1000 rows each, the other update queries would at most have to wait for a delete touching 1000 rows.

DELETE FROM table WHERE key = 'needle' LIMIT 10000;

A query like that would work nicely, but alas it does not exist in Postgres.


3 Answers


Try a subselect and use a unique condition:

DELETE FROM 
  table 
WHERE 
  id IN (SELECT id FROM table WHERE key = 'needle' LIMIT 10000);
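To actually delete all 250k rows this way, the statement has to be run in a loop, committing between batches so waiting writers get a turn. Here is a minimal runnable sketch of that loop, using Python with sqlite3 as a stand-in for a Postgres connection (the table `mytable` and columns `id`/`key` are illustrative):

```python
import sqlite3

# Set up a toy table: 10000 rows, every 4th one matching key='needle'.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (id INTEGER PRIMARY KEY, key TEXT)")
conn.executemany(
    "INSERT INTO mytable (key) VALUES (?)",
    [("needle" if i % 4 == 0 else "hay",) for i in range(10000)])
conn.commit()

BATCH = 1000
while True:
    # Delete at most BATCH matching rows per statement.
    cur = conn.execute(
        "DELETE FROM mytable WHERE id IN "
        "(SELECT id FROM mytable WHERE key = ? LIMIT ?)",
        ("needle", BATCH))
    conn.commit()  # commit between batches so other writers can proceed
    if cur.rowcount < BATCH:
        break  # last (partial) batch done

remaining = conn.execute(
    "SELECT COUNT(*) FROM mytable WHERE key = 'needle'").fetchone()[0]
print(remaining)  # 0
```

The key point is the commit between batches: each short transaction holds its row locks only briefly, instead of one giant transaction holding 250k row locks for an hour.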
answered 2010-08-06T05:55:58.443

Set the lock level of your deletes and updates to a more granular lock mode. Note that your transactions will now be slower.

http://www.postgresql.org/docs/current/static/sql-lock.html

http://www.postgresql.org/docs/current/static/explicit-locking.html

answered 2010-08-06T06:30:06.950

Frak's answer is good, but this can be faster; it requires 8.4, though, because of the window function support (pseudocode):

result = query('select
    id from (
        select id, row_number() over (order by id) as row_number
        from mytable where key=?
    ) as _
    where row_number%8192=0 order by id', 'needle');
// result contains the ids of every 8192nd row whose key='needle'
last_id = 0;
result.append(MAX_INT); // guard
for (row in result) {
    query('delete from mytable
        where id<=? and id>? and key=?', row.id, last_id, 'needle');
    // last_id is used to hint the query planner
    // that there will be no rows with a smaller id,
    // so it is less likely to use a full table scan
    last_id = row.id;
}
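The pseudocode above can be sketched as runnable Python, again using sqlite3 as a stand-in for Postgres (SQLite also supports `ROW_NUMBER()`; the table and column names are illustrative, and the chunk size is shrunk to fit the toy data):

```python
import sqlite3

# Toy table: 5000 rows, every 3rd one matching key='needle'.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (id INTEGER PRIMARY KEY, key TEXT)")
conn.executemany(
    "INSERT INTO mytable (key) VALUES (?)",
    [("needle" if i % 3 == 0 else "hay",) for i in range(5000)])
conn.commit()

CHUNK = 500
# ids of every CHUNK-th matching row, used as delete-range boundaries
boundaries = [r[0] for r in conn.execute(
    "SELECT id FROM ("
    "  SELECT id, ROW_NUMBER() OVER (ORDER BY id) AS rn"
    "  FROM mytable WHERE key = ?"
    ") WHERE rn % ? = 0 ORDER BY id", ("needle", CHUNK))]
boundaries.append(2**63 - 1)  # guard so the last partial chunk is covered

last_id = 0
for boundary in boundaries:
    # One short transaction per id range; last_id bounds the scan below.
    conn.execute(
        "DELETE FROM mytable WHERE id > ? AND id <= ? AND key = ?",
        (last_id, boundary, "needle"))
    conn.commit()
    last_id = boundary

remaining = conn.execute(
    "SELECT COUNT(*) FROM mytable WHERE key = 'needle'").fetchone()[0]
print(remaining)  # 0
```

Compared to the repeated `id IN (... LIMIT n)` approach, this computes the chunk boundaries once up front, so each delete is a simple indexed range scan rather than a fresh subselect.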

This is premature optimization, an evil thing. Beware.

answered 2010-08-06T15:38:12.853