27

Here's the problem I am trying to solve: I have recently completed a data layer redesign that allows me to load-balance my database across multiple shards. To keep the shards balanced, I need to be able to migrate data from one shard to another, which involves copying from shard A to shard B and then deleting the records from shard A. But I have several tables that are very big and have many foreign keys pointing to them, so deleting a single record from one of these tables can take more than one second.

In some cases I need to delete millions of records from the tables, and it just takes too long to be practical.

Disabling foreign keys is not an option. Deleting large batches of rows is also not an option, because this is a production application and large deletes lock too many resources, causing failures. I'm using SQL Server, and I know about partitioned tables, but the restrictions on partitioning (and the license fees for Enterprise Edition) rule them out for me.

When I began working on this problem I thought the hard part would be writing the algorithm that figures out how to delete rows from the leaf level up to the top of the data model, so that no foreign key constraints get violated along the way. But solving that problem did me no good since it takes weeks to delete records that need to disappear overnight.

I already built in a way to mark data as virtually deleted, so as far as the application is concerned, the data is gone, but I'm still dealing with large data files, large backups, and slower queries because of the sheer size of the tables.

Any ideas? I have already read older related posts here and found nothing that would help.

4

8 Answers

31

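Please see: Optimizing Delete on SQL Server

This MS support article might be of interest: How to resolve blocking problems that are caused by lock escalation in SQL Server

Break up large batch operations into several smaller operations. For example, suppose you ran the following query to remove several hundred thousand old records from an audit table, and then you found that it caused a lock escalation that blocked other users: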

DELETE FROM LogMessages WHERE LogDate < '2/1/2002'    

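By removing these records a few hundred at a time, you can dramatically reduce the number of locks that accumulate per transaction and prevent lock escalation. For example: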

SET ROWCOUNT 500
delete_more:
     DELETE FROM LogMessages WHERE LogDate < '2/1/2002'
IF @@ROWCOUNT > 0 GOTO delete_more
SET ROWCOUNT 0

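Reduce the query's lock footprint by making the query as efficient as possible. Large scans or large numbers of bookmark lookups may increase the chance of lock escalation; they also increase the chance of deadlocks and generally hurt concurrency and performance.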
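As a rough sketch of what "making the query efficient" could mean here: an index that leads with the filter column lets each batch seek to its rows instead of scanning. The index names below, and the `OrderLines`/`Orders` tables in the second statement, are hypothetical and only illustrate the pattern; whether such indexes already exist on your tables is an assumption to check first.

```sql
-- Hypothetical index name; assumes LogMessages has no index leading with LogDate.
CREATE NONCLUSTERED INDEX IX_LogMessages_LogDate
    ON LogMessages (LogDate);

-- Each small batch can now seek on LogDate rather than scan the table.
DELETE TOP (500) FROM LogMessages WHERE LogDate < '2/1/2002';

-- Hypothetical child table: when a parent row is deleted, SQL Server must
-- check every referencing table for matching rows. Without an index on the
-- referencing column, that check is a scan of the child table.
CREATE NONCLUSTERED INDEX IX_OrderLines_OrderId
    ON OrderLines (OrderId);   -- OrderLines.OrderId references Orders.OrderId
```

If deleting one row from a heavily referenced table takes over a second, the actual execution plan of a single-row delete should show which child tables are being scanned.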

answered 2009-07-21T12:35:57.923
20
delete_more:
     DELETE TOP(500) FROM LogMessages WHERE LogDate < '2/1/2002'
IF @@ROWCOUNT > 0 GOTO delete_more

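You can achieve the same result using SET ROWCOUNT as suggested by Mitch, but according to MSDN it will not be supported for DELETE and some other operations in future versions of SQL Server:

Using SET ROWCOUNT will not affect DELETE, INSERT, and UPDATE statements in a future release of SQL Server. Avoid using SET ROWCOUNT with DELETE, INSERT, and UPDATE statements in new development work, and plan to modify applications that currently use it. For a similar behavior, use the TOP syntax. For more information, see TOP (Transact-SQL).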

answered 2013-07-31T15:38:28.097
2

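You can create a new table, copy all but the "deleted" rows into it, then swap the names of the tables. Finally, drop the old table. If you're deleting a large percentage of the records, this may actually be faster.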
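A minimal sketch of that copy-and-swap, assuming a hypothetical table `dbo.BigTable` with a non-nullable `IsDeleted` flag (all object names here are made up for illustration):

```sql
-- Copy only the rows to keep. SELECT ... INTO creates the new table but
-- does not copy indexes, constraints, or triggers; recreate those on
-- dbo.BigTable_new before the swap.
SELECT *
INTO dbo.BigTable_new
FROM dbo.BigTable
WHERE IsDeleted = 0;

-- Swap the names in one short transaction.
BEGIN TRANSACTION;
EXEC sp_rename 'dbo.BigTable', 'BigTable_old';
EXEC sp_rename 'dbo.BigTable_new', 'BigTable';
COMMIT TRANSACTION;

-- Drop the old copy once the swap has been verified.
DROP TABLE dbo.BigTable_old;
```

Two caveats apply to the situation in the question. Foreign keys that referenced the original table stay attached to the renamed `BigTable_old`, so they have to be dropped and recreated against the new table, and `DROP TABLE` fails while any foreign key still references the old one. Rows written to the original table between the copy and the swap are also not carried over unless writes are paused or reconciled.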

answered 2009-07-21T12:35:27.407
1

Another suggestion is to rename the table and add a status column. When status = 1 (deleted), the row should no longer show. You then create a view with the same name as the original table that selects from the renamed table where status is null or 0 (depending on how you implement it). The deletion appears immediate to the user, and a background job can run every fifteen minutes deleting the flagged records without anyone other than the DBAs being aware of it.
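Roughly, the setup could look like the following. The table `dbo.Orders`, its columns, and the `Status` column are hypothetical names chosen for the sketch; the batch size of 500 is borrowed from the other answers.

```sql
-- Rename the real table and add the status flag (NULL or 0 = live, 1 = deleted).
EXEC sp_rename 'dbo.Orders', 'Orders_Base';
GO
ALTER TABLE dbo.Orders_Base ADD Status tinyint NULL;
GO

-- The view takes over the original name, so existing queries keep working.
CREATE VIEW dbo.Orders
AS
SELECT OrderId, CustomerId, OrderDate      -- list the original columns explicitly
FROM dbo.Orders_Base
WHERE Status IS NULL OR Status = 0;
GO

-- "Deleting" a row is now a cheap single-row update...
UPDATE dbo.Orders_Base SET Status = 1 WHERE OrderId = 42;
GO

-- ...and the scheduled background job removes flagged rows in small batches.
DELETE TOP (500) FROM dbo.Orders_Base WHERE Status = 1;
```

This hides the rows immediately but does not make the physical delete any cheaper; the background job still pays the foreign key checks for each row it removes.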

answered 2013-07-31T15:49:48.930
0

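If you're using SQL Server 2005 or 2008, perhaps "snapshot isolation" would help you. It lets the existing data remain visible to users while the underlying update operation is still processing, and then reveals the changes as soon as they are committed. Even if your delete takes 30 minutes to run, your applications would stay online during that time.

Here's a quick primer on snapshot isolation: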

http://www.mssqltips.com/tip.asp?tip=1081

You should still try to make your delete as fast as possible, but this may alleviate some of the burden.
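As a sketch of how this is switched on (the database name `MyDatabase` is a placeholder, and `LogMessages` is borrowed from the other answers):

```sql
-- Allow sessions to request SNAPSHOT isolation explicitly.
ALTER DATABASE MyDatabase SET ALLOW_SNAPSHOT_ISOLATION ON;

-- Optional alternative: make the default READ COMMITTED level use row
-- versioning, so readers need no code change. This switch typically needs
-- the database to have no other active connections while it is applied.
ALTER DATABASE MyDatabase SET READ_COMMITTED_SNAPSHOT ON;

-- A reader session that opts in sees the last committed version of each row
-- and is not blocked by the exclusive locks of a long-running delete.
SET TRANSACTION ISOLATION LEVEL SNAPSHOT;
BEGIN TRANSACTION;
SELECT COUNT(*) FROM LogMessages;
COMMIT TRANSACTION;
```

Row versioning only helps readers. Writers still block other writers, and the row versions are kept in tempdb, so a long delete increases tempdb usage.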

answered 2009-07-24T04:29:54.390
0

You can delete in small batches using a WHILE loop, something like this:

DELETE TOP (10000) FROM LogMessages WHERE LogDate < '2/1/2002'
WHILE @@ROWCOUNT > 0
BEGIN
    DELETE TOP (10000) FROM LogMessages WHERE LogDate < '2/1/2002'
END
answered 2016-12-27T16:06:21.117
0

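If a fairly large percentage of the table will match the delete criteria (near or over 50%), then it is "cheaper" to create a temp table with the records that will not be deleted (reverse the WHERE criteria), truncate the original table, and then repopulate it with the records you intended to keep.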

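-- Instead of deleting the matching rows
-- (MyTable is a placeholder name; TABLE itself is a reserved word):
DELETE FROM MyTable WHERE ROW_TO_DELETE = 'OK';
GO

-- copy the rows to keep, truncate, and reload:
SELECT * INTO #MyTable FROM MyTable
WHERE ROW_TO_DELETE <> 'OK' OR ROW_TO_DELETE IS NULL;  -- NULL rows survive the DELETE, so keep them here too
TRUNCATE TABLE MyTable;  -- fails if a foreign key references MyTable
INSERT INTO MyTable SELECT * FROM #MyTable;
DROP TABLE #MyTable;
GO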
answered 2020-11-26T13:29:52.600
-1

Here is a solution to your problem.

DECLARE @RC AS INT
SET @RC = -1

WHILE @RC <> 0
BEGIN
    DELETE TOP(1000000) FROM [Archive_CBO_ODS].[CBO].[AckItem] WHERE [AckItemId] >= 300
    SET @RC = @@ROWCOUNT
    --SET @RC = 0
END
answered 2016-12-08T18:36:38.507