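I want to delete all rows in a table whose batchId (run number) is older than the two most recent ones. In a regular SQL database I could probably do this with the following query: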
WITH CTE AS(
SELECT
*,
DENSE_RANK() OVER(ORDER BY BATCHID DESC) AS RN
FROM MyTable
)
DELETE FROM CTE WHERE RN>2
However, this is not allowed in SQL Data Warehouse. I'm looking for alternatives.
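Azure SQL Data Warehouse supports only a limited T-SQL surface area. CTEs used for DELETE operations, and DELETEs with FROM clauses, are not supported and yield the following error:
Msg 100029, Level 16, State 1, Line 1
A FROM clause is currently not supported in a DELETE statement.
It does, however, support subqueries, so one way to write your statement is like this: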
-- DISTINCT: several rows can share a BATCHID, so take the two highest distinct values
DELETE dbo.MyTable
WHERE BATCHID NOT IN ( SELECT DISTINCT TOP 2 BATCHID FROM dbo.MyTable ORDER BY BATCHID DESC );
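This subquery syntax is supported in Azure SQL Data Warehouse; I have tested it. I'm not sure how efficient it will be over billions of rows, though. You could also consider partition switching.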
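Before running a DELETE like the one above, it can help to preview what it would remove. The following is a minimal sketch, assuming the same dbo.MyTable and BATCHID names as above; it has not been tested on Azure SQL Data Warehouse, and the RowsToDelete alias is made up for illustration. DISTINCT is used so that several rows in the same batch count as one BATCHID.

```sql
-- Preview: row counts per BATCHID that fall outside the two most recent batches.
SELECT BATCHID, COUNT(*) AS RowsToDelete   -- RowsToDelete is an illustrative alias
FROM dbo.MyTable
WHERE BATCHID NOT IN ( SELECT DISTINCT TOP 2 BATCHID FROM dbo.MyTable ORDER BY BATCHID DESC )
GROUP BY BATCHID
ORDER BY BATCHID DESC;
```

An empty result means the table already holds at most two distinct non-NULL BATCHID values, so the DELETE would remove nothing.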
If you are deleting a large portion of the table, it may make sense to use CTAS to put the data you want to keep into a new table, e.g.:
-- Keep the most recent two BATCHIDS
CREATE TABLE dbo.MyTable2
WITH
(
CLUSTERED COLUMNSTORE INDEX,
DISTRIBUTION = HASH( BATCHID )
-- Add partition scheme here if required
)
AS
SELECT *
FROM dbo.MyTable
WHERE BATCHID IN ( SELECT DISTINCT TOP 2 BATCHID FROM dbo.MyTable ORDER BY BATCHID DESC )  -- DISTINCT: rows in one batch share a BATCHID
OPTION ( LABEL = 'CTAS : Keep top two BATCHIDs' );
GO
-- Rename or DROP old table
RENAME OBJECT dbo.MyTable TO MyTable_Old;
RENAME OBJECT dbo.MyTable2 TO MyTable;
GO
-- Optionally DROP MyTable_Old if everything has been successful
-- DROP TABLE MyTable_Old
This technique is described in more detail here.
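You can try using a JOIN (note that this puts a FROM clause in the DELETE, so it suits SQL Server rather than Azure SQL Data Warehouse, given the error quoted above):
DELETE d
FROM MyTable d
JOIN
(
    -- One row per batch; DENSE_RANK gives rows of the same batch the same rank
    SELECT DISTINCT
        BATCHID,
        RN = DENSE_RANK() OVER (ORDER BY BATCHID DESC)
    FROM MyTable
) A ON d.BATCHID = A.BATCHID
WHERE A.RN > 2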
You could try:
delete t from mytable t
where batchId < (select max(batchid) from mytable);
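Oh, if you want to keep two, maybe this will work:
delete t from mytable t
where batchId < (select batchid
from mytable
group by batchid
order by batchid desc  -- descending, so offset 1 picks the second-highest batchid
limit 1 offset 1
);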
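The `limit 1 offset 1` form is MySQL/PostgreSQL-style syntax rather than T-SQL. A possible alternative, as a sketch only (assuming the dbo.MyTable and BATCHID names from the question, and not tested on Azure SQL Data Warehouse), finds the second-highest BATCHID with nested MAX subqueries and avoids a FROM clause in the DELETE:

```sql
-- Delete every row older than the second-highest BATCHID.
DELETE dbo.MyTable
WHERE BATCHID < ( SELECT MAX(BATCHID)              -- second-highest BATCHID
                  FROM dbo.MyTable
                  WHERE BATCHID < ( SELECT MAX(BATCHID) FROM dbo.MyTable ) );  -- highest BATCHID
```

If the table holds only one distinct BATCHID, the outer subquery returns NULL and no rows are deleted.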