2

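I want to delete all rows in a table where the batchId (run number) is older than the two most recent ones. In a SQL database I could probably do this with the following query: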

WITH CTE AS(
    SELECT
        *,
        DENSE_RANK() OVER(ORDER BY BATCHID DESC) AS RN
    FROM MyTable
)
DELETE FROM CTE WHERE RN>2

But this is not allowed in SQL Data Warehouse. I am looking for alternatives.

3 Answers

1

Azure SQL Data Warehouse supports only a limited T-SQL surface area. CTEs used with DELETE operations, and DELETEs with a FROM clause, will produce the following error:

Msg 100029, Level 16, State 1, Line 1
A FROM clause is currently not supported in a DELETE statement.

However, it does support subqueries, so one way to write the statement is:

-- DISTINCT makes TOP 2 pick two different batch IDs even when many rows share a BATCHID
DELETE dbo.MyTable
WHERE BATCHID Not In ( SELECT DISTINCT TOP 2 BATCHID FROM dbo.MyTable ORDER BY BATCHID DESC );

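Azure SQL Data Warehouse supports this syntax, and I have tested it. I am not sure how efficient it would be over billions of rows. You could also consider partition switching.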
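To illustrate the partition switching idea, here is a rough sketch. It assumes things the question does not state: that BATCHID is an integer, that dbo.MyTable is already partitioned on BATCHID with one boundary per batch, and that the boundary values (101, 102, 103) and the table name dbo.MyTable_Switch are purely hypothetical. The switch target must match the source table's columns, distribution, index and partition boundaries.

```sql
-- Assumption: dbo.MyTable was created with
--   PARTITION ( BATCHID RANGE RIGHT FOR VALUES (101, 102, 103) )
-- The boundary values and dbo.MyTable_Switch are hypothetical names for illustration.

-- 1. Create an empty table with the same shape and partition boundaries
CREATE TABLE dbo.MyTable_Switch
WITH
(
    CLUSTERED COLUMNSTORE INDEX,
    DISTRIBUTION = HASH( BATCHID ),
    PARTITION ( BATCHID RANGE RIGHT FOR VALUES (101, 102, 103) )
)
AS
SELECT *
FROM dbo.MyTable
WHERE 1 = 2;   -- copies the structure only, no rows

-- 2. Move an expired batch out of the main table.
--    With RANGE RIGHT and the boundaries above, partition 2 holds BATCHID = 101.
ALTER TABLE dbo.MyTable SWITCH PARTITION 2 TO dbo.MyTable_Switch PARTITION 2;

-- 3. Discard the switched-out rows
TRUNCATE TABLE dbo.MyTable_Switch;
```

The switch itself only changes metadata, which is why it can be attractive for very large tables, but it only helps if the table's partitioning lines up with the batches you want to remove.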

If you are deleting a large portion of the table, it may make sense to use CTAS to put the data you want to keep into a new table, e.g.:

-- Keep the most recent two BATCHIDS
CREATE TABLE dbo.MyTable2
WITH
(
    CLUSTERED COLUMNSTORE INDEX,
    DISTRIBUTION = HASH( BATCHID )
    -- Add partition scheme here if required
)
AS
SELECT  *
FROM dbo.MyTable
WHERE BATCHID In ( SELECT DISTINCT TOP 2 BATCHID FROM dbo.MyTable ORDER BY BATCHID DESC )
OPTION ( LABEL = 'CTAS : Keep top two BATCHIDs' );
GO

-- Rename or DROP old table
RENAME OBJECT dbo.MyTable TO MyTable_Old;
RENAME OBJECT dbo.MyTable2 TO MyTable;
GO

-- Optionally DROP MyTable_Old if everything has been successful
-- DROP TABLE MyTable_Old

This technique is described in more detail here.
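Before dropping the old table, it may be worth comparing row counts between the two tables. This is only a sketch: it assumes the renames above have already run (so dbo.MyTable is the new table and dbo.MyTable_Old the original) and that nothing wrote to the original table in between.

```sql
-- Rows per batch in the new table (expected: only the two newest batches)
SELECT BATCHID, COUNT(*) AS row_count
FROM dbo.MyTable
GROUP BY BATCHID
ORDER BY BATCHID DESC;

-- Rows per batch for the same two batches in the original table
SELECT BATCHID, COUNT(*) AS row_count
FROM dbo.MyTable_Old
WHERE BATCHID IN ( SELECT DISTINCT TOP 2 BATCHID FROM dbo.MyTable_Old ORDER BY BATCHID DESC )
GROUP BY BATCHID
ORDER BY BATCHID DESC;
```

If the two result sets match, the old table should be safe to drop.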

answered 2019-02-11T22:48:33.200
1

You can try using a JOIN:

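-- Note: per the error quoted in the answer above, Azure SQL Data Warehouse rejects a
-- FROM clause in DELETE, so this JOIN form is for platforms that accept it (e.g. SQL Server).
delete d from MyTable d
join
(
    SELECT DISTINCT
        BATCHID,
        -- DENSE_RANK gives every row of a batch the same rank: newest batch = 1
        RN = DENSE_RANK() OVER(ORDER BY BATCHID DESC)
    FROM MyTable
) A on d.BATCHID = A.BATCHID
where A.RN > 2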
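
Before running a delete like this, it may help to preview which batches would be affected. A minimal sketch, assuming the column is named BATCHID as in the question:

```sql
-- One row per batch, ranked from newest (RN = 1) to oldest.
-- Batches with RN > 2 are the ones a "keep the two newest" delete should remove.
SELECT
    BATCHID,
    COUNT(*) AS row_count,
    DENSE_RANK() OVER(ORDER BY BATCHID DESC) AS RN
FROM MyTable
GROUP BY BATCHID
ORDER BY BATCHID DESC;
```

DENSE_RANK is used here rather than ROW_NUMBER because the ranking has to be per batch, not per row.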
answered 2019-02-11T06:05:35.320
0

You can try:

delete t from mytable t
    where batchId < (select max(batchid) from mytable);

Oh, if you want to keep two, maybe this will work:

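-- LIMIT/OFFSET is not T-SQL; compare against the smaller of the two newest batch IDs instead
delete t from mytable t
    where batchId < (select min(batchid)
                     from (select distinct top 2 batchid
                           from mytable
                           order by batchid desc
                          ) newest
                    );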
answered 2019-02-11T13:04:32.403