1

使用 CTAS,我们可以利用 Polybase 提供的并行性以高度可扩展和高性能的方式将数据加载到新表中。

有没有办法使用类似的方法将数据加载到现有表中?桌子甚至可能是空的。

创建一个外部表并使用INSERT INTO ... SELECT * FROM ...- 我会假设这通过头节点,因此不是并行的?

我知道我也可以删除表并使用 CTAS 重新创建它,但是我必须再次处理所有元数据(列名、数据类型、分布......)。

4

1 回答 1

2

您可以使用分区切换来执行此操作,但请记住不要在 Azure SQL 数据仓库中使用太多分区。请参阅此处的“分区大小指南” 。

请记住,不支持检查约束,因此源表必须使用与目标表相同的分区方案。

分区和切换语法的完整示例:

-- Assume we have a file with the values 1 to 100 in it.

-- Create an external table over it; will have all records in
IF NOT EXISTS ( SELECT * FROM sys.schemas WHERE name = 'ext' )
EXEC ( 'CREATE SCHEMA ext' )
GO


-- DROP EXTERNAL TABLE ext.numbers
IF NOT EXISTS ( SELECT * FROM sys.external_tables WHERE object_id = OBJECT_ID('ext.numbers') )
CREATE EXTERNAL TABLE ext.numbers (
    number          INT             NOT NULL
    )
WITH (
    LOCATION = 'numbers.csv',
    DATA_SOURCE = eds_yourDataSource, 
    FILE_FORMAT = ff_csv
);
GO

-- Create a partitioned, internal table with the records 1 to 50
IF OBJECT_ID('dbo.numbers') IS NOT NULL DROP TABLE dbo.numbers

CREATE TABLE dbo.numbers
WITH (
    DISTRIBUTION = ROUND_ROBIN,
    CLUSTERED INDEX ( number ), 
    PARTITION ( number RANGE LEFT FOR VALUES ( 50, 100, 150, 200 ) )
    )
AS 
SELECT * 
FROM ext.numbers
WHERE number Between 1 And 50;
GO

-- DBCC PDW_SHOWPARTITIONSTATS ('dbo.numbers')


-- CTAS the second half of the external table, records 51-100 into an internal one.
-- As check contraints are not available in SQL Data Warehouse, ensure the switch table
-- uses the same scheme as the original table.
IF OBJECT_ID('dbo.numbers_part2') IS NOT NULL DROP TABLE dbo.numbers_part2

CREATE TABLE dbo.numbers_part2
WITH (
    DISTRIBUTION = ROUND_ROBIN,
    CLUSTERED INDEX ( number ),
    PARTITION ( number RANGE LEFT FOR VALUES ( 50, 100, 150, 200 ) )
    )
AS 
SELECT *
FROM ext.numbers
WHERE number > 50
GO


-- Partition switch it into the original table
ALTER TABLE dbo.numbers_part2 SWITCH PARTITION 2 TO dbo.numbers PARTITION 2;


SELECT *
FROM dbo.numbers
ORDER BY 1;
于 2016-12-01T00:50:14.827 回答