我的组织目前正在评估 Azure 数据仓库。我们有一个包含 16M 行的事实表和另一个包含 5M 行的事实表,它们都散列在同一列上(相同的数据类型和长度)。
当使用 smallrc 资源类以 200 DWU 规模执行内部连接时,查询需要将近 6 分钟,但 DWU 使用情况(如门户中所示)只是所有可用 DWU 的一小部分。此外,当同一查询同时启动多次(通过 SSIS 执行 SQL 任务)时,数据仓库始终执行 6 个实例(最多)。
我们已将 DWU 设置从 200 更改为 500 到 1000,最后更改为 2000,但每次同一查询的最大并发运行次数仅限于 6 次,并且 DWU 使用量始终保持在可用总量的一小部分,没有明显的性能的变化。
这是预期的行为吗?肯定会大大增加 DWU 应该会大大减少查询执行时间吗?我很欣赏这是非常通用的,但我想了解我是否没有考虑过对这种分析很重要的事情?
这是解释:
explain
<?xml version="1.0" encoding="utf-8"?>
<dsql_query number_nodes="10" number_distributions="60" number_distributions_per_node="6">
<sql>select shifts.[Shift Reference], Shifts.[Trust Code], Shifts.[Ward Code], shifts.[Location Code], Qualification, YYYYMM, booking_removal.[Booking Type], booking_removal.[Booking Type], count(distinct booking_removal.[Booking ID]) from shifts join booking_removal on shifts.[Shift reference] = booking_removal.[Shift Reference] join dimdate on shifts.[Shift Start Date] = date where year in (2015, 2016)
group by shifts.[Shift Reference], Shifts.[Trust Code], Shifts.[Ward Code], shifts.[Location Code], Qualification, YYYYMM, booking_removal.[Booking Type], booking_removal.[Booking Type]</sql>
<dsql_operations total_cost="1.22805816" total_number_operations="5">
<dsql_operation operation_type="RND_ID">
<identifier>TEMP_ID_15134</identifier>
</dsql_operation>
<dsql_operation operation_type="ON">
<location permanent="false" distribution="AllComputeNodes" />
<sql_operations>
<sql_operation type="statement">CREATE TABLE [tempdb].[dbo].[TEMP_ID_15134] ([Date] DATE, [YYYYMM] INT ) WITH(DATA_COMPRESSION=PAGE);</sql_operation>
</sql_operations>
</dsql_operation>
<dsql_operation operation_type="BROADCAST_MOVE">
<operation_cost cost="1.22805816" accumulative_cost="1.22805816" average_rowsize="7" output_rows="730.987" GroupNumber="15" />
<source_statement>SELECT [T1_1].[Date] AS [Date],
[T1_1].[YYYYMM] AS [YYYYMM]
FROM (SELECT [T2_1].[Date] AS [Date],
[T2_1].[YYYYMM] AS [YYYYMM]
FROM [NHSP-Shifts-DW].[dbo].[DimDate] AS T2_1
WHERE (([T2_1].[Year] = CAST ((2015) AS INT))
OR ([T2_1].[Year] = CAST ((2016) AS INT)))) AS T1_1</source_statement>
<destination_table>[TEMP_ID_15134]</destination_table>
</dsql_operation>
<dsql_operation operation_type="RETURN">
<location distribution="AllDistributions" />
<select>SELECT [T1_1].[Shift Reference] AS [Shift Reference],
[T1_1].[Trust Code] AS [Trust Code],
[T1_1].[Ward Code] AS [Ward Code],
[T1_1].[Location Code] AS [Location Code],
[T1_1].[Qualification] AS [Qualification],
[T1_1].[YYYYMM] AS [YYYYMM],
[T1_1].[Booking Type] AS [Booking Type],
[T1_1].[Booking Type] AS [Booking Type1],
[T1_1].[col] AS [col]
FROM (SELECT COUNT([T2_1].[Booking ID]) AS [col],
[T2_1].[Shift Reference] AS [Shift Reference],
[T2_1].[Trust Code] AS [Trust Code],
[T2_1].[Ward Code] AS [Ward Code],
[T2_1].[Location Code] AS [Location Code],
[T2_1].[Qualification] AS [Qualification],
[T2_1].[YYYYMM] AS [YYYYMM],
[T2_1].[Booking Type] AS [Booking Type]
FROM (SELECT [T3_1].[Booking ID] AS [Booking ID],
[T3_2].[Shift Reference] AS [Shift Reference],
[T3_2].[Trust Code] AS [Trust Code],
[T3_2].[Ward Code] AS [Ward Code],
[T3_2].[Location Code] AS [Location Code],
[T3_2].[Qualification] AS [Qualification],
[T3_2].[YYYYMM] AS [YYYYMM],
[T3_1].[Booking Type] AS [Booking Type]
FROM [NHSP-Shifts-DW].[dbo].[Booking_Removal] AS T3_1
INNER JOIN
(SELECT [T4_2].[Shift Reference] AS [Shift Reference],
[T4_2].[Trust Code] AS [Trust Code],
[T4_2].[Ward Code] AS [Ward Code],
[T4_2].[Location Code] AS [Location Code],
[T4_2].[Qualification] AS [Qualification],
[T4_1].[YYYYMM] AS [YYYYMM]
FROM [tempdb].[dbo].[TEMP_ID_15134] AS T4_1
INNER JOIN
[NHSP-Shifts-DW].[dbo].[Shifts] AS T4_2
ON ([T4_1].[Date] = [T4_2].[Shift Start Date])) AS T3_2 ON ([T3_1].[Shift Reference] = [T3_2].[Shift Reference])
GROUP BY [T3_2].[Shift Reference], [T3_2].[Trust Code], [T3_2].[Ward Code], [T3_2].[Location Code], [T3_2].[Qualification], [T3_2].[YYYYMM], [T3_1].[Booking Type], [T3_1].[Booking ID]) AS T2_1
GROUP BY [T2_1].[Shift Reference], [T2_1].[Trust Code], [T2_1].[Ward Code], [T2_1].[Location Code], [T2_1].[Qualification], [T2_1].[YYYYMM], [T2_1].[Booking Type]) AS T1_1</select>
</dsql_operation>
<dsql_operation operation_type="ON">
<location permanent="false" distribution="AllComputeNodes" />
<sql_operations>
<sql_operation type="statement">DROP TABLE [tempdb].[dbo].[TEMP_ID_15134]</sql_operation>
</sql_operations>
</dsql_operation>
</dsql_operations>
</dsql_query>
在分发表格时,我确实检查了数据倾斜,而 [shift reference] 列几乎没有提供。结果如下:
188648 123888 55888 8872 1 1
189096 123816 55960 9320 1 10
189352 123760 56000 9592 1 11
189488 124000 55960 9528 1 12
189096 123640 55984 9472 2 13
189544 123848 55952 9744 2 14
189024 123656 55952 9416 2 15
188960 123560 55984 9416 2 16
188256 123416 55944 8896 2 17
188840 123544 56000 9296 2 18
189096 123680 55960 9456 2 19
188840 123792 55936 9112 1 2
189288 123672 56000 9616 2 20
188648 123744 55976 8928 2 21
188840 123840 55976 9024 2 22
188776 123696 56032 9048 2 23
188392 123384 55944 9064 2 24
188384 123656 55976 8752 3 25
189096 123992 55968 9136 3 26
189032 123960 55968 9104 3 27
188968 123896 55960 9112 3 28
188776 123560 55976 9240 3 29
188840 123576 55960 9304 1 3
189024 123576 56000 9448 3 30
188768 123448 55920 9400 3 31
188896 123584 55992 9320 3 32
188960 123792 55984 9184 3 33
188456 123280 55992 9184 3 34
189736 124192 55976 9568 3 35
189288 123864 56024 9400 3 36
189168 123624 55976 9568 4 37
189096 123704 56016 9376 4 38
188968 123520 55992 9456 4 39
189672 123816 55952 9904 1 4
188904 123704 55944 9256 4 40
188712 123600 55976 9136 4 41
189160 123688 55976 9496 4 42
189416 124008 55976 9432 4 43
189032 123816 55952 9264 4 44
188968 123680 55984 9304 4 45
189352 123816 55992 9544 4 46
189416 124056 56008 9352 4 47
189608 123984 55992 9632 4 48
189160 123712 55968 9480 5 49
189224 123936 55928 9360 1 5
189480 123880 55968 9632 5 50
189480 123808 56032 9640 5 51
188840 123528 55968 9344 5 52
188960 123632 55936 9392 5 53
189024 123752 55952 9320 5 54
189216 123768 55976 9472 5 55
189600 123928 55960 9712 5 56
188456 123560 55912 8984 5 57
189664 124160 55976 9528 5 58
189096 123824 55976 9296 5 59
189288 123896 55960 9432 1 6
189096 123696 55936 9464 5 60
189224 123664 55952 9608 1 7
188776 123864 55920 8992 1 8
188712 123752 55920 9040 1 9