1

我的组织目前正在评估 Azure 数据仓库。我们有一个包含 16M 行的事实表和另一个包含 5M 行的事实表,它们都散列在同一列上(相同的数据类型和长度)。

当使用 smallrc 资源类以 200 DWU 规模执行内部连接时,查询需要将近 6 分钟,但 DWU 使用情况(如门户中所示)只是所有可用 DWU 的一小部分。此外,当同一查询同时启动多次(通过 SSIS 执行 SQL 任务)时,数据仓库始终执行 6 个实例(最多)。

我们已将 DWU 设置从 200 更改为 500 到 1000,最后更改为 2000,但每次同一查询的最大并发运行次数仅限于 6 次,并且 DWU 使用量始终保持在可用总量的一小部分,没有明显的性能的变化。

这是预期的行为吗?肯定会大大增加 DWU 应该会大大减少查询执行时间吗?我很欣赏这是非常通用的,但我想了解我是否没有考虑过对这种分析很重要的事情?

这是解释:

explain
<?xml version="1.0" encoding="utf-8"?>
<dsql_query number_nodes="10" number_distributions="60" number_distributions_per_node="6">
  <sql>select shifts.[Shift Reference], Shifts.[Trust Code], Shifts.[Ward Code], shifts.[Location Code], Qualification, YYYYMM, booking_removal.[Booking Type], booking_removal.[Booking Type], count(distinct booking_removal.[Booking ID])  from shifts join booking_removal on shifts.[Shift reference] = booking_removal.[Shift Reference] join dimdate on shifts.[Shift Start Date] = date where year in  (2015, 2016)
group by shifts.[Shift Reference], Shifts.[Trust Code], Shifts.[Ward Code], shifts.[Location Code], Qualification, YYYYMM, booking_removal.[Booking Type], booking_removal.[Booking Type]</sql>
  <dsql_operations total_cost="1.22805816" total_number_operations="5">
    <dsql_operation operation_type="RND_ID">
      <identifier>TEMP_ID_15134</identifier>
    </dsql_operation>
    <dsql_operation operation_type="ON">
      <location permanent="false" distribution="AllComputeNodes" />
      <sql_operations>      
<sql_operation type="statement">CREATE TABLE [tempdb].[dbo].[TEMP_ID_15134] ([Date] DATE, [YYYYMM] INT ) WITH(DATA_COMPRESSION=PAGE);</sql_operation>
      </sql_operations>
    </dsql_operation>
    <dsql_operation operation_type="BROADCAST_MOVE">
      <operation_cost cost="1.22805816" accumulative_cost="1.22805816" average_rowsize="7" output_rows="730.987" GroupNumber="15" />                                                                                                                                    
<source_statement>SELECT [T1_1].[Date] AS [Date],
       [T1_1].[YYYYMM] AS [YYYYMM]
FROM   (SELECT [T2_1].[Date] AS [Date],
               [T2_1].[YYYYMM] AS [YYYYMM]
        FROM   [NHSP-Shifts-DW].[dbo].[DimDate] AS T2_1
        WHERE  (([T2_1].[Year] = CAST ((2015) AS INT))
                OR ([T2_1].[Year] = CAST ((2016) AS INT)))) AS T1_1</source_statement>
      <destination_table>[TEMP_ID_15134]</destination_table>
    </dsql_operation>
    <dsql_operation operation_type="RETURN">
      <location distribution="AllDistributions" />
<select>SELECT [T1_1].[Shift Reference] AS [Shift Reference],
       [T1_1].[Trust Code] AS [Trust Code],
       [T1_1].[Ward Code] AS [Ward Code],
       [T1_1].[Location Code] AS [Location Code],
       [T1_1].[Qualification] AS [Qualification],
       [T1_1].[YYYYMM] AS [YYYYMM],
       [T1_1].[Booking Type] AS [Booking Type],
       [T1_1].[Booking Type] AS [Booking Type1],
       [T1_1].[col] AS [col]                                                                                                        
FROM   (SELECT   COUNT([T2_1].[Booking ID]) AS [col],
                 [T2_1].[Shift Reference] AS [Shift Reference],
                 [T2_1].[Trust Code] AS [Trust Code],
                 [T2_1].[Ward Code] AS [Ward Code],
                 [T2_1].[Location Code] AS [Location Code],
                 [T2_1].[Qualification] AS [Qualification],
                 [T2_1].[YYYYMM] AS [YYYYMM],
                 [T2_1].[Booking Type] AS [Booking Type]                                                                   
FROM     (SELECT   [T3_1].[Booking ID] AS [Booking ID],
                           [T3_2].[Shift Reference] AS [Shift Reference],
                           [T3_2].[Trust Code] AS [Trust Code],
                           [T3_2].[Ward Code] AS [Ward Code],
                           [T3_2].[Location Code] AS [Location Code],
                           [T3_2].[Qualification] AS [Qualification],
                           [T3_2].[YYYYMM] AS [YYYYMM],
                           [T3_1].[Booking Type] AS [Booking Type]
FROM     [NHSP-Shifts-DW].[dbo].[Booking_Removal] AS T3_1
                           INNER JOIN
                           (SELECT [T4_2].[Shift Reference] AS [Shift Reference],
                                   [T4_2].[Trust Code] AS [Trust Code],
                                   [T4_2].[Ward Code] AS [Ward Code],
                                   [T4_2].[Location Code] AS [Location Code],
                                   [T4_2].[Qualification] AS [Qualification],
                                   [T4_1].[YYYYMM] AS [YYYYMM]
FROM   [tempdb].[dbo].[TEMP_ID_15134] AS T4_1
                                   INNER JOIN
                                   [NHSP-Shifts-DW].[dbo].[Shifts] AS T4_2
                                   ON ([T4_1].[Date] = [T4_2].[Shift Start Date])) AS T3_2 ON ([T3_1].[Shift Reference] = [T3_2].[Shift Reference])
                  GROUP BY [T3_2].[Shift Reference], [T3_2].[Trust Code], [T3_2].[Ward Code], [T3_2].[Location Code], [T3_2].[Qualification], [T3_2].[YYYYMM], [T3_1].[Booking Type], [T3_1].[Booking ID]) AS T2_1
GROUP BY [T2_1].[Shift Reference], [T2_1].[Trust Code], [T2_1].[Ward Code], [T2_1].[Location Code], [T2_1].[Qualification], [T2_1].[YYYYMM], [T2_1].[Booking Type]) AS T1_1</select>
    </dsql_operation>
    <dsql_operation operation_type="ON">
      <location permanent="false" distribution="AllComputeNodes" />
      <sql_operations>
        <sql_operation type="statement">DROP TABLE [tempdb].[dbo].[TEMP_ID_15134]</sql_operation>
      </sql_operations>
    </dsql_operation>
  </dsql_operations>
</dsql_query>

在分发表格时,我确实检查了数据倾斜,而 [shift reference] 列几乎没有提供。结果如下:

188648  123888  55888   8872    1   1
189096  123816  55960   9320    1   10
189352  123760  56000   9592    1   11
189488  124000  55960   9528    1   12
189096  123640  55984   9472    2   13
189544  123848  55952   9744    2   14
189024  123656  55952   9416    2   15
188960  123560  55984   9416    2   16
188256  123416  55944   8896    2   17
188840  123544  56000   9296    2   18
189096  123680  55960   9456    2   19
188840  123792  55936   9112    1   2
189288  123672  56000   9616    2   20
188648  123744  55976   8928    2   21
188840  123840  55976   9024    2   22
188776  123696  56032   9048    2   23
188392  123384  55944   9064    2   24
188384  123656  55976   8752    3   25
189096  123992  55968   9136    3   26
189032  123960  55968   9104    3   27
188968  123896  55960   9112    3   28
188776  123560  55976   9240    3   29
188840  123576  55960   9304    1   3
189024  123576  56000   9448    3   30
188768  123448  55920   9400    3   31
188896  123584  55992   9320    3   32
188960  123792  55984   9184    3   33
188456  123280  55992   9184    3   34
189736  124192  55976   9568    3   35
189288  123864  56024   9400    3   36
189168  123624  55976   9568    4   37
189096  123704  56016   9376    4   38
188968  123520  55992   9456    4   39
189672  123816  55952   9904    1   4
188904  123704  55944   9256    4   40
188712  123600  55976   9136    4   41
189160  123688  55976   9496    4   42
189416  124008  55976   9432    4   43
189032  123816  55952   9264    4   44
188968  123680  55984   9304    4   45
189352  123816  55992   9544    4   46
189416  124056  56008   9352    4   47
189608  123984  55992   9632    4   48
189160  123712  55968   9480    5   49
189224  123936  55928   9360    1   5
189480  123880  55968   9632    5   50
189480  123808  56032   9640    5   51
188840  123528  55968   9344    5   52
188960  123632  55936   9392    5   53
189024  123752  55952   9320    5   54
189216  123768  55976   9472    5   55
189600  123928  55960   9712    5   56
188456  123560  55912   8984    5   57
189664  124160  55976   9528    5   58
189096  123824  55976   9296    5   59
189288  123896  55960   9432    1   6
189096  123696  55936   9464    5   60
189224  123664  55952   9608    1   7
188776  123864  55920   8992    1   8
188712  123752  55920   9040    1   9   
4

2 回答 2

1

您的查询返回多少行?如果它返回很多行,客户端上的返回操作可能是瓶颈,因此扩展 DWU 可能无济于事。你提到货币总是6。难道这是SSIS设置?在 SQL DW 中,并发取决于分配的 DWU 的数量。这在 SQL 数据仓库中的并发和工作负载管理一文中有详细说明。要查看一次实际运行的查询数量,您可以运行查询。

SELECT * FROM sys.dm_pdw_exec_requests WHERE status = 'Running';
于 2016-05-29T21:17:55.607 回答
1

您每次分发的工作量基本相同,这很好。涉及额外的节点可以提高处理能力,但处理单个分布的工作不会受到太大影响。如果您受 CPU 限制,那么当然可以。如果您受 IO 限制,那么当然可以。但是由于您正在进行的分组,您的查询需要很长时间,依此类推。您可以通过各种方式改进查询,但增加 DWU 并没有多大帮助,除了为其他查询提供更多处理能力。如果您想调整查询以使其总体上运行得更快,那么这是另一种问题。

于 2016-05-24T22:55:42.623 回答