
First, thanks in advance for any tips or suggestions. I'm not a programmer, but I have no other way to access my data for analysis, so I've been learning as I go (mostly by searching StackOverflow and Google).

So, the following query works as expected, but it's slow. I suspect there are places where the code could be optimized, but I'm still patting myself on the back for getting it to work at all, so I'm out of ideas. Any thoughts on how to speed it up?

The basic idea is that it pulls budget data and actuals data for one ID, re-bases the dates of each to a common starting point (so it's a time-independent comparison), and calculates the ratio of cumulative actual to budget performance.

Edit: Using SQL Server Management Studio 2008 R2; execution plan added.

Note: the table variables are only for testing the code. The full-size code uses permanent tables.

DECLARE @DailyBudget TABLE ( ID varchar(30), D_Date datetime, A float, B float) 
DECLARE @DailyActuals TABLE ( ID varchar(30), D_Date datetime, A float, B float) 

Insert into @DailyActuals (ID, D_Date, A, B) 
Values
('J3PJKFWDBK',  '5/20/2013', 300,1301)
,('J3PJKFWDBK', '5/21/2013', 290,1351)
,('J3PJKFWDBK', '5/23/2013', 283,1320)

Insert into @DailyBudget (ID, D_Date, A, B) 
Values
('J3PJKFWDBK',  '5/1/2013', 263,1401)
,('J3PJKFWDBK', '5/2/2013', 260,1390)
,('J3PJKFWDBK', '5/3/2013', 257,1380)

;WITH Budgets AS
(SELECT ID, D_Date, A, B,
        ROW_NUMBER() OVER(PARTITION BY ID ORDER BY D_Date ASC) as 'RowNum'
   from @DailyBudget
   where not (A = 0 and B = 0)
     and D_Date > CONVERT(datetime, '2013-01-01 00:00:00.000', 102)
)
, Actuals AS
(SELECT ID, D_Date, A, B,
        ROW_NUMBER() OVER(PARTITION BY ID ORDER BY D_Date ASC) as 'RowNum'
   from @DailyActuals
   where not (A = 0 and B = 0)
     and D_Date > CONVERT(datetime, '2013-01-01 00:00:00.000', 102)
)
, BudgetSum AS
(select t1.ID, t1.RowNum, SUM(t2.A) as [A], SUM(t2.B) as [B]
  from Budgets as t1
    inner join Budgets as t2 on t1.RowNum >= t2.RowNum and t1.ID = t2.ID
  group by t1.ID, t1.RowNum, t1.A
)
, ActualSum AS
(select t1.ID, t1.RowNum, SUM(t2.A) as [A], SUM(t2.B) as [B]
  from Actuals as t1
    inner join Actuals as t2 on t1.RowNum >= t2.RowNum and t1.ID = t2.ID
  group by t1.ID, t1.RowNum, t1.A
)
SELECT Budgets.ID, Budgets.D_DATE as [Budget_Date], Actuals.D_DATE as [Actual_Date], 
--A
Budgets.A as [Budget_A], BudgetSum.A as [SumBudget_A], 
Actuals.A as [Actual_A], ActualSum.A as [SumActual_A],
(case BudgetSum.A when 0 then 0 else (ActualSum.A/BudgetSum.A)end) as [A_Ratio],
--B
Budgets.B as [Budget_B], BudgetSum.B as [SumBudget_B], 
Actuals.B as [Actual_B], ActualSum.B as [SumActual_B],
(case BudgetSum.B when 0 then 0 else (ActualSum.B/BudgetSum.B)end) as [B_Ratio]
FROM Budgets 
inner join Actuals on (Actuals.RowNum = Budgets.RowNum and Actuals.ID = Budgets.ID) 
inner join BudgetSum on (Actuals.RowNum = BudgetSum.RowNum and Actuals.ID = BudgetSum.ID)
inner join ActualSum on (Actuals.RowNum = ActualSum.RowNum and Actuals.ID = ActualSum.ID) 
order by Budgets.ID, Budgets.RowNum

Execution plan from SQL Server 2008:

http://s11.postimg.org/ierhjgvv7/6_18_2013_10_17_26_AM.jpg


2 Answers


I would suggest that, if you are allowed to do so, you set up some smaller versions of these tables and experiment with adding additional indexes. Maybe 10,000 records per table, with different values for ID and D_DATE so you get representative data. Perhaps a separate, smaller database could be created that you have free rein in.

What I suspect is that you're going to need some additional indexes. For example, the following code sorts by D_DATE (this is from your Budget CTE):

 SELECT ID, D_Date, A, B,
 ROW_NUMBER() OVER(PARTITION BY ID ORDER BY D_DATE ASC) as 'RowNum'  
 from @DailyBudget 
 where not (A = 0 and B = 0) 
     and D_Date > CONVERT(datetime, '2013-01-01 00:00:00.000', 102)

Try creating a second, non-primary index with the same columns, but ordered by D_DATE and then ID.
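On the permanent tables, that index might look something like this (the table and index names here are illustrative, not from your schema; adjust to match your actual tables):

```sql
-- Hypothetical supporting index on the permanent budget table:
-- keyed by D_Date first so the date-range filter can seek,
-- with A and B included so the index covers the CTE's SELECT list.
CREATE NONCLUSTERED INDEX IX_DailyBudget_Date_ID
ON dbo.DailyBudget (D_Date, ID)
INCLUDE (A, B);
```

The same pattern would apply to the actuals table. Compare the execution plans with and without it before committing to the extra write cost.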

Another thing that's probably costing a lot is that you generate RowNum and then group on it, which requires the query engine to sort all these records into RowNum order. I would try something like this:

 WITH Budgets AS
  (SELECT ID, D_Date, A, B
   from @DailyBudget 
   where not (A = 0 and B = 0) 
   and D_Date > CONVERT(datetime, '2013-01-01 00:00:00.000', 102)
 )
, BudgetSum AS
 (select t1.ID, t1.D_Date, SUM(t2.A) as [A], SUM(t2.B) as [B]
  from Budgets as t1
  inner join Budgets as t2 on t1.D_Date >= t2.D_Date and t1.ID = t2.ID
 group by t1.ID, t1.D_Date
)

It's almost the same, but it takes advantage of the index you already have (the primary key) and doesn't require calculating RowNum and then sorting by it.

Finally, the technique you're using to get the YTD figures by date is perfectly valid, but since your tables have millions of records you're talking possibly multi-billions of joined records to process. It's not surprising that this takes a long time! Consider using some staging tables to hold subsets of your data rather than processing every record going into your final numbers in one go. Or partition your queries (by date, or by ranges of ID) so that you can run faster queries multiple times and assemble the numbers you want in Excel, or in a set of smaller database tables that you can update with additional data as the tables grow.
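As a side note, should an upgrade ever become an option: SQL Server 2012 introduced framed window aggregates, which compute the same running totals in a single ordered pass instead of a triangular self-join. A sketch of what BudgetSum's cumulative sums would become (same column names as your query):

```sql
-- SQL Server 2012+ only: single-pass running totals per ID,
-- replacing the t1/t2 triangular join in BudgetSum.
SELECT ID, D_Date, A, B,
       SUM(A) OVER (PARTITION BY ID ORDER BY D_Date
                    ROWS UNBOUNDED PRECEDING) AS [SumA],
       SUM(B) OVER (PARTITION BY ID ORDER BY D_Date
                    ROWS UNBOUNDED PRECEDING) AS [SumB]
FROM Budgets;
```

On 2008 R2 this syntax is not available, so the join-based approach (or staging/partitioning as above) is the way to go.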

Hope some of this helps.

answered 2013-06-18T21:13:15.627

There are 6 table scans accounting for 18% of the most expensive query. Those table scans are all against your table variables @DailyBudget and @DailyActuals. Unfortunately, you can't create indexes on table variables except as a side effect of creating a unique constraint, and I suspect that wouldn't help you here.

You can create indexes on temporary tables, so I suggest you try converting the code to use temp tables, create the missing indexes, and see if that helps. Appropriate indexes may also reduce your sort cost, which accounts for 63% of your most expensive query.
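A minimal sketch of that conversion (the source table name dbo.DailyBudget is illustrative; substitute your permanent table):

```sql
-- Replace the table variable with a temp table that can be indexed.
CREATE TABLE #DailyBudget (
    ID     varchar(30),
    D_Date datetime,
    A      float,
    B      float
);

-- Cluster the way the CTE partitions and orders,
-- so ROW_NUMBER() OVER (PARTITION BY ID ORDER BY D_Date)
-- can avoid an explicit sort.
CREATE CLUSTERED INDEX IX_DailyBudget_ID_Date
ON #DailyBudget (ID, D_Date);

INSERT INTO #DailyBudget (ID, D_Date, A, B)
SELECT ID, D_Date, A, B
FROM dbo.DailyBudget;   -- hypothetical permanent source table
```

Repeat for the actuals table, then point the CTEs at the temp tables and compare the plans.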

answered 2013-06-18T15:28:23.107