sql-server - Select the top 5 SUMs for each month in a range (one per customer)

Question

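I have a query that pulls month/year totals for customers and adds an ntile ranking. If I could pull out the largest subtotal for ntile 1, 2, 3, 4 and 5, I would almost have what I want, but I don't know how to proceed.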

For example, the result I want looks like:

Month   Year   CustomerCode   SubTotal   ntile
1       2012   CCC            131.45     1
1       2012   CCC            342.95     2
1       2012   ELITE          643.92     3
1       2012   CCC            1454.05    4
1       2012   CCC            12971.78   5
2       2012   CCC            135.99     1
2       2012   CCI            370.47     2
2       2012   NOC            766.84     3
2       2012   ELITE          1428.26    4
2       2012   VBC            5073.20    5
3       2012   CCC            119.02     1
3       2012   CCC            323.78     2
3       2012   HUCC           759.66     3
3       2012   ELITE          1402.95    4
3       2012   CCC            7964.20    5

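Except that I would like the rankings to be for different customers, as in month 2, and my base query doesn't give me that result. I obviously don't know how to get it in T-SQL on SQL Server 2005; in fact, I'm not sure what I'm getting.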

My next option is to pull a DataTable into C# and do some gymnastics to get there, but there has to be a simpler way :)

My base query is

SELECT 
i.DateOrdered
,LTRIM(STR(DATEPART(MONTH,i.DateOrdered))) AS [Month]   
,LTRIM(STR(YEAR(i.Dateordered))) AS [Year]   
,c.CustomerCode 
,SUM(i.Jobprice) AS Subtotal  
,NTILE(5) OVER(ORDER BY SUM(i.JobPrice)) AS [ntile]
FROM Invoices i 
JOIN 
Customers c 
ON i.CustomerID = c.ID 
WHERE i.DateOrdered >= '1/1/2012'
AND i.DateOrdered <= '9/30/2012' 
GROUP BY YEAR(i.DateOrdered),  MONTH(i.DateOrdered), i.DateOrdered, c.CustomerCode
ORDER BY LTRIM(STR(DATEPART(MONTH,i.DateOrdered))),   
TRIM(STR(YEAR(i.Dateordered))),     
SUM(i.JobPrice), c.CustomerCode ASC

I'd really appreciate help getting this right.

Thanks in advance

Cliff

score 3 · Accepted Answer

If I read you correctly, what you are after is

For each month in the range,
Show 5 customers who have the greatest SUMs in that month
And against each customer, show the corresponding SUM.

In that case, this SQL Fiddle creates a sample table and runs the query that gives you the output described above. If you wanted to see what's in the created tables, just do simple SELECTs on the right panel.

The query is:

;     WITH G as -- grouped by month and customer
(
    SELECT DATEADD(D,1-DAY(i.DateOrdered),i.DateOrdered) [Month],
           c.CustomerCode,
           SUM(i.Jobprice) Subtotal
      FROM Invoices i
      JOIN Customers c ON i.CustomerID = c.ID
     WHERE i.DateOrdered >= '1/1/2012' AND i.DateOrdered <= '9/30/2012'
  GROUP BY DATEADD(D,1-DAY(i.DateOrdered),i.DateOrdered), c.CustomerCode
)
    SELECT MONTH([Month]) [Month],
           YEAR([Month]) [Year],
           CustomerCode,
           SubTotal,
           Rnk [Rank]
      FROM
(
    SELECT *, RANK() OVER (partition by [Month] order by Subtotal desc) Rnk
      FROM G
) X
     WHERE Rnk <= 5
  ORDER BY Month, Rnk

To explain, the first part (WITH block) is just a fancy way of writing a subquery, that GROUPs the data by month and Customer. The expression DATEADD(D,1-DAY(i.DateOrdered),i.DateOrdered) turns every date into the FIRST day of that month, so that the data can be easily grouped by month. The next subquery written in traditional form adds a RANK column within each month by the subtotal, which is finally SELECTed to give the top 5*.
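As a small illustration of the month-bucketing expression, the sketch below applies it to a few made-up dates. The table variable @d and its values are hypothetical and only for demonstration. One thing worth checking in your own data: if DateOrdered carries a time of day, DATEADD(D,1-DAY(...)) keeps that time, so rows from the same month can still fall into different groups. The DATEADD/DATEDIFF form shown next to it (the same expression used in another answer on this page) resets the value to midnight.

```sql
-- Hypothetical sample dates; @d exists only for this demonstration.
DECLARE @d TABLE (DateOrdered datetime);
INSERT INTO @d
SELECT '20120105 09:30' UNION ALL
SELECT '20120131 00:00' UNION ALL
SELECT '20120214 18:45';

SELECT DateOrdered,
       -- Moves back to day 1 of the month but keeps the time of day
       DATEADD(D, 1 - DAY(DateOrdered), DateOrdered)      AS FirstOfMonthKeepsTime,
       -- Counts whole months since date 0 (1900-01-01), so the time is dropped
       DATEADD(MONTH, DATEDIFF(MONTH, 0, DateOrdered), 0) AS FirstOfMonthMidnight
  FROM @d;
```

For the first row this should give 2012-01-01 09:30 in the first computed column and 2012-01-01 00:00 in the second. If DateOrdered never stores a time, the two expressions group identically.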

Note that RANK allows for equal rankings, which may end up showing 6 customers for a month if 3 of them are ranked equally at position 4. If that is not what you want, you can change RANK to ROW_NUMBER, which breaks ties between equal Subtotals arbitrarily (the order among tied rows is not guaranteed).
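If the arbitrary tie-break of ROW_NUMBER is a concern, one option is to add a second sort key so the result is repeatable. The following is a minimal sketch under that assumption; the table variable @G stands in for the grouped CTE above, its rows are invented, and using CustomerCode as the tie-breaker is only one possible choice.

```sql
-- Hypothetical stand-in for the grouped CTE G; the values are made up.
DECLARE @G TABLE ([Month] datetime, CustomerCode varchar(20), Subtotal money);
INSERT INTO @G
SELECT '20120101', 'CCC',   1500.00 UNION ALL
SELECT '20120101', 'ELITE', 1400.00 UNION ALL
SELECT '20120101', 'NOC',    900.00 UNION ALL
SELECT '20120101', 'VBC',    700.00 UNION ALL
SELECT '20120101', 'HUCC',   700.00 UNION ALL
SELECT '20120101', 'ABC',    700.00;

SELECT [Month], CustomerCode, Subtotal, Rn
  FROM (
        SELECT *,
               -- CustomerCode is the assumed tie-breaker for equal Subtotals
               ROW_NUMBER() OVER (PARTITION BY [Month]
                                  ORDER BY Subtotal DESC, CustomerCode) AS Rn
          FROM @G
       ) X
 WHERE Rn <= 5
 ORDER BY [Month], Rn;
```

With this data the three 700.00 rows are numbered 4 (ABC), 5 (HUCC) and 6 (VBC), so VBC is the one left out every time the query runs.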

score 0 · Accepted Answer

Try this:

declare @tab table
(
[month] int,
[year] int,
CustomerCode varchar(20),
SubTotal float 
)
insert into @tab
select
1,2012,'ccc',131.45 union all 
select
1,2012,'ccc',343.45 union all
select 
1,2012,'ELITE',643.92 union all
select 
2,2012,'ccc',131.45 union all 
select
2,2012,'ccc',343.45 union all
select 
2,2012,'ELITE',643.92 union all
select 
3,2012,'ccc',131.45 union all 
select
3,2012,'ccc',343.45 union all
select 
3,2012,'ELITE',643.92

;with cte as 
(
 -- order by SubTotal so tiles are assigned by amount within each month
 select NTILE(3) OVER(partition by [month] ORDER BY SubTotal) AS [ntile],* from @tab
)
select * from cte

Even in your base query you need to add partition by, so that you will get correct output.
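To make the effect of PARTITION BY concrete, here is a small sketch that computes NTILE both ways over the same rows. The table variable @s and its amounts are hypothetical; ordering by SubTotal inside the window is an assumption about what the tiles are meant to reflect.

```sql
-- Hypothetical data: two months whose amounts are on very different scales.
DECLARE @s TABLE ([month] int, CustomerCode varchar(20), SubTotal decimal(10,2));
INSERT INTO @s
SELECT 1, 'AAA',  100.00 UNION ALL
SELECT 1, 'BBB',  200.00 UNION ALL
SELECT 1, 'CCC',  300.00 UNION ALL
SELECT 2, 'AAA', 1000.00 UNION ALL
SELECT 2, 'BBB', 2000.00 UNION ALL
SELECT 2, 'CCC', 3000.00;

SELECT [month], CustomerCode, SubTotal,
       -- One set of tiles across the whole result set
       NTILE(3) OVER (ORDER BY SubTotal)                      AS GlobalTile,
       -- Tiles restart for every month
       NTILE(3) OVER (PARTITION BY [month] ORDER BY SubTotal) AS MonthTile
  FROM @s
 ORDER BY [month], SubTotal;
```

Without PARTITION BY, the six rows are split into tiles 1, 1, 2, 2, 3, 3 regardless of month. With it, each month gets its own 1, 2, 3, which is closer to the per-month ranking the question asks for.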

score 0 · Accepted Answer

The query needs to be modified to only get the month and year dateparts. The issue you are having with the same customer showing multiple times in the same month is due to the inclusion of i.DateOrdered in the select and group by clauses.

The following query should give you what you need. Also, I suspect it is a typo on the next-to-last line of your query, but T-SQL in SQL Server 2005 doesn't have a TRIM() function, only LTRIM and RTRIM.

SELECT 

LTRIM(STR(DATEPART(MONTH,i.DateOrdered))) AS [Month]   
,LTRIM(STR(YEAR(i.Dateordered))) AS [Year]   
,c.CustomerCode 
,SUM(i.Jobprice) AS Subtotal  
,NTILE(5) OVER(ORDER BY SUM(i.JobPrice)) AS [ntile]
FROM Invoices i 
JOIN 
Customers c 
ON i.CustomerID = c.ID 
WHERE i.DateOrdered >= '1/1/2012'
AND i.DateOrdered <= '9/30/2012' 
GROUP BY YEAR(i.DateOrdered),  MONTH(i.DateOrdered),  c.CustomerCode
ORDER BY LTRIM(STR(DATEPART(MONTH,i.DateOrdered))),   
LTRIM(STR(YEAR(i.Dateordered))),     
SUM(i.JobPrice), c.CustomerCode ASC

This gives these results

Month   Year    CustomerCode    Subtotal    ntile
 1      2012    ELITE            643.92      2
 1      2012    CCC            14900.23      5
 2      2012    CCC              135.99      1
 2      2012    CCI              370.47      1
 2      2012    NOC              766.84      3
 2      2012    ELITE           1428.26      4
 2      2012    VBC             5073.20      4
 3      2012    HUCC             759.66      2
 3      2012    ELITE           1402.95      3
 3      2012    CCC             8407.00      5

score 0 · Accepted Answer

I can't see how to solve this problem without double ranking:

You need to get the largest sums per customer & month.
You then need, for every month, to retrieve the top five of the found sums.

Here's how I would approach this:

;
WITH MaxSubtotals AS (
  SELECT DISTINCT
    CustomerID,
    MonthDate = DATEADD(MONTH, DATEDIFF(MONTH, 0, DateOrdered), 0),
    Subtotal  = MAX(SUM(JobPrice)) OVER (
      PARTITION BY CustomerID, DATEADD(MONTH, DATEDIFF(MONTH, 0, DateOrdered), 0)
    )
  FROM Invoices
  GROUP BY
    CustomerID,
    DateOrdered
),
TotalsRanked AS (
  SELECT
    CustomerID,
    MonthDate,
    Subtotal,
    Ranking = ROW_NUMBER() OVER (PARTITION BY MonthDate ORDER BY Subtotal DESC)
  FROM MaxSubtotals
)
SELECT
  Month = MONTH(i.MonthDate),
  Year  = YEAR(i.MonthDate),
  c.CustomerCode,
  i.Subtotal,
  i.Ranking
FROM TotalsRanked i
  INNER JOIN Customers c ON i.CustomerID = c.ID
WHERE i.Ranking <= 5
;

The first CTE, MaxSubtotals, determines the maximum subtotals per customer & month. Involving DISTINCT and a window aggregating function, it is essentially a "shortcut" for the following two-step query:

SELECT
  CustomerID,
  MonthDate,
  Subtotal = MAX(Subtotal)
FROM (
  SELECT
    CustomerID,
    MonthDate = DATEADD(MONTH, DATEDIFF(MONTH, 0, DateOrdered), 0),
    Subtotal = SUM(JobPrice)
  FROM Invoices
  GROUP BY
    CustomerID,
    DateOrdered
) s
GROUP BY
  CustomerID,
  MonthDate

The other CTE, TotalsRanked, simply adds ranking numbers for the found subtotals, partitioning by month. As a final step, you only need to limit the rows to those that have rankings not greater than 5 (or whatever you might choose another time).

Note that using ROW_NUMBER() to rank the rows in this case guarantees that you'll get no more than 5 rows with the Ranking <= 5 filter. If there were two or more rows with the same subtotal, they would get distinct rankings, and in the end you might end up with an output like this:

   Month  Year  CustomerCode  Subtotal  Ranking
   -----  ----  ------------  --------  -------
   1      2012  CCC           1500.00   1
   1      2012  ELITE         1400.00   2
   1      2012  NOC           900.00    3
   1      2012  VBC           700.00    4
   1      2012  HUCC          700.00    5

-- 1      2012  ABC           690.00    6   -- not returned
-- 1      2012  ...           ...       ...

Even though there might be other customers with Subtotals of 700.00 for the same month, they wouldn't be returned, because they would be assigned rankings after 5.

You could use RANK() instead of ROW_NUMBER() to account for that. But note that you might end up with more than 5 rows per month then, with an output like this:

   Month  Year  CustomerCode  Subtotal  Ranking
   -----  ----  ------------  --------  -------
   1      2012  CCC           1500.00   1
   1      2012  ELITE         1400.00   2
   1      2012  NOC           900.00    3
   1      2012  VBC           700.00    4
   1      2012  HUCC          700.00    4
   1      2012  ABC           700.00    4

-- 1      2012  DEF           690.00    7   -- not returned
-- 1      2012  ...           ...       ...

Customers with subtotals less than 700.00 wouldn't make it to the output because they would have rankings starting with 7, which would correspond to the ranking of the first under-700.00 sum if ranked by ROW_NUMBER().

And there's another option, DENSE_RANK(). You might want to use it if you want up to 5 distinct sums per month in your output. With DENSE_RANK() your output might contain even more rows per month than it would have with RANK(), but the number of distinct subtotals would be exactly 5 (or fewer if the original dataset can't provide you with 5). That is, your output might then look like this:

   Month  Year  CustomerCode  Subtotal  Ranking
   -----  ----  ------------  --------  -------
   1      2012  CCC           1500.00   1
   1      2012  ELITE         1400.00   2
   1      2012  NOC           900.00    3
   1      2012  VBC           700.00    4
   1      2012  HUCC          700.00    4
   1      2012  ABC           700.00    4
   1      2012  DEF           650.00    5
   1      2012  GHI           650.00    5
   1      2012  JKL           650.00    5

-- 1      2012  MNO           600.00    6   -- not returned
-- 1      2012  ...           ...       ...

Like RANK(), the DENSE_RANK() function assigns same rankings to identical values, but, unlike RANK(), it doesn't produce gaps in the ranking sequence.
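To compare the three functions directly, the sketch below runs them side by side over one set of rows. The table variable @T and its values are hypothetical, chosen to mirror the sample outputs above.

```sql
-- Hypothetical subtotals for a single month, with a three-way tie at 700.00.
DECLARE @T TABLE (CustomerCode varchar(20), Subtotal money);
INSERT INTO @T
SELECT 'CCC',   1500.00 UNION ALL
SELECT 'ELITE', 1400.00 UNION ALL
SELECT 'NOC',    900.00 UNION ALL
SELECT 'VBC',    700.00 UNION ALL
SELECT 'HUCC',   700.00 UNION ALL
SELECT 'ABC',    700.00 UNION ALL
SELECT 'DEF',    650.00;

SELECT CustomerCode,
       Subtotal,
       ROW_NUMBER() OVER (ORDER BY Subtotal DESC) AS RowNum,   -- always unique
       RANK()       OVER (ORDER BY Subtotal DESC) AS Rnk,      -- ties share a rank, gaps follow
       DENSE_RANK() OVER (ORDER BY Subtotal DESC) AS DenseRnk  -- ties share a rank, no gaps
  FROM @T
 ORDER BY Subtotal DESC, CustomerCode;
```

The three 700.00 rows receive RowNum 4, 5 and 6 in no guaranteed order, while both Rnk and DenseRnk give them 4. The 650.00 row then gets RowNum 7, Rnk 7 and DenseRnk 5, so a filter of <= 5 keeps it only under DENSE_RANK().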
