
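I think this is a very common problem, but I don't know what the process is called, so I'll describe it with an example. The idea is that I want to join a sparse data set to a complete series, such as the days of the week, the months of the year, or any ordered set (for example, for rankings). Positions missing from the sparse data should appear as NULL alongside the complete series.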

Suppose I run the following query in SQL Server to get monthly sales.

SELECT
    YEAR([timestamp]),
    MONTH([timestamp]),
    COUNT(*)
FROM table1
WHERE YEAR([timestamp]) = YEAR(GETDATE())
GROUP BY
    YEAR([timestamp]),
    MONTH([timestamp])
ORDER BY
    YEAR([timestamp]) DESC,
    MONTH([timestamp]) DESC;

However, if sales occurred only in May and August of this year, the result would look like this:

2010    August    1234
2010    May       5678

I would like my result set to look like this instead:

2010    January
2010    February
2010    March
2010    April
2010    May        5678
2010    June
2010    July
2010    August     1234
2010    September
2010    October
2010    November
2010    December

The only way I know to do this is:

SELECT
    YEAR(GETDATE()) AS [year],
    month_index.month_name,
    sales_data.sales
FROM (
    -- Hard-coded list of all twelve months
    SELECT 'January' AS month_name, 1 AS month_number
    UNION ALL
    SELECT 'February', 2
    UNION ALL
    SELECT 'March', 3
    UNION ALL
    SELECT 'April', 4
    UNION ALL
    SELECT 'May', 5
    UNION ALL
    SELECT 'June', 6
    UNION ALL
    SELECT 'July', 7
    UNION ALL
    SELECT 'August', 8
    UNION ALL
    SELECT 'September', 9
    UNION ALL
    SELECT 'October', 10
    UNION ALL
    SELECT 'November', 11
    UNION ALL
    SELECT 'December', 12
) AS month_index
LEFT JOIN (
    -- Sparse sales data: only months that actually have rows
    SELECT
        YEAR([timestamp]) AS year_number,
        MONTH([timestamp]) AS month_number,
        COUNT(*) AS sales
    FROM table1
    WHERE YEAR([timestamp]) = YEAR(GETDATE())
    GROUP BY
        YEAR([timestamp]),
        MONTH([timestamp])
) AS sales_data
ON month_index.month_number = sales_data.month_number
ORDER BY
    month_index.month_number;

Is there a better way to create a complete series of dates or alphanumeric values to join the data against? And what is this called?

Thanks!


5 Answers


Try something like this:

DECLARE @StartDate datetime
       ,@EndDate datetime
SELECT @StartDate=DATEADD(month,-6,DATEADD(month,DATEDIFF(month,0,GETDATE()),0) )
      ,@EndDate=GETDATE()

;with AllDates AS
(
    SELECT @StartDate AS DateOf
    UNION ALL
    SELECT DateAdd(month,1,DateOf)
        FROM AllDates
    WHERE DateOf<@EndDate
)
SELECT * FROM AllDates

Output:

DateOf
-----------------------
2009-12-01 00:00:00.000
2010-01-01 00:00:00.000
2010-02-01 00:00:00.000
2010-03-01 00:00:00.000
2010-04-01 00:00:00.000
2010-05-01 00:00:00.000
2010-06-01 00:00:00.000
2010-07-01 00:00:00.000

(8 row(s) affected)
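
The CTE above only generates the dates; to fill the gaps it still has to be LEFT JOINed to the data. The sketch below does that per day for the current year, assuming the table1 table and [timestamp] column from the question and SQL Server 2008 or later (for the date type and inline DECLARE initialization). One thing to watch: by default a recursive CTE raises an error once it exceeds 100 recursion levels, which a daily series for a whole year does, so the query raises the limit with OPTION (MAXRECURSION ...).

```sql
-- Assumes table1.[timestamp] from the question; SQL Server 2008+
DECLARE @StartDate date = DATEADD(year, DATEDIFF(year, 0, GETDATE()), 0); -- Jan 1 of this year
DECLARE @EndDate   date = DATEADD(year, 1, @StartDate);                    -- Jan 1 of next year

WITH AllDays AS
(
    SELECT @StartDate AS DayOf                 -- anchor row: first day of the range
    UNION ALL
    SELECT DATEADD(day, 1, DayOf)              -- one more day per recursion level
    FROM AllDays
    WHERE DATEADD(day, 1, DayOf) < @EndDate    -- stop before next year starts
)
SELECT d.DayOf,
       COUNT(t.[timestamp]) AS sales           -- counts matched rows only, so empty days give 0
FROM AllDays AS d
LEFT JOIN table1 AS t
       ON t.[timestamp] >= d.DayOf             -- half-open range: [day, next day)
      AND t.[timestamp] <  DATEADD(day, 1, d.DayOf)
GROUP BY d.DayOf
ORDER BY d.DayOf
OPTION (MAXRECURSION 366);                     -- the default limit of 100 would fail here
```

Using a half-open range in the join (>= start, < next day) keeps the bare column on one side of each comparison, in line with the sargability advice given elsewhere in this thread. Wrap the count in NULLIF(..., 0) if you prefer NULL for empty days, as in the question.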
answered 2010-06-25T18:08:50.520

Queries like this are one of the main reasons many experienced DBAs and database programmers keep a calendar table in their databases.
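
A minimal sketch of such a calendar table, assuming SQL Server 2008 or later. The table name dbo.Calendar, its columns, and the 2000 to 2029 range are all hypothetical choices for illustration; real calendar tables often carry many more columns (week numbers, fiscal periods, holiday flags). The one-off load uses a recursive CTE with the recursion limit switched off.

```sql
-- Hypothetical calendar table: one row per day with precomputed date parts
CREATE TABLE dbo.Calendar
(
    CalendarDate date         NOT NULL PRIMARY KEY,
    [Year]       int          NOT NULL,
    [Month]      int          NOT NULL,
    MonthName    nvarchar(30) NOT NULL
);

-- One-time load: every day from 2000-01-01 through 2029-12-31
WITH Days AS
(
    SELECT CAST('20000101' AS date) AS d
    UNION ALL
    SELECT DATEADD(day, 1, d) FROM Days WHERE d < '20291231'
)
INSERT INTO dbo.Calendar (CalendarDate, [Year], [Month], MonthName)
SELECT d, YEAR(d), MONTH(d), DATENAME(month, d)
FROM Days
OPTION (MAXRECURSION 0);   -- 0 = no recursion limit

-- Monthly sales for the current year, with every month present
-- (assumes table1.[timestamp] from the question)
SELECT c.[Year], c.MonthName, COUNT(t.[timestamp]) AS sales
FROM dbo.Calendar AS c
LEFT JOIN table1 AS t
       ON t.[timestamp] >= c.CalendarDate
      AND t.[timestamp] <  DATEADD(day, 1, c.CalendarDate)
WHERE c.[Year] = YEAR(GETDATE())
GROUP BY c.[Year], c.[Month], c.MonthName
ORDER BY c.[Month];
```

Once the table exists, any report that needs a complete series (days, months, weekdays) becomes a plain LEFT JOIN against it instead of a generated series.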

answered 2010-06-25T17:57:34.907

I like this way of building a months table:

SELECT 
  DATENAME(mm, date_val) AS month_name,  
  MONTH(date_val) AS month_number,  
  date_val as dt
FROM ( 
  SELECT DATEADD(mm, number, '2010-01-01') AS date_val
  FROM master.dbo.spt_values
  WHERE type = 'P'
  AND number BETWEEN 0 AND 11
) months

In my testing it was faster than the CTE. I'm running SQL Server 2008 Express.

Here are the test results, using SET STATISTICS IO ON and SET STATISTICS TIME ON:

CTE:

(12 row(s) affected)
Table 'Worktable'. Scan count 2, logical reads 73, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

(1 row(s) affected)

 SQL Server Execution Times:
   CPU time = 15 ms,  elapsed time = 64 ms.

 SQL Server Execution Times:
   CPU time = 0 ms,  elapsed time = 0 ms.

Subquery:

(12 row(s) affected)
Table 'spt_values'. Scan count 1, logical reads 2, physical reads 2, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

(1 row(s) affected)

 SQL Server Execution Times:
   CPU time = 0 ms,  elapsed time = 4 ms.

 SQL Server Execution Times:
   CPU time = 0 ms,  elapsed time = 0 ms.
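
master.dbo.spt_values is an undocumented internal table, so there is no guarantee its contents stay the same across versions. On SQL Server 2008 and later, a table value constructor can supply the twelve month offsets inline instead. This is a sketch that keeps the same output columns as the spt_values query above, with 2010 hard-coded as in that answer; its performance relative to spt_values has not been measured here.

```sql
-- Twelve offsets from an inline VALUES list instead of master.dbo.spt_values
SELECT
    DATENAME(mm, DATEADD(mm, n.number, '2010-01-01')) AS month_name,
    n.number + 1                                      AS month_number,
    DATEADD(mm, n.number, '2010-01-01')               AS dt
FROM (VALUES (0),(1),(2),(3),(4),(5),(6),(7),(8),(9),(10),(11)) AS n(number);
```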

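As for your original question of what this is called: I don't know a name for it. Maybe something like a "left outer join against a series"?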

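One more thing to add: when you join to the months table, or even in your original query, it is generally advisable to avoid wrapping the column in a function such as YEAR([timestamp]) on the left-hand side of a WHERE predicate.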

So this code:

SELECT
    YEAR([timestamp]),
    MONTH([timestamp]),
    COUNT(*)
FROM table1
WHERE YEAR([timestamp]) = YEAR(GETDATE())
GROUP BY
    YEAR([timestamp]),
    MONTH([timestamp])

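...will cause an index scan (assuming [timestamp] is indexed), because YEAR([timestamp]) has to be evaluated for every row. On a table with more than a million rows, that means poor performance.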

That is why you will usually see a recommendation like this instead:

SELECT
    YEAR([timestamp]),
    MONTH([timestamp]),
    COUNT(*)
FROM table1
WHERE [timestamp] >= DATEADD(YY, DATEDIFF(YY, 0, GETDATE()), 0)     -- First day of this year
AND   [timestamp] <  DATEADD(YY, DATEDIFF(YY, 0, GETDATE()) + 1, 0) -- First day of next year
GROUP BY
    YEAR([timestamp]),
    MONTH([timestamp])

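This version can use an index seek (again, assuming [timestamp] is an indexed column), which results in fewer logical reads and therefore a faster response. You can confirm this by checking the execution plan.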
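
Putting the month series and the sargable filter together, the series can be LEFT JOINed on half-open date ranges instead of on MONTH(...), so the predicates on [timestamp] stay sargable inside the join as well. This is a sketch assuming the table1/[timestamp] names from the question and SQL Server 2008 or later (for the VALUES row constructor and the DECLARE initializer); whether the optimizer actually seeks on [timestamp] depends on the index and the data, so check the plan.

```sql
-- First day of the current year
DECLARE @YearStart datetime = DATEADD(yy, DATEDIFF(yy, 0, GETDATE()), 0);

SELECT
    YEAR(m.month_start)             AS [year],
    DATENAME(mm, m.month_start)     AS month_name,
    NULLIF(COUNT(t.[timestamp]), 0) AS sales      -- NULL for months without sales, as in the question
FROM (
    -- Twelve month-start dates for the current year
    SELECT DATEADD(mm, n.number, @YearStart) AS month_start
    FROM (VALUES (0),(1),(2),(3),(4),(5),(6),(7),(8),(9),(10),(11)) AS n(number)
) AS m
LEFT JOIN table1 AS t
       ON t.[timestamp] >= m.month_start                 -- bare column on one side of each
      AND t.[timestamp] <  DATEADD(mm, 1, m.month_start) -- comparison keeps the join sargable
GROUP BY m.month_start
ORDER BY m.month_start;
```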

answered 2010-06-25T20:28:17.933

I'm with KM; on SQL Server 2005 and later you can use a recursive CTE:

WITH months AS (
  SELECT DATENAME(mm, '2010-01-01') AS month_name,
         MONTH('2010-01-01') AS month_number,
         CAST('2010-01-01' AS DATETIME) AS dt
  UNION ALL
  SELECT DATENAME(mm, DATEADD(mm, 1, m.dt)),
         MONTH(DATEADD(mm, 1, m.dt)),
         DATEADD(mm, 1, m.dt)
    FROM months m
   WHERE DATEADD(mm, 1, m.dt) <= '2010-12-01')
SELECT x.month_name,
       y.*
  FROM months x
  LEFT JOIN your_table y
    ON MONTH(y.date) = x.month_number
   -- restrict to 2010 so rows from other years don't match the same month
   AND y.date >= '2010-01-01' AND y.date < '2011-01-01'

The last time KM and I discussed this, we found that the recursive CTE was more efficient than using a numbers table.

answered 2010-06-25T18:13:46.680

How about creating a new table called Months and populating it with data you can join to?
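
A minimal sketch of that idea, assuming a hypothetical dbo.Months table with the column names shown, and the table1/[timestamp] names from the question (the multi-row VALUES insert needs SQL Server 2008 or later):

```sql
-- Hypothetical permanent lookup table with one row per month
CREATE TABLE dbo.Months
(
    month_number tinyint    NOT NULL PRIMARY KEY,
    month_name   varchar(9) NOT NULL
);

INSERT INTO dbo.Months (month_number, month_name)
VALUES (1, 'January'), (2, 'February'), (3, 'March'), (4, 'April'),
       (5, 'May'), (6, 'June'), (7, 'July'), (8, 'August'),
       (9, 'September'), (10, 'October'), (11, 'November'), (12, 'December');

-- Join this year's aggregated sales to the full list of months
SELECT YEAR(GETDATE()) AS [year], m.month_name, s.sales
FROM dbo.Months AS m
LEFT JOIN (
    SELECT MONTH([timestamp]) AS month_number, COUNT(*) AS sales
    FROM table1
    WHERE [timestamp] >= DATEADD(yy, DATEDIFF(yy, 0, GETDATE()), 0)     -- first day of this year
      AND [timestamp] <  DATEADD(yy, DATEDIFF(yy, 0, GETDATE()) + 1, 0) -- first day of next year
    GROUP BY MONTH([timestamp])
) AS s
    ON s.month_number = m.month_number
ORDER BY m.month_number;
```

Because a 12-row Months table carries no year, the year filter has to live in the sales subquery; months with no sales come back with NULL in the sales column.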

answered 2010-06-25T17:51:45.550