
Here is the problem on SQL Fiddle.

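I have several tables FULL OUTER JOINed together. For this question I have simplified things down to just 2 tables. The reason for the FULL JOINs is that the production tables have many fields that do not match, e.g. dates1 might contain Revenue and Compensation whereas dates2 might contain NumHeadBangers and NumNormalBods, so a UNION ALL between the following will not work: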

create table dates1 
(
USERID INT,
[Date] datetime
)
insert into dates1
values
( 1, '01 jan 2012'),
( 2, '03 jan 2012')

create table dates2 
(
USERID INT,
[Date] datetime
)
insert into dates2
values
( 2, '01 jan 2012'),
( 4, '04 jan 2012')

For each USERID we need to find the minimum date. Here is an attempt. I have used COALESCE because in the production script there may be 4 or 5 tables joined:

SELECT 
  COALESCE(x.USERID,y.USERID) USERID
  , CASE WHEN x.[Date] < Y.[DATE] 
        THEN x.[Date] 
        ELSE Y.[DATE] END [DATE]
FROM 
dates1 x 
FULL OUTER JOIN dates2 y 
    ON x.USERID = y.USERID

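The above returns a result that is wrong for user 1: the date comes back as NULL (dates2 has no row for that user, so the comparison is not true and the ELSE branch returns Y.[DATE]), whereas we require user 1's minimum date to be 1 Jan 2012. Also, once we are dealing with 4 tables, the CASE statement above will get very messy.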

What is a scalable script for finding these dates?

A messy solution I have been using is:

SELECT 
  COALESCE(x.USERID,y.USERID) USERID
  , CASE 
      WHEN ISNULL(x.[Date],'1 JAN 2020') < ISNULL(Y.[DATE],'1 JAN 2020') 
      THEN ISNULL(x.[Date],'1 JAN 2020') 
      ELSE ISNULL(Y.[DATE],'1 JAN 2020') 
  END [DATE]
FROM 
  dates1 x 
  FULL OUTER JOIN dates2 y 
     ON x.USERID = y.USERID


3 Answers

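You need to handle the case where the comparison is not true because one side is NULL (a comparison with NULL evaluates to UNKNOWN, so the CASE falls through to ELSE):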

CASE WHEN x.[Date] < Y.[DATE] OR Y.[DATE] IS NULL
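Applied to the query from the question, the extra NULL check might look like the sketch below. It assumes the dates1 and dates2 tables exactly as defined in the question and only covers the two-table case.

```sql
-- Sketch: the question's query with the NULL check added.
-- If y.[Date] is NULL the WHEN branch is taken and x.[Date] is returned;
-- if x.[Date] is NULL the comparison is UNKNOWN and ELSE returns y.[Date].
SELECT
    COALESCE(x.USERID, y.USERID) AS USERID,
    CASE
        WHEN x.[Date] < y.[Date] OR y.[Date] IS NULL
            THEN x.[Date]
        ELSE y.[Date]
    END AS [Date]
FROM dates1 AS x
FULL OUTER JOIN dates2 AS y
    ON x.USERID = y.USERID;
```

With the sample data this should give 1 Jan 2012 for users 1 and 2, and 4 Jan 2012 for user 4.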

You could also try something simpler:

SELECT userid, MIN(date) FROM
(SELECT userid, date FROM dates1
 UNION ALL SELECT userid, date FROM dates2
 -- ...
) AS x
GROUP BY userid
answered 2012-06-18T16:21:30.803

The way I use CROSS APPLY in this situation, to reduce (but not eliminate) the code repetition (min of min of min, etc.), is as follows...

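-- Inline table-valued function: returns the earlier of two datetimes,
-- treating NULL as "no value" rather than as the minimum.
CREATE FUNCTION dbo.min_datetime (@datetime1 AS DATETIME, @datetime2 AS DATETIME)
RETURNS TABLE
AS
RETURN
  SELECT CASE WHEN @datetime1 < @datetime2 THEN @datetime1
              WHEN @datetime1 > @datetime2 THEN @datetime2
              WHEN @datetime1 IS NULL      THEN @datetime2
                                           ELSE @datetime1
         END AS val
GO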

SELECT
  COALESCE(a.id, b.id, c.id, d.id, e.id)                    as id,
  [min_datetime_d_e].val                                    as date,
  a.fields,  b.fields,  c.fields,  d.fields,  e.fields
FROM
                  a
  FULL OUTER JOIN b ON a.id = b.id
  FULL OUTER JOIN c ON c.id = COALESCE(a.id, b.id)
  FULL OUTER JOIN d ON d.id = COALESCE(a.id, b.id, c.id)
  FULL OUTER JOIN e ON e.id = COALESCE(a.id, b.id, c.id, d.id)
  CROSS APPLY dbo.min_datetime(a.date,               b.date) AS min_datetime_a_b
  CROSS APPLY dbo.min_datetime(min_datetime_a_b.val, c.date) AS min_datetime_b_c
  CROSS APPLY dbo.min_datetime(min_datetime_b_c.val, d.date) AS min_datetime_c_d
  CROSS APPLY dbo.min_datetime(min_datetime_c_d.val, e.date) AS min_datetime_d_e
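As a possible variation on the same idea, the helper function can be avoided by taking MIN over a table value constructor inside the CROSS APPLY. This is only a sketch: it assumes SQL Server 2008 or later (for the VALUES constructor) and uses the dates1 and dates2 tables from the question; the names v, d, m and MinDate are made up for illustration.

```sql
-- Sketch: MIN over a row constructor. MIN ignores NULLs, so a date that is
-- missing on one side of the FULL OUTER JOIN does not affect the result.
SELECT
    COALESCE(x.USERID, y.USERID) AS USERID,
    m.MinDate                    AS [Date]
FROM dates1 AS x
FULL OUTER JOIN dates2 AS y
    ON x.USERID = y.USERID
CROSS APPLY (
    -- An aggregate without GROUP BY always returns exactly one row,
    -- so CROSS APPLY never drops a row from the join.
    SELECT MIN(v.d) AS MinDate
    FROM (VALUES (x.[Date]), (y.[Date])) AS v(d)
) AS m;
```

Adding a table then means one more join and one more entry in the VALUES list, rather than another nested CASE or another chained APPLY.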

Edit: a slight refactoring of the answer posted by the OP.

;WITH myCTE (UserID, [Date])
AS
  (
    SELECT UserID, [Date] FROM table1
    UNION ALL
    SELECT UserID, [Date] FROM table2
    UNION ALL
    SELECT UserID, [Date] FROM table3
  )
  , unique_by_user (UserID, [Date])
AS
  (
    -- one row per user, carrying that user's earliest date
    SELECT UserID, MIN([Date]) FROM myCTE GROUP BY UserID
  )
SELECT
    u.UserID, u.[Date]
  , x.field1, x.field2
  , y.field3, y.field4
  , z.field5, z.field6
FROM
  unique_by_user u
  LEFT OUTER JOIN table1 x
      ON u.UserID = x.UserID
  LEFT OUTER JOIN table2 y
      ON u.UserID = y.UserID
  LEFT OUTER JOIN table3 z
      ON u.UserID = z.UserID

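It would be interesting to compare the performance of the two options above. Initially I thought the cost of processing the data twice (once in the CTE, and again when joining all the records in the OUTER JOINs) would make this one worse. But now I am just not sure. I would love to test and compare, but I do not have time today :)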
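One rough way to make that comparison is to run both shapes of query in the same session with SET STATISTICS IO and SET STATISTICS TIME switched on and compare the logical reads and CPU time reported in the Messages tab. The sketch below uses simplified two-table stand-ins against the dates1 and dates2 tables from the question, not the full five-table queries, and says nothing about which option wins on real data.

```sql
SET STATISTICS IO ON;
SET STATISTICS TIME ON;

-- Stand-in for option 1: FULL OUTER JOIN with a NULL-aware CASE
SELECT
    COALESCE(x.USERID, y.USERID) AS USERID,
    CASE
        WHEN x.[Date] < y.[Date] OR y.[Date] IS NULL THEN x.[Date]
        ELSE y.[Date]
    END AS [Date]
FROM dates1 AS x
FULL OUTER JOIN dates2 AS y
    ON x.USERID = y.USERID;

-- Stand-in for option 2: UNION ALL, then one MIN per user
SELECT u.USERID, MIN(u.[Date]) AS [Date]
FROM (
    SELECT USERID, [Date] FROM dates1
    UNION ALL
    SELECT USERID, [Date] FROM dates2
) AS u
GROUP BY u.USERID;

SET STATISTICS TIME OFF;
SET STATISTICS IO OFF;
```

On tables this small the numbers are meaningless; the comparison only becomes informative against production-sized data and indexes.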

answered 2012-06-19T08:24:10.883

Live copy on SQL Fiddle

The question changed a bit along the way, which is not ideal, but this is what I ended up with:

    create table table1 
    (
      UserID int,
      [Date] datetime,
      [field1] int,
      [field2] int
    )
    insert into table1
    values
    ( 1,'01 jan 2012',10,10),
    ( 2,'03 jan 2012',20,20)

    create table table2 
    (
      UserID int,
      [Date] datetime,
      [field3] int,
      [field4] int
    )
    insert into table2
    values
    ( 2,'01 jan 2012',30,30),
    ( 4,'04 jan 2012',40,40)


    create table table3 
    (
      UserID int,
      [Date] datetime,
      [field5] int,
      [field6] int
    )
    insert into table3
    values
    ( 2,'01 jan 2012',30,30),
    ( 4,'04 jan 2012',40,40)

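The SQL for this is below. It is effectively the idea put forward by Aaron, but slightly different in that it uses a CTE that feeds into the full outer joins: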

;WITH myCTE (UserID, [Date])
AS
  (
    SELECT UserID,[Date]FROM table1 GROUP BY UserID,[Date] 
    UNION
    SELECT UserID,[Date]FROM table2 GROUP BY UserID,[Date] 
    UNION
    SELECT UserID,[Date]FROM table3 GROUP BY UserID,[Date]     
  )
  , myExtraCTE (UserID, [Date])
AS
  (
    -- earliest date per user across all three tables
    SELECT UserID, [Date] = MIN([Date]) FROM myCTE GROUP BY UserID
  )
SELECT  
  COALESCE(x.UserID,y.UserID, z.UserID ,k.UserID) USERID
  , MIN(k.[Date]) [Date]
  , SUM(ISNULL(x.field1,0.0)) field1
  , SUM(ISNULL(x.field2,0.0)) field2
  , SUM(ISNULL(y.field3,0.0)) field3
  , SUM(ISNULL(y.field4,0.0)) field4
  , SUM(ISNULL(z.field5,0.0)) field5
  , SUM(ISNULL(z.field6,0.0)) field6
FROM  
  table1 x  
  FULL OUTER JOIN table2 y  
      ON y.USERID  = x.USERID
  FULL OUTER JOIN table3 z  
      ON z.USERID  = coalesce(x.USERID,y.USERID)
  FULL OUTER JOIN myExtraCTE k  
      ON k.USERID  = coalesce(x.USERID,y.USERID,z.USERID)
GROUP BY
  COALESCE(x.UserID,y.UserID, z.UserID ,k.UserID)
answered 2012-06-19T16:48:38.897