
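I need to join to the same table several times, between (for example) Person and PersonEvents. Each person has multiple events (0 or more). I need to create a view that selects, for each person, certain columns from their most recent event, plus columns from the next most recent event.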

Person data:

Id    Name
1     Iain
2     Fred
3     Mary
4     Foo
5     Bar

PersonEvents data:

PersonId    DateStarted                ReasonForLeaving
1           2011-03-12 00:00:00.000    sick
1           2013-02-12 00:00:00.000    NULL
1           2012-04-12 00:00:00.000    holiday
2           2011-05-12 00:00:00.000    new baby
2           2013-06-12 00:00:00.000    NULL
2           2012-07-12 00:00:00.000    had enough
3           2011-08-12 00:00:00.000    pregnant
3           2013-09-12 00:00:00.000    NULL
4           2012-10-12 00:00:00.000    NULL

Sample output would be:

Id   Name    MemberSince                ReasonForChange
1    Iain    2013-02-12 00:00:00.000    holiday
4    Foo     2012-10-12 00:00:00.000    NULL
...

The "old way" uses a TOP 1 join or a sub-select statement:

SELECT p.*,
    (
        SELECT TOP 1 DateStarted
        FROM PersonEvents e
        WHERE e.PersonId = p.Id
        ORDER BY DateStarted DESC
    ) As MemberSince
FROM Person p
....

However, if you need multiple columns from this join (e.g. Date, Comment and possibly other IDs), you need one sub-select per column, which is expensive.
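
To make the cost concrete, here is a minimal sketch of the sub-select approach pulling two columns from the most recent event. It uses the sample tables above; the alias LatestReasonForLeaving is made up for illustration.

```sql
-- Sketch only: every extra column taken from the latest event
-- needs its own correlated sub-select against PersonEvents.
SELECT p.Id,
       p.Name,
       (SELECT TOP 1 e.DateStarted
        FROM PersonEvents e
        WHERE e.PersonId = p.Id
        ORDER BY e.DateStarted DESC) AS MemberSince,
       (SELECT TOP 1 e.ReasonForLeaving
        FROM PersonEvents e
        WHERE e.PersonId = p.Id
        ORDER BY e.DateStarted DESC) AS LatestReasonForLeaving  -- hypothetical alias
FROM Person p;
```

Each added column repeats the same lookup, so the query text (and possibly the work done) grows with the number of columns.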

So the question is: how do you use row numbers to get multiple columns from joins to the most recent and the previous event?


2 Answers


The most straightforward (i.e. readable SQL) answer I have come up with uses WITH and ROW_NUMBER.

First, write a ROW_NUMBER query that sorts the events and gives each event a number that is unique within its PersonId:

SELECT *,
    ROW_NUMBER() OVER (PARTITION BY PersonId ORDER BY DateStarted DESC) AS EventOrder
FROM PersonEvents

Result:

PersonId    DateStarted              ReasonForLeaving    EventOrder
1           2013-02-12 00:00:00.000  NULL                1
1           2012-04-12 00:00:00.000  holiday             2
1           2011-03-12 00:00:00.000  sick                3
2           2013-06-12 00:00:00.000  NULL                1
2           2012-07-12 00:00:00.000  had enough          2
2           2011-05-12 00:00:00.000  new baby            3
3           2013-09-12 00:00:00.000  NULL                1
3           2011-08-12 00:00:00.000  pregnant            2
4           2012-10-12 00:00:00.000  NULL                1

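Now each person's "first" event (the most recent one, in my case) contains the date of the change (real-life example: this is student enrollment history data across multiple schools, with school IDs and plenty of other guff). Each person's "second" event is the previous event and contains the reason for leaving. To put the two together: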

WITH SortedEvents AS (
     SELECT *,
         ROW_NUMBER() OVER (PARTITION BY PersonId ORDER BY DateStarted DESC) AS EventOrder
     FROM PersonEvents
)
SELECT p.*, MostRecent.DateStarted AS MemberSince, NextRecent.ReasonForLeaving AS ReasonForChange
FROM Person p
     LEFT OUTER JOIN SortedEvents AS MostRecent ON p.Id = MostRecent.PersonId AND MostRecent.EventOrder = 1
     LEFT OUTER JOIN SortedEvents AS NextRecent ON p.Id = NextRecent.PersonId AND NextRecent.EventOrder = 2

This gives nicely formatted output:

Id          Name   MemberSince              ReasonForChange
1           Iain   2013-02-12 00:00:00.000  holiday
2           Fred   2013-06-12 00:00:00.000  had enough
3           Mary   2013-09-12 00:00:00.000  pregnant
4           Foo    2012-10-12 00:00:00.000  NULL
5           Bar    NULL                     NULL
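
Since the question asks for a view, the same query can be wrapped in one. This is a sketch only: the view name dbo.PersonMembership is made up, and it assumes both tables live in the dbo schema.

```sql
-- Hypothetical view name; assumes Person and PersonEvents are in dbo.
-- CREATE VIEW must be the first statement in its batch.
CREATE VIEW dbo.PersonMembership
AS
WITH SortedEvents AS (
    -- Number each person's events, newest first.
    SELECT PersonId, DateStarted, ReasonForLeaving,
        ROW_NUMBER() OVER (PARTITION BY PersonId ORDER BY DateStarted DESC) AS EventOrder
    FROM dbo.PersonEvents
)
SELECT p.Id, p.Name,
    MostRecent.DateStarted AS MemberSince,
    NextRecent.ReasonForLeaving AS ReasonForChange
FROM dbo.Person p
    LEFT OUTER JOIN SortedEvents AS MostRecent ON p.Id = MostRecent.PersonId AND MostRecent.EventOrder = 1
    LEFT OUTER JOIN SortedEvents AS NextRecent ON p.Id = NextRecent.PersonId AND NextRecent.EventOrder = 2
GO
```

Listing the columns explicitly instead of using p.* keeps the view's shape stable if columns are later added to Person.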

In fact, you can select multiple columns from either row number. The real-life example (again, student enrollment history) selects:

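  1. From the master student table:
    • Student ID
    • Name
    • Date of birth, etc.
  2. From the enrollment history table, as "current enrollment":
    • School ID
    • Various enrollment status information
    • Start date
  3. From the enrollment history table, as "previous enrollment":
    • Reason for leaving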
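
Applied to the sample tables, taking several columns from each joined row might look like the sketch below. The aliases CurrentReasonForLeaving and PreviousDateStarted are made up for illustration; the real enrollment tables are not shown here.

```sql
WITH SortedEvents AS (
    -- Number each person's events, newest first.
    SELECT PersonId, DateStarted, ReasonForLeaving,
        ROW_NUMBER() OVER (PARTITION BY PersonId ORDER BY DateStarted DESC) AS EventOrder
    FROM PersonEvents
)
SELECT p.Id, p.Name,
    -- Columns from the most recent event
    MostRecent.DateStarted      AS MemberSince,
    MostRecent.ReasonForLeaving AS CurrentReasonForLeaving,  -- hypothetical alias
    -- Columns from the previous event
    NextRecent.DateStarted      AS PreviousDateStarted,      -- hypothetical alias
    NextRecent.ReasonForLeaving AS ReasonForChange
FROM Person p
    LEFT OUTER JOIN SortedEvents AS MostRecent ON p.Id = MostRecent.PersonId AND MostRecent.EventOrder = 1
    LEFT OUTER JOIN SortedEvents AS NextRecent ON p.Id = NextRecent.PersonId AND NextRecent.EventOrder = 2;
```

Extra columns only lengthen the SELECT list; the number of joins stays at two.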

For about 150,000 students and their individual histories, this approach works very well.

Full SQL for my test:

CREATE TABLE Person
(
     Id INT NOT NULL,
     Name VARCHAR(50)
)
GO
CREATE TABLE PersonEvents
(
     PersonId INT NOT NULL,
     DateStarted DATETIME NOT NULL,
     ReasonForLeaving VARCHAR(50)
)
GO
INSERT INTO Person
     SELECT 1, 'Iain' UNION ALL
     SELECT 2, 'Fred' UNION ALL
     SELECT 3, 'Mary' UNION ALL
     SELECT 4, 'Foo'  UNION ALL
     SELECT 5, 'Bar'
GO
INSERT INTO PersonEvents
     SELECT 1, '20110312', 'sick'       UNION ALL
     SELECT 1, '20130212', NULL         UNION ALL
     SELECT 1, '20120412', 'holiday'    UNION ALL
     SELECT 2, '20110512', 'new baby'   UNION ALL
     SELECT 2, '20130612', NULL         UNION ALL
     SELECT 2, '20120712', 'had enough' UNION ALL
     SELECT 3, '20110812', 'pregnant'   UNION ALL
     SELECT 3, '20130912', NULL         UNION ALL
     SELECT 4, '20121012', NULL
GO

--SELECT *
--FROM Person
--SELECT *
--FROM PersonEvents
--GO
WITH SortedEvents AS (
    SELECT *,
        ROW_NUMBER() OVER (PARTITION BY PersonId ORDER BY DateStarted DESC) AS EventOrder
    FROM PersonEvents
)
SELECT p.*, MostRecent.DateStarted AS MemberSince, NextRecent.ReasonForLeaving AS ReasonForChange
FROM Person p
    LEFT OUTER JOIN SortedEvents AS MostRecent ON p.Id = MostRecent.PersonId AND MostRecent.EventOrder = 1
    LEFT OUTER JOIN SortedEvents AS NextRecent ON p.Id = NextRecent.PersonId AND NextRecent.EventOrder = 2
GO

SELECT p.*,
    (
        SELECT TOP 1 DateStarted
        FROM PersonEvents pe
        WHERE pe.PersonId = p.Id
        ORDER BY DateStarted DESC
    ) AS MemberSince,
    'unknown' AS ReasonForChange
FROM Person p
GO

DROP TABLE Person
DROP TABLE PersonEvents
GO
answered 2013-03-12T01:22:36.413

For the last event, together with the previous event's date:

SELECT ID,NAME,NextToMostEventDate,ReasonForLeaving
FROM PersonEvents pe
INNER JOIN(
    SELECT pe1.PersonId,TheMostEventDate,NextToMostEventDate=MAX(pe1.DateStarted)
    FROM PersonEvents pe1
    INNER JOIN(
        SELECT PersonId,TheMostEventDate=MAX(DateStarted)
        FROM PersonEvents
        GROUP BY PersonId 
    ) pe2 
    ON pe2.PersonId=pe1.PersonId
    WHERE DateStarted<TheMostEventDate
    GROUP BY pe1.PersonId,TheMostEventDate
) pe12 ON pe12.PersonId=pe.PersonId
INNER JOIN Person ON Id=pe.PersonId
WHERE pe.DateStarted=TheMostEventDate

For the last event's date, together with the previous event:

SELECT ID,NAME,TheMostEventDate,ReasonForLeaving
FROM PersonEvents pe
INNER JOIN(
    SELECT pe1.PersonId,TheMostEventDate,NextToMostEventDate=MAX(pe1.DateStarted)
    FROM PersonEvents pe1
    INNER JOIN(
        SELECT PersonId,TheMostEventDate=MAX(DateStarted)
        FROM PersonEvents
        GROUP BY PersonId 
    ) pe2 
    ON pe2.PersonId=pe1.PersonId
    WHERE DateStarted<TheMostEventDate
    GROUP BY pe1.PersonId,TheMostEventDate
) pe12 ON pe12.PersonId=pe.PersonId
INNER JOIN Person ON Id=pe.PersonId
WHERE pe.DateStarted=NextToMostEventDate
answered 2013-03-12T03:23:21.297