
I have a table of dates with a column that tells whether each date is the first one in a series of consecutive ("connected") dates. Example:

╔═══════════╦════════════╦═══════╗
║ person_id ║    DATE    ║ FIRST ║
╠═══════════╬════════════╬═══════╣
║         1 ║ 2013-05-31 ║     1 ║
║         1 ║ 2013-06-01 ║     0 ║
║         1 ║ 2013-06-02 ║     0 ║
║        15 ║ 2013-07-08 ║     1 ║
║        15 ║ 2013-07-09 ║     0 ║
║         1 ║ 2013-07-30 ║     1 ║
║         1 ║ 2013-07-31 ║     0 ║
║         1 ║ 2013-08-01 ║     0 ║
╚═══════════╩════════════╩═══════╝

I need a new table with a start date column and an end date column for each series. Example:

╔═══════════╦════════════╦════════════╗
║ person_id ║ START_DATE ║  END_DATE  ║
╠═══════════╬════════════╬════════════╣
║         1 ║ 2013-05-31 ║ 2013-06-02 ║
║        15 ║ 2013-07-08 ║ 2013-07-09 ║
║         1 ║ 2013-07-30 ║ 2013-08-01 ║
╚═══════════╩════════════╩════════════╝

Is this possible without a WHILE loop? I tried a WHILE loop, but it is slow. The table has about 100,000 records.

The loop I tried looks like this:

IF EXISTS (SELECT * FROM sysobjects WHERE id = object_id('dbo.temp_table'))
drop table temp_table;
go

SELECT
[person_id],
[date],
[first],
0 AS Processed,
N = ROW_NUMBER() OVER (ORDER BY person_id, [date])
INTO temp_table
FROM [person_dates]
ORDER BY person_id, date
go

declare @N int
declare @N2 int
declare @P_ID int
declare @DATE varchar(10)
declare @DATE2 varchar(10)
declare @start_date datetime
declare @end_date datetime

While (Select Count(*) From temp_table Where Processed = 0 AND first=1) > 0 
Begin 
    Select TOP (1) @N=N,@P_ID=person_id, @DATE=date From temp_table Where Processed = 0 AND first=1 ORDER BY N
    set @start_date = CAST(@DATE as datetime)
    set @DATE2=@DATE
    while (SELECT COUNT(*) FROM temp_table Where Processed = 0 AND first<>1 and 
           CAST(date as datetime) = dateadd(day,1,CAST(@DATE2 as datetime)) and person_id=@P_ID) > 0
    Begin
        Select @N2=N,@DATE2=date From temp_table Where Processed = 0 AND first<>1 and 
           CAST(date as datetime) = dateadd(day,1,CAST(@DATE2 as datetime)) and person_id=@P_ID ORDER BY N
        Update temp_table Set Processed = 1 Where N = @N2    
    End
    set @end_date=CAST(@DATE2 as datetime) -- @start_date/@end_date now hold one series; they still have to be saved somewhere
    Update temp_table Set Processed = 1 Where N = @N
End
go

IF EXISTS (SELECT * FROM sysobjects WHERE id = object_id('dbo.temp_table'))
drop table temp_table;
go

2 Answers


You can do this in a single SQL statement with a self-join:

Select distinct s.person_id, s.[date] as startDate,
   e.[date] as endDate
From person_dates s
  Left Join person_dates n -- next "first" row for the same person, if one exists
     On n.person_id = s.person_id
        And n.[first] = 1
        And n.[date] =
           (Select Min(d.[date]) from person_dates d
            Where d.person_id = s.person_id
               And d.[first] = 1
               And d.[date] > s.[date])
  Join person_dates e -- last row of this series (the last row before the next "first")
     On e.person_id = s.person_id
        And e.[date] =
            (Select Max(d.[date]) from person_dates d
             where d.person_id = s.person_id
                And d.[date] >= s.[date] -- >= so that one-day series are kept
                And (n.[date] Is Null Or d.[date] < n.[date]))
Where s.[first] = 1
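To check the self-join, here is a minimal sketch that loads the question's sample rows into an in-memory SQLite database through Python's standard `sqlite3` module and runs the same query. SQLite is only a stand-in for SQL Server here (an assumption for testing), so identifiers are quoted with double quotes instead of brackets; `ROWS`, `SELF_JOIN_SQL` and `series_ranges` are hypothetical names.

```python
import sqlite3

# Sample rows from the question: (person_id, date, first)
ROWS = [
    (1, "2013-05-31", 1), (1, "2013-06-01", 0), (1, "2013-06-02", 0),
    (15, "2013-07-08", 1), (15, "2013-07-09", 0),
    (1, "2013-07-30", 1), (1, "2013-07-31", 0), (1, "2013-08-01", 0),
]

SELF_JOIN_SQL = """
SELECT DISTINCT s.person_id, s."date" AS start_date, e."date" AS end_date
FROM person_dates s
LEFT JOIN person_dates n              -- next "first" row for the same person, if any
  ON n.person_id = s.person_id
 AND n."first" = 1
 AND n."date" = (SELECT MIN(d."date") FROM person_dates d
                 WHERE d.person_id = s.person_id
                   AND d."first" = 1
                   AND d."date" > s."date")
JOIN person_dates e                   -- last row before that next "first" row
  ON e.person_id = s.person_id
 AND e."date" = (SELECT MAX(d."date") FROM person_dates d
                 WHERE d.person_id = s.person_id
                   AND d."date" >= s."date"
                   AND (n."date" IS NULL OR d."date" < n."date"))
WHERE s."first" = 1
ORDER BY s.person_id, s."date"
"""


def series_ranges(rows):
    """Load rows into an in-memory table and return (person_id, start, end) tuples."""
    con = sqlite3.connect(":memory:")
    # ISO 'YYYY-MM-DD' text sorts chronologically, so MIN/MAX/< work on it
    con.execute('CREATE TABLE person_dates (person_id INTEGER, "date" TEXT, "first" INTEGER)')
    con.executemany("INSERT INTO person_dates VALUES (?, ?, ?)", rows)
    result = con.execute(SELF_JOIN_SQL).fetchall()
    con.close()
    return result


if __name__ == "__main__":
    for row in series_ranges(ROWS):
        print(row)
```

Adding a one-day series such as `(2, "2013-09-01", 1)` to `ROWS` also yields `(2, '2013-09-01', '2013-09-01')`; with a strict `>` in the end-date subquery, that row would find no end date and be dropped by the inner join.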
answered 2013-05-31T12:44:45.547

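Here is a simple observation: if you take a cumulative sum of the `first` column (per person, in date order), you get a column that identifies each group.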

In some databases you can compute the cumulative sum with window (analytic) functions. In others you need a correlated subquery:

select person_id, min(date) as start_date, max(date) as end_date
from (select pd.*,
             (select sum(first)
              from person_dates pd2
              where pd2.person_id = pd.person_id and
                    pd2.date <= pd.date
             ) as cumfirst
      from person_dates pd
     ) pd
group by person_id, cumfirst;
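The cumulative-sum idea can be illustrated without a database. This plain-Python sketch (helper names are hypothetical) walks each person's rows in date order, adds `first` to a running total that only increases on the first day of a series, and then takes the minimum and maximum date per `(person_id, running total)` label, which is what `group by person_id, cumfirst` does.

```python
from itertools import groupby

# Sample rows from the question: (person_id, date, first)
ROWS = [
    (1, "2013-05-31", 1), (1, "2013-06-01", 0), (1, "2013-06-02", 0),
    (15, "2013-07-08", 1), (15, "2013-07-09", 0),
    (1, "2013-07-30", 1), (1, "2013-07-31", 0), (1, "2013-08-01", 0),
]


def cumulative_labels(rows):
    """Return (person_id, date, cumfirst) with a per-person running sum of `first`."""
    labelled = []
    # sorted() orders by person_id, then date (ISO strings sort chronologically)
    for person_id, group in groupby(sorted(rows), key=lambda r: r[0]):
        running = 0
        for _, date, first in group:
            running += first  # goes up by 1 only when a new series starts
            labelled.append((person_id, date, running))
    return labelled


def ranges_from_labels(labelled):
    """Mimic GROUP BY person_id, cumfirst with MIN(date) and MAX(date)."""
    bounds = {}
    for person_id, date, cumfirst in labelled:
        start, end = bounds.get((person_id, cumfirst), (date, date))
        bounds[(person_id, cumfirst)] = (min(start, date), max(end, date))
    return [(p, start, end) for (p, _), (start, end) in sorted(bounds.items())]


if __name__ == "__main__":
    for row in ranges_from_labels(cumulative_labels(ROWS)):
        print(row)
```

For person 1 the running totals come out as 1, 1, 1, 2, 2, 2, so the two series land in different groups even though they belong to the same person.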

With the ANSI-standard cumulative-sum syntax (supported in SQL Server 2012 and later), you can write it as:

select person_id, min(date) as start_date, max(date) as end_date
from (select pd.*,
             sum(first) over (partition by person_id order by date) as cumfirst
      from person_dates pd
     ) pd
group by person_id, cumfirst;
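Both forms in this answer can be checked side by side. This sketch again uses Python's `sqlite3` module with an in-memory table as a stand-in for SQL Server (an assumption; the window-function form needs SQLite 3.25 or later, which recent Python builds bundle). `ROWS`, `CORRELATED_SQL`, `WINDOW_SQL` and `run_both` are hypothetical names.

```python
import sqlite3

# Sample rows from the question: (person_id, date, first)
ROWS = [
    (1, "2013-05-31", 1), (1, "2013-06-01", 0), (1, "2013-06-02", 0),
    (15, "2013-07-08", 1), (15, "2013-07-09", 0),
    (1, "2013-07-30", 1), (1, "2013-07-31", 0), (1, "2013-08-01", 0),
]

# Running sum computed with a correlated subquery
CORRELATED_SQL = """
SELECT person_id, MIN("date") AS start_date, MAX("date") AS end_date
FROM (SELECT pd.*,
             (SELECT SUM(pd2."first") FROM person_dates pd2
              WHERE pd2.person_id = pd.person_id
                AND pd2."date" <= pd."date") AS cumfirst
      FROM person_dates pd) AS pd
GROUP BY person_id, cumfirst
ORDER BY person_id, start_date
"""

# Same running sum computed with a window function
WINDOW_SQL = """
SELECT person_id, MIN("date") AS start_date, MAX("date") AS end_date
FROM (SELECT pd.*,
             SUM(pd."first") OVER (PARTITION BY pd.person_id ORDER BY pd."date") AS cumfirst
      FROM person_dates pd) AS pd
GROUP BY person_id, cumfirst
ORDER BY person_id, start_date
"""


def run_both(rows):
    """Run both queries against an in-memory copy of rows; return both results."""
    con = sqlite3.connect(":memory:")
    con.execute('CREATE TABLE person_dates (person_id INTEGER, "date" TEXT, "first" INTEGER)')
    con.executemany("INSERT INTO person_dates VALUES (?, ?, ?)", rows)
    correlated = con.execute(CORRELATED_SQL).fetchall()
    windowed = con.execute(WINDOW_SQL).fetchall()
    con.close()
    return correlated, windowed


if __name__ == "__main__":
    correlated, windowed = run_both(ROWS)
    print(correlated)
    print(windowed)
```

The two queries should return the same rows; the window form reads each person's rows once instead of rescanning them for every row, which is why it tends to scale better on a table of around 100,000 rows.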
answered 2013-05-31T12:55:05.153