
I hope this question provides all of the necessary information, but please ask for more if anything is unclear. This is my first question on Stack Overflow, so please bear with me.

I am running this query on SQL Server 2005.

I have a large derived data set (I'll provide a small subset below) with four fields: ID, Year, StartDate and EndDate.

Within this data set an ID may (correctly) appear multiple times with different date combinations.

My question is: what ways are there to identify whether a record is 'new', i.e. its start date does not fall between the start and end dates of any other record for the same ID?

For example, take the data set below (I hope this table comes out correctly!):

+----+------+------------+------------+
| ID | Year | Start Date |  End Date  |
+----+------+------------+------------+
|  1 | 2007 | 01/01/2007 | 10/10/2007 |
|  1 | 2007 | 01/01/2007 | 05/04/2007 |
|  1 | 2007 | 05/04/2007 | 08/10/2007 |
|  1 | 2007 | 15/10/2007 | 20/10/2007 |
|  1 | 2007 | 25/10/2007 | 01/01/2008 |
|  2 | 2007 | 01/01/2007 | 01/01/2008 |
|  2 | 2008 | 01/01/2008 | 15/07/2008 |
|  2 | 2008 | 10/06/2008 | 01/01/2009 |
+----+------+------------+------------+

If we assume nothing existed before 2007, then rows 1 and 6 are 'new' at that time.

Rows 2, 3, 7 and 8 are not 'new', as each either joins the end of a previous record or overlaps it to form a continuous date period (take rows 7 and 8: there is no 'break' between 01/01/2008 and 01/01/2009).

Rows 4 and 5 would be considered new records, as neither attaches directly to the end of a previous period for ID 1 nor overlaps any of the other periods.
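The "joins the end of or overlaps" rule above can be expressed as a single inclusive interval test: two periods a and b for the same ID are continuous when a.StartDate <= b.EndDate and b.StartDate <= a.EndDate, so a period that starts exactly on another's end date still counts. Below is a minimal T-SQL sketch that loads the sample rows and lists every continuous pair. The temp tables #Periods and #ContinuousPairs and the RowNo column (the row numbers used in the question) are hypothetical names; it uses DATETIME and INSERT ... SELECT ... UNION ALL because SQL Server 2005 has neither the DATE type nor multi-row VALUES.

```sql
-- Hypothetical temp tables; RowNo is the row number used in the question.
IF OBJECT_ID('tempdb..#Periods') IS NOT NULL DROP TABLE #Periods;
IF OBJECT_ID('tempdb..#ContinuousPairs') IS NOT NULL DROP TABLE #ContinuousPairs;

CREATE TABLE #Periods
(
    RowNo     INT      NOT NULL PRIMARY KEY,
    ID        INT      NOT NULL,
    [Year]    INT      NOT NULL,
    StartDate DATETIME NOT NULL,
    EndDate   DATETIME NOT NULL
);

-- 'yyyymmdd' literals convert to DATETIME regardless of language/DATEFORMAT settings.
INSERT INTO #Periods (RowNo, ID, [Year], StartDate, EndDate)
SELECT 1, 1, 2007, '20070101', '20071010' UNION ALL
SELECT 2, 1, 2007, '20070101', '20070405' UNION ALL
SELECT 3, 1, 2007, '20070405', '20071008' UNION ALL
SELECT 4, 1, 2007, '20071015', '20071020' UNION ALL
SELECT 5, 1, 2007, '20071025', '20080101' UNION ALL
SELECT 6, 2, 2007, '20070101', '20080101' UNION ALL
SELECT 7, 2, 2008, '20080101', '20080715' UNION ALL
SELECT 8, 2, 2008, '20080610', '20090101';

-- Every pair of periods for the same ID that overlap or touch end-to-start.
SELECT a.RowNo AS RowA, b.RowNo AS RowB
INTO #ContinuousPairs
FROM #Periods AS a
INNER JOIN #Periods AS b
    ON  b.ID = a.ID
    AND b.RowNo > a.RowNo          -- list each unordered pair once
    AND a.StartDate <= b.EndDate   -- <= rather than < so touching periods count
    AND b.StartDate <= a.EndDate;

SELECT RowA, RowB FROM #ContinuousPairs ORDER BY RowA, RowB;
```

On the sample data this should return the pairs (1, 2), (1, 3), (2, 3), (6, 7) and (7, 8); rows 4 and 5 appear in no pair, which is why they count as new.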

Currently, to get this data set I have to put all of my data into temporary tables and then join them together on various fields to remove the records I don't want.

First, I remove rows whose StartDate equals the EndDate of another row for the same ID (this gets rid of rows 3 and 7).

Then I remove rows whose StartDate falls between the StartDate and EndDate of another record for that ID (this removes rows 2 and 8).

That leaves me with rows 1, 4, 5 and 6 as the 'new' records, which is correct.
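Because BETWEEN is inclusive, step 2 on its own already catches the rows that step 1 removes (row 3 starts inside row 1, and row 7 starts exactly on row 6's end date), as long as each row is checked against the original rows rather than whatever is left after step 1. That suggests collapsing both steps into one NOT EXISTS test. A hedged sketch follows. It assumes a tie-break for rows with the same StartDate (rows 1 and 2 would otherwise disqualify each other): within each ID, rows are ordered by StartDate, then EndDate descending, then a hypothetical RowNo column, and a row is only treated as continuing a period that sorts before it. #Periods, RowNo and #NewPeriods are illustrative names, not from the question.

```sql
IF OBJECT_ID('tempdb..#Periods') IS NOT NULL DROP TABLE #Periods;
IF OBJECT_ID('tempdb..#NewPeriods') IS NOT NULL DROP TABLE #NewPeriods;

CREATE TABLE #Periods
(
    RowNo     INT      NOT NULL PRIMARY KEY,  -- hypothetical unique row id
    ID        INT      NOT NULL,
    [Year]    INT      NOT NULL,
    StartDate DATETIME NOT NULL,
    EndDate   DATETIME NOT NULL
);

INSERT INTO #Periods (RowNo, ID, [Year], StartDate, EndDate)
SELECT 1, 1, 2007, '20070101', '20071010' UNION ALL
SELECT 2, 1, 2007, '20070101', '20070405' UNION ALL
SELECT 3, 1, 2007, '20070405', '20071008' UNION ALL
SELECT 4, 1, 2007, '20071015', '20071020' UNION ALL
SELECT 5, 1, 2007, '20071025', '20080101' UNION ALL
SELECT 6, 2, 2007, '20070101', '20080101' UNION ALL
SELECT 7, 2, 2008, '20080101', '20080715' UNION ALL
SELECT 8, 2, 2008, '20080610', '20090101';

-- Number each ID's rows; the longer period wins a StartDate tie (assumed tie-break).
WITH Ordered AS
(
    SELECT RowNo, ID, [Year], StartDate, EndDate,
           ROW_NUMBER() OVER (PARTITION BY ID
                              ORDER BY StartDate, EndDate DESC, RowNo) AS rn
    FROM #Periods
)
SELECT r.RowNo, r.ID, r.[Year], r.StartDate, r.EndDate
INTO #NewPeriods
FROM Ordered AS r
WHERE NOT EXISTS
(
    -- An earlier-sorting period for the same ID that is still running
    -- (or ends exactly) on r's start date means r is not new.
    SELECT 1
    FROM Ordered AS p
    WHERE p.ID = r.ID
      AND p.rn < r.rn
      AND p.EndDate >= r.StartDate
);

SELECT RowNo, ID, [Year], StartDate, EndDate
FROM #NewPeriods
ORDER BY ID, StartDate;
```

On the sample data this returns rows 1, 4, 5 and 6. Since every predecessor already starts on or before r.StartDate, checking only p.EndDate >= r.StartDate is the same as the BETWEEN test in step 2, and a period that starts later can never cause an earlier one to be dropped.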

Is there a more efficient way to do this, such as some sort of loop, a CTE or (cough) a cursor?

As mentioned above, if anything is unclear, don't hesitate to ask and I will try to provide the information you request.


2 Answers


Try:

;with cte as
(
    Select *, row_number() over (partition by id order by startdate) rn from yourtable
)
select distinct t1.* 
from cte t1
     left join cte t2 
     on t1.ID = t2.ID
     and t1.EndDate>=t2.StartDate and t1.StartDate<=t2.EndDate
     and t1.rn<>t2.rn
where t2.ID is null
or t1.rn=1
answered 2012-11-26T14:41:21.047

If every row has a unique identifier, this should work:

select *
from tbl t3
left outer join
(
    -- rows that lie entirely inside a different period for the same id
    select distinct t1.id as id_inside, t1.recno as recno_inside
    from tbl t1
    inner join tbl t2
        on t1.id = t2.id
        and (t1.startdate <> t2.startdate or t1.enddate <> t2.enddate)
        and (t1.startdate >= t2.startdate and t1.enddate <= t2.enddate)
) t4
    on t3.id = t4.id_inside
    and t3.recno = t4.recno_inside
-- keep only the rows that matched nothing in t4
where id_inside is null
  and recno_inside is null

sqlfiddle

answered 2012-11-26T16:30:11.730