I hope this question provides all of the necessary information, but please do ask for more if anything is unclear. This is my first question on Stack Overflow, so please bear with me.
I am running this query on SQL Server 2005.
I have a large derived dataset (I'll provide a small subset later) which has four fields: ID, Year, StartDate and EndDate.
Within this data set the ID may (correctly) appear multiple times with different date combinations.
My question is: what ways are there to identify whether a record is 'new', i.e. its start date does not fall between the start and end date of any other record for the same ID?
As an example, take the data set below (I hope this table comes out correctly!). Dates are in dd/mm/yyyy format:
+----+------+------------+------------+
| ID | Year | Start Date | End Date   |
+----+------+------------+------------+
|  1 | 2007 | 01/01/2007 | 10/10/2007 |
|  1 | 2007 | 01/01/2007 | 05/04/2007 |
|  1 | 2007 | 05/04/2007 | 08/10/2007 |
|  1 | 2007 | 15/10/2007 | 20/10/2007 |
|  1 | 2007 | 25/10/2007 | 01/01/2008 |
|  2 | 2007 | 01/01/2007 | 01/01/2008 |
|  2 | 2008 | 01/01/2008 | 15/07/2008 |
|  2 | 2008 | 10/06/2008 | 01/01/2009 |
+----+------+------------+------------+
If we say nothing existed before 2007, then rows 1 and 6 are 'new' at that time.
Rows 2, 3, 7 and 8 are not 'new', as they either join the end of a previous record or overlap it to form a continuous date period (take rows 6 and 7: there are no 'breaks' between 01/01/2007 and 15/07/2008).
Rows 4 and 5 would each be considered a new record, as neither attaches directly to the end of the previous period for ID 1 nor overlaps any of the other periods.
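To make the rule concrete, here is a minimal sketch of it in Python rather than SQL, run against the sample rows. The row numbers and the name `is_new` are only for illustration. It also rests on an assumption I have not stated above: rows 1 and 2 share a start date, so I treat the longer period (row 1) as the 'new' one and the shorter one as covered by it.

```python
from datetime import date

# Sample rows from the table: (row number, ID, start date, end date).
# The row number is added here only so rows can be referred to by position.
ROWS = [
    (1, 1, date(2007, 1, 1), date(2007, 10, 10)),
    (2, 1, date(2007, 1, 1), date(2007, 4, 5)),
    (3, 1, date(2007, 4, 5), date(2007, 10, 8)),
    (4, 1, date(2007, 10, 15), date(2007, 10, 20)),
    (5, 1, date(2007, 10, 25), date(2008, 1, 1)),
    (6, 2, date(2007, 1, 1), date(2008, 1, 1)),
    (7, 2, date(2008, 1, 1), date(2008, 7, 15)),
    (8, 2, date(2008, 6, 10), date(2009, 1, 1)),
]


def is_new(row, rows):
    """Return True if no other row for the same ID covers this row's start date."""
    _, row_id, start, end = row
    for other in rows:
        _, other_id, other_start, other_end = other
        if other is row or other_id != row_id:
            continue
        # The start date touches or falls inside the other period (inclusive).
        covers_start = other_start <= start <= other_end
        # Assumed tie-break: when two rows share a start date, only the one
        # with the later end date counts as covering the other.
        wins_tie = other_start < start or other_end > end
        if covers_start and wins_tie:
            return False
    return True


new_rows = [row[0] for row in ROWS if is_new(row, ROWS)]
print(new_rows)  # [1, 4, 5, 6]
```

Two rows with identical start and end dates would both come out as 'new' under this sketch, which may or may not be what is wanted.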
Currently to get this data set I have to put all of my data into temporary tables and then join them together on various fields to remove the records I don't want.
Firstly, I remove rows where the start date equals the end date of another row for that ID (this gets rid of rows 3 and 7).
Then I remove rows where the start date is between the start date and end date of other records for that ID (this removes rows 2 and 8).
That leaves me with rows 1, 4, 5 and 6 as the 'new' records, which is correct.
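Roughly, the two removal steps look like the sketch below. It is not my actual code: it uses Python's built-in `sqlite3` module as a stand-in for SQL Server 2005 so that it can be run anywhere, and the table and column names (`periods`, `candidates`, `row_no`) are made up for the example. Dates are stored as ISO strings so that they compare correctly as text. The extra condition in step 2 is an assumed tie-break so that row 1 is not removed by row 2, which starts on the same day.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE periods (row_no INTEGER, id INTEGER, start_date TEXT, end_date TEXT);
INSERT INTO periods VALUES
  (1, 1, '2007-01-01', '2007-10-10'),
  (2, 1, '2007-01-01', '2007-04-05'),
  (3, 1, '2007-04-05', '2007-10-08'),
  (4, 1, '2007-10-15', '2007-10-20'),
  (5, 1, '2007-10-25', '2008-01-01'),
  (6, 2, '2007-01-01', '2008-01-01'),
  (7, 2, '2008-01-01', '2008-07-15'),
  (8, 2, '2008-06-10', '2009-01-01');

-- Working copy that the unwanted rows are removed from.
CREATE TEMP TABLE candidates AS SELECT * FROM periods;

-- Step 1: remove rows whose start date equals the end date of
-- another row for the same ID (rows 3 and 7).
DELETE FROM candidates
WHERE EXISTS (
  SELECT 1 FROM periods p
  WHERE p.id = candidates.id
    AND p.row_no <> candidates.row_no
    AND p.end_date = candidates.start_date
);

-- Step 2: remove rows whose start date falls inside another row's
-- period for the same ID (rows 2 and 8). The comparison is against
-- the full original data, not just the rows that survived step 1.
DELETE FROM candidates
WHERE EXISTS (
  SELECT 1 FROM periods p
  WHERE p.id = candidates.id
    AND p.row_no <> candidates.row_no
    AND candidates.start_date BETWEEN p.start_date AND p.end_date
    -- Assumed tie-break for rows sharing a start date: the longer one wins.
    AND (p.start_date < candidates.start_date
         OR p.end_date > candidates.end_date)
);
""")

remaining = [r[0] for r in conn.execute(
    "SELECT row_no FROM candidates ORDER BY row_no")]
print(remaining)  # [1, 4, 5, 6]
```

On the real data this means several passes over a large table, which is why I am asking about alternatives.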
Is there a more efficient way to do this, such as some sort of loop, a CTE or (cough) a cursor?
As I said above, if anything is unclear, don't hesitate to ask and I will try to provide the information you request.