I am creating a tool to import data from several vendors. Unfortunately the data is not generated by me, so I have to work with it as it is. I have come across the following situation.
I have a table like the following:
ID |StartDate |Availability
========================================
H1 |20130728 |YYYYYYNNNNQQQQQ
H2 |20130728 |NNNNYYYYYYY
A3 |20130728 |NNQQQQNNNNNNNNYYYYYY
A2 |20130728 |NNNNNYYYYYYNNNNNN
What this data means: each letter in the Availability column is the availability flag for one day, starting from the date in the StartDate column and advancing one day per letter.
- Y : Available
- N : Not Available
- Q : On Request
For instance, for ID H1, 20130728 - 20130802 is available, 20130803 - 20130806 is not available, and 20130807 - 20130811 is available on request.
What I need to do is transform this table into the following layout:
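To make the encoding concrete, here is a minimal sketch that pairs each letter with the date it applies to by adding its zero-based position to StartDate. The question doesn't name a language, so Python is an assumption, dates are assumed to be YYYYMMDD strings, and `flag_dates` is a hypothetical helper:

```python
from datetime import datetime, timedelta


def flag_dates(start: str, availability: str):
    """Pair each availability flag with the date it applies to.

    Character i of `availability` covers StartDate + i days.
    """
    first = datetime.strptime(start, "%Y%m%d").date()
    return [
        ((first + timedelta(days=i)).strftime("%Y%m%d"), flag)
        for i, flag in enumerate(availability)
    ]


# H1: index 6 is the first 'N' after six 'Y' days
print(flag_dates("20130728", "YYYYYYNNNNQQQQQ")[6])  # ('20130803', 'N')
```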
ID |Available |StartDate |EndDate
========================================
H1 |Y |20130728 |20130802
H1 |N |20130803 |20130806
H1 |Q |20130807 |20130811
H2 |N |20130728 |20130731
H2 |Y |20130801 |20130807
A3 |N |20130728 |20130729
A3 |Q |20130730 |20130802
A3 |N |20130803 |20130810
A3 |Y |20130811 |20130816
A2 |N |20130728 |20130801
A2 |Y |20130802 |20130807
A2 |N |20130808 |20130813
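The target layout is effectively a run-length encoding of each Availability string: every run of identical letters becomes one output row. A minimal Python sketch of that idea, assuming YYYYMMDD date strings (`to_ranges` is a hypothetical name, not an existing API); it walks each string once with `itertools.groupby` and never builds a per-day row set:

```python
from datetime import datetime, timedelta
from itertools import groupby


def to_ranges(row_id: str, start: str, availability: str):
    """Collapse an availability string into (ID, flag, StartDate, EndDate) rows.

    A run of n identical flags starting at offset k covers the inclusive
    range StartDate + k .. StartDate + k + n - 1.
    """
    first = datetime.strptime(start, "%Y%m%d").date()
    rows = []
    offset = 0
    for flag, run in groupby(availability):  # one group per run of equal letters
        length = sum(1 for _ in run)
        run_start = first + timedelta(days=offset)
        run_end = run_start + timedelta(days=length - 1)
        rows.append((row_id, flag,
                     run_start.strftime("%Y%m%d"), run_end.strftime("%Y%m%d")))
        offset += length
    return rows


for row in to_ranges("A2", "20130728", "NNNNNYYYYYYNNNNNN"):
    print(row)
# ('A2', 'N', '20130728', '20130801')
# ('A2', 'Y', '20130802', '20130807')
# ('A2', 'N', '20130808', '20130813')
```

Because each run is emitted as soon as it ends, the output holds one entry per run rather than one per day.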
The initial table has approximately 40,000 rows. An Availability string can cover many days (I've seen up to 800 characters).
What I have tried is to turn the Availability string into rows (one per day), then group consecutive days with the same flag together and take the min and max date for each group. For this I have used three or four CTEs.
This works fine for a few IDs, but when I apply it to the whole table it takes ages. (I stopped the initial test run after a full night's sleep and it still hadn't finished; and yes, I mean I was sleeping while it was running!)
I have estimated that if I turn each character into its own row, I end up with something like 14.5 million rows.
So I am asking: is there a more efficient way of doing this? (I know there is, but I need you to tell me.)
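The explode-and-group approach described above is commonly implemented as the "gaps and islands" technique. The actual CTEs aren't shown, so the following Python sketch is only an assumed reconstruction of that idea (`explode_and_group` is a hypothetical name): it materializes one entry per character, which is the step that grows to roughly 14.5 million rows across the whole table, then uses day minus ROW_NUMBER() within each flag as the group key:

```python
from datetime import datetime, timedelta


def explode_and_group(row_id: str, start: str, availability: str):
    """Explode-then-group ("gaps and islands"), emulated in plain Python.

    1. Explode: one (day, flag) entry per character.
    2. Island key: day - ROW_NUMBER() within each flag is constant across
       a run of consecutive days that share that flag.
    3. Aggregate: MIN/MAX day per (flag, island key).
    """
    first = datetime.strptime(start, "%Y%m%d").date()

    # Step 1: one entry per day; this is what grows to millions of rows.
    days = list(enumerate(availability))

    # Steps 2 and 3: emulate ROW_NUMBER() OVER (PARTITION BY flag ORDER BY day)
    # and keep the first/last day of each island.
    row_numbers = {}
    islands = {}
    for day, flag in days:
        rn = row_numbers.get(flag, 0)
        row_numbers[flag] = rn + 1
        key = (flag, day - rn)
        first_day, _ = islands.get(key, (day, day))
        islands[key] = (first_day, day)  # days arrive in order, so day is the max

    # Order islands by their first day and format as (ID, flag, StartDate, EndDate).
    rows = []
    for (flag, _), (lo, hi) in sorted(islands.items(), key=lambda kv: kv[1][0]):
        rows.append((row_id, flag,
                     (first + timedelta(days=lo)).strftime("%Y%m%d"),
                     (first + timedelta(days=hi)).strftime("%Y%m%d")))
    return rows


for row in explode_and_group("H2", "20130728", "NNNNYYYYYYY"):
    print(row)
# ('H2', 'N', '20130728', '20130731')
# ('H2', 'Y', '20130801', '20130807')
```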
Thanks in advance.