I am using MS SQL 2008R2. I am working with a partial data set (i.e. not all colunms) and have not been able to exclude duplicate rows. For example, here are the rows for a claim:
Claim Line Doctor Received Paid
02195111345 1 22924 1995-11-13 2001-10-02
02195111345 1 22924 1995-11-13 2001-10-02
02195111345 2 22924 1995-11-13 2001-10-02
02195111345 2 22924 1995-11-13 2001-10-02
02195111345 3 22924 1995-11-13 2001-10-02
02195111345 3 22924 1995-11-13 2001-10-02
02195111345 4 22924 1995-11-13 2001-10-02
02195111345 4 22924 1995-11-13 2001-10-02
02195111345 5 22924 1995-11-13 2001-10-02
02195111345 5 22924 1995-11-13 2001-10-02
I want to select only 1 row for each Claim+Line. There are additional columns, but they contain the same information for each combination of Claim+Line.
I can easily identify the rows with duplicates with a query such as:
SELECT [ClaimNum],[ClaimNum_Line]
FROM [dbo].[DamagedClaims]
GROUP BY [ClaimNum],[ClaimNum_Line]
HAVING COUNT(*) > 1
Typically, I would create a derived table using this query and join it back to the table - with another coulmn such as MAX(Version) or such, but there are no such columns with which to differentiate the duplicates.
I have considered
SELECT ClaimNum], Line, MAX(DOCTOR), MAX([Date Received]), MAX([Date Paid])
but there are about 20 columns and 750 million rows and that seemed like a lot of overhead (I am very sympathetic to our SQL Servers!). Is there a better solution?
Best, Scott