I came up with a simple formula to calculate the similarity between two records:
Similarity = (No. of attributes matched between the two records / Total no. of attributes) * 100
For example:
ID | First Name | Last Name | DOB        | Parent Last Name
1  | John       | Doe       | 03/19/1989 | Jonathan
1  | John       | Doe       | 03/19/1998 | Jonathan
We will get a similarity of 3/4 * 100 = 75% for ID = 1, because three of the four attributes match (only DOB differs).
I want to implement this with a SQL query. I am aware that I can do it in a program, but I want to try it in SQL.
The following steps calculate this formula:
1. For all records belonging to a particular ID, compare each attribute and score 1 if it matches, otherwise 0.
2. Sum the match scores for that ID.
3. Divide the sum by the total number of attributes and multiply by 100 to get the similarity for that ID.
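One possible way to express these steps in a single query is sketched below. It is only a sketch under some assumptions: the table variable @Records and its column names are hypothetical stand-ins for the real schema, and an attribute is treated as "matched" when its smallest and largest value within an ID are equal, which is one way to compare all records of an ID without needing a unique row key.

```sql
-- Hypothetical table and column names; replace them with the real schema.
DECLARE @Records TABLE (
    ID             INT,
    FirstName      VARCHAR(50),
    LastName       VARCHAR(50),
    DOB            DATE,
    ParentLastName VARCHAR(50)
);

-- Sample data from the example above.
INSERT INTO @Records (ID, FirstName, LastName, DOB, ParentLastName)
VALUES (1, 'John', 'Doe', '19890319', 'Jonathan'),
       (1, 'John', 'Doe', '19980319', 'Jonathan');

SELECT ID,
       -- Step 1: score each attribute 1 if every record of the ID agrees, else 0.
       -- Step 2: add the scores together.
       ( CASE WHEN MIN(FirstName)      = MAX(FirstName)      THEN 1 ELSE 0 END
       + CASE WHEN MIN(LastName)       = MAX(LastName)       THEN 1 ELSE 0 END
       + CASE WHEN MIN(DOB)            = MAX(DOB)            THEN 1 ELSE 0 END
       + CASE WHEN MIN(ParentLastName) = MAX(ParentLastName) THEN 1 ELSE 0 END
       -- Step 3: divide by the attribute count (4 here); 100.0 avoids integer division.
       ) * 100.0 / 4 AS Similarity
FROM @Records
GROUP BY ID;
```

For the sample rows this should give a Similarity of 75 for ID = 1. Two caveats with this approach: MIN and MAX ignore NULLs, so an attribute that is NULL in one record and filled in the other would count as a match, and string comparison follows the column's collation, so case differences may or may not count as mismatches.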
Please let me know if you have any questions.
Note: I am using SQL Server 2008.