I came up with a formula to calculate the similarity between two records, and it is pretty simple:

Similarity = (number of attributes matched between two records / total number of attributes) * 100

For example:

ID |First Name| Last Name| DOB       | Parent Last Name
1  |John      |Doe       | 03/19/1989| Jonathan
1  |John      |Doe       | 03/19/1998| Jonathan

We will get a similarity of 3/4 * 100 = 75% for ID = 1, because three of the four attributes match and only DOB differs.

I want to implement this with the help of a SQL query. I am aware that I can do it with a program but I want to try it with a SQL query.

The following steps will calculate this formula.

For all records belonging to a particular ID, compare each attribute and score 1 if the values match, otherwise 0.

Get the sum of all the matches for a particular ID.

Calculate the similarity for a given ID.

Please let me know if you have any questions.

Note: I am using SQL Server 2008.

1 Answer

I see, you are trying to compute a "within-ID" similarity across records. Here is one way:

select id,
       ((case when min(FirstName) = max(FirstName) then 1.0 else 0 end) +
        (case when min(LastName) = max(LastName) then 1.0 else 0 end) +
        (case when min(DOB) = max(DOB) then 1.0 else 0 end) +
        (case when min(ParentLastName) = max(ParentLastName) then 1.0 else 0 end)
       ) / 4.0 as similarity
from t
group by id;

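This ignores NULL values, because MIN and MAX skip NULLs: an attribute counts as a match whenever all of its non-NULL values for that ID are equal.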
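To check the query against the sample data from the question, here is a minimal sketch that runs it through Python's built-in sqlite3 module. SQLite is only a stand-in for SQL Server 2008 here, and the table definition, the `similarities` helper, and the ID 2 rows are hypothetical additions for illustration. Note that the query returns a fraction (0.75), so multiply by 100 to get the percentage in the question's formula.

```python
import sqlite3

# The query from the answer, unchanged. The table name `t` and the column
# names are the ones the answer assumes.
QUERY = """
select id,
       ((case when min(FirstName) = max(FirstName) then 1.0 else 0 end) +
        (case when min(LastName) = max(LastName) then 1.0 else 0 end) +
        (case when min(DOB) = max(DOB) then 1.0 else 0 end) +
        (case when min(ParentLastName) = max(ParentLastName) then 1.0 else 0 end)
       ) / 4.0 as similarity
from t
group by id
"""


def similarities(rows):
    """Load rows into an in-memory table and return {id: similarity}."""
    conn = sqlite3.connect(":memory:")
    # Assumed schema: every attribute stored as text.
    conn.execute(
        "create table t (id integer, FirstName text, LastName text, "
        "DOB text, ParentLastName text)"
    )
    conn.executemany("insert into t values (?, ?, ?, ?, ?)", rows)
    result = dict(conn.execute(QUERY).fetchall())
    conn.close()
    return result


rows = [
    # Sample records from the question: only DOB differs.
    (1, "John", "Doe", "03/19/1989", "Jonathan"),
    (1, "John", "Doe", "03/19/1998", "Jonathan"),
    # Hypothetical ID 2: ParentLastName is NULL in one record, so that
    # record is skipped by min/max and the attribute still counts as a match.
    (2, "Jane", "Roe", "01/01/1990", None),
    (2, "Jane", "Roe", "01/01/1990", "Roe"),
]

result = similarities(rows)
print(result)
```

ID 1 comes out as 0.75, matching the 75% worked out in the question, while ID 2 comes out as 1.0 even though one of its values is NULL. If a NULL should count as a mismatch instead, the query would need an extra condition, for example comparing count(column) with count(*).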

answered 2013-07-01T14:53:20.847