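I have a report that shows a list of duplicate accounts according to our business rules. This works when one new account matches other existing accounts. I run into trouble when multiple new accounts match the same existing duplicate. Here is an example grouped by NewID: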

NewID   MatchedID   FirstName   LastName    AddDate      Address        PhoneNumber
10      10          Holly       Johnson     4/18/2013    123 1St Rd.    123 456 7890
10      2           Hollie      Johnson     1/1/1990     123 1St Rd.    123 456 7890

11      11          Holley      Johnson     4/17/2013    123 1St Rd.    123-456-7890
11      2           Hollie      Johnson     1/1/1990     123 First Rd.  123 456 7890

50      50          William     Johnson     4/17/2013    999 2nd St.    222 222 2222
50      3           Bill        Jonson      1/2/1990     999 Second St. 222-222-2222

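The account being matched is itself included for comparison.

So, is there a way to group these similar accounts together without repeats? It should look like this: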

GroupID  AcctID   FirstName   LastName    AddDate      Address        PhoneNumber
1        2        Hollie      Johnson     1/1/1990     123 First Rd.  123 456 7890
1        10       Holly       Johnson     4/18/2013    123 1St Rd.    123 456 7890
1        11       Holley      Johnson     4/17/2013    123 1St Rd.    123-456-7890
2        50       William     Johnson     4/17/2013    999 2nd St.    222 222 2222
2        3        Bill        Jonson      1/2/1990     999 Second St. 222-222-2222

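I don't care whether the grouping is done in SQL or in SSRS. It needs to reference the two ID columns, because the name, address, and phone number can differ. I also need to assign a new GroupID so the accounts can be grouped in the report.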
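
To experiment with this, the sample rows can be loaded into a scratch table. This is only a sketch: the table name Accounts is borrowed from the queries in the answer, and every column type is an assumption rather than the real schema.

```sql
-- Hypothetical scratch table; the column types are guesses.
create table Accounts
(
  NewID       int          not null,
  MatchedID   int          not null,
  FirstName   varchar(50)  not null,
  LastName    varchar(50)  not null,
  AddDate     date         not null,
  Address     varchar(100) not null,
  PhoneNumber varchar(20)  not null
);

-- The six rows from the first listing, copied as shown there.
insert into Accounts
  (NewID, MatchedID, FirstName, LastName, AddDate, Address, PhoneNumber)
values
  (10, 10, 'Holly',   'Johnson', '20130418', '123 1St Rd.',    '123 456 7890'),
  (10,  2, 'Hollie',  'Johnson', '19900101', '123 1St Rd.',    '123 456 7890'),
  (11, 11, 'Holley',  'Johnson', '20130417', '123 1St Rd.',    '123-456-7890'),
  (11,  2, 'Hollie',  'Johnson', '19900101', '123 First Rd.',  '123 456 7890'),
  (50, 50, 'William', 'Johnson', '20130417', '999 2nd St.',    '222 222 2222'),
  (50,  3, 'Bill',    'Jonson',  '19900102', '999 Second St.', '222-222-2222');
```

The dates use the yyyymmdd form so they are read the same way regardless of the session's language or date format settings.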

1 Answer

You can use a ranking function to eliminate rows:

with NoDuplicates as
(
  select *
    , rownum = row_number() over (partition by MatchedID order by NewID)
  from Accounts
)
select   NewID
  , MatchedID
  , FirstName
  , LastName
  , AddDate
  , Address
  , phoneNumber
from NoDuplicates where rownum = 1
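
To see what the filter is doing, it can help to look at the numbering before the rownum = 1 condition is applied. This is just an inspection sketch against the same Accounts table; the comments assume it holds exactly the six sample rows from the question.

```sql
select NewID
  , MatchedID
  , rownum = row_number() over (partition by MatchedID order by NewID)
from Accounts
order by MatchedID, NewID;
-- With the six sample rows, MatchedID 2 appears twice (NewID 10 and 11),
-- so those rows are numbered 1 and 2; every other MatchedID appears once
-- and gets rownum 1. Filtering on rownum = 1 therefore drops only the
-- (NewID 11, MatchedID 2) row.
```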

Although there's no reason you couldn't just use GROUP BY, assuming the address information is always duplicated as well:

select NewID = min(NewID)
  , MatchedID
  , FirstName
  , LastName
  , AddDate
  , Address
  , phoneNumber
from Accounts
group by MatchedID
  , FirstName
  , LastName
  , AddDate
  , Address
  , phoneNumber

Both of these return your expected results.

Edit after comments:

You can group the related rows with a statement like this:

with NoDuplicates as
(
  select *
    , rownum = row_number() over (partition by MatchedID order by NewID)
  from Accounts
  where NewID <> MatchedID
)
select groupID = MatchedID
  , Acct = MatchedID
  , FirstName
  , LastName
  , AddDate
  , Address
  , phoneNumber
from NoDuplicates where rownum = 1
union all
select groupID = coalesce(am.MatchedID, a.NewID)
  , Acct = a.MatchedID
  , a.FirstName
  , a.LastName
  , a.AddDate
  , a.Address
  , a.phoneNumber
from Accounts a
  -- join to the corresponding matched account
  left join Accounts am on a.MatchedID = am.NewID and am.NewID <> am.MatchedID
where a.NewID = a.MatchedID
order by groupID, Acct

However, this is basically just grouping by MatchedID. If you want the groups numbered from 1, you can add a DENSE_RANK clause to the statement:

with NoDuplicates as
(
  select *
    , rownum = row_number() over (partition by MatchedID order by NewID)
  from Accounts
  where NewID <> MatchedID
)
, GroupedAcct as
(
  select GroupID = MatchedID
    , Acct = MatchedID
    , FirstName
    , LastName
    , AddDate
    , Address
    , phoneNumber
  from NoDuplicates where rownum = 1
  union all
  select GroupID = coalesce(am.MatchedID, a.NewID)
    , Acct = a.MatchedID
    , a.FirstName
    , a.LastName
    , a.AddDate
    , a.Address
    , a.phoneNumber
  from Accounts a
    -- join to the corresponding matched account
    left join Accounts am on a.MatchedID = am.NewID and am.NewID <> am.MatchedID
  where a.NewID = a.MatchedID
)
select GroupID = Dense_Rank() over (order by GroupID)
  , Acct
  , FirstName
  , LastName
  , AddDate
  , Address
  , phoneNumber
from GroupedAcct
order by groupID, Acct
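
As a small illustration of the renumbering step on its own: DENSE_RANK gives tied values the same rank and leaves no gaps, so arbitrary MatchedID values collapse to 1, 2, 3, and so on. The inline values below are made up to mirror the sample (three rows keyed on 2, two rows keyed on 3), and the GroupKey name is purely illustrative.

```sql
select GroupKey
  , GroupID = dense_rank() over (order by GroupKey)
from (values (2), (2), (2), (3), (3)) as t (GroupKey)
order by GroupKey;
-- GroupKey 2 -> GroupID 1 (three rows), GroupKey 3 -> GroupID 2 (two rows)
```

RANK would number the second group 4 instead of 2 because it skips positions after ties, which is why DENSE_RANK is the better fit for consecutive group numbers.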

answered 2013-04-19T16:39:41.163