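I have a table with the following data:

User#   App
1       A
1       B
2       A
2       B
3       A

I want to know the overlap between apps in terms of distinct users, so my final result would look like this:

App1  App2  DistinctUseroverlapped
A     A     3
A     B     2
B     B     2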

So the result means that 3 users use app A, 2 users use both app A and app B, and 2 users use app B.

Keeping in mind that there are many apps and many users, how can I do this in SQL?

2 Answers

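My solution first generates all possible pairs of apps of interest. This is the driving subquery (aliased pairs in the query).

It then joins in the original data for each app of the pair.

Finally, it uses count(distinct) to count the distinct users that match between the two lists.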

select pairs.app1, pairs.app2,
       COUNT(distinct case when tleft.user = tright.user then tleft.user end) as NumCommonUsers
from (select t1.app as app1, t2.app as app2
      from (select distinct app
            from t
           ) t1 cross join
           (select distinct app
            from t
           ) t2
      where t1.app <= t2.app
     ) pairs left outer join
     t tleft
     on tleft.app = pairs.app1 left outer join
     t tright
     on tright.app = pairs.app2
group by pairs.app1, pairs.app2
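
To try the query above against the sample data from the question, a minimal sketch using Python's built-in sqlite3 module is shown here. SQLite stands in for Vertica, which is an assumption made only so the example runs anywhere; the table name t and the columns user and app follow the query, and the added order by is there only to make the output order deterministic.

```python
import sqlite3

# Sample rows from the question: (user, app)
rows = [(1, "A"), (1, "B"), (2, "A"), (2, "B"), (3, "A")]

conn = sqlite3.connect(":memory:")  # SQLite used here as a stand-in engine (assumption)
conn.execute('CREATE TABLE t ("user" INTEGER, app TEXT)')
conn.executemany("INSERT INTO t VALUES (?, ?)", rows)

query = """
select pairs.app1, pairs.app2,
       count(distinct case when tleft.user = tright.user then tleft.user end) as NumCommonUsers
from (select t1.app as app1, t2.app as app2
      from (select distinct app from t) t1 cross join
           (select distinct app from t) t2
      where t1.app <= t2.app
     ) pairs left outer join
     t tleft on tleft.app = pairs.app1 left outer join
     t tright on tright.app = pairs.app2
group by pairs.app1, pairs.app2
order by pairs.app1, pairs.app2
"""

result = conn.execute(query).fetchall()
print(result)  # [('A', 'A', 3), ('A', 'B', 2), ('B', 'B', 2)]
```

Note that the two joins on app alone produce every combination of users of app1 and users of app2 before the case expression filters them, so on large tables this intermediate result can become big.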

You can also move the conditional comparison out of the count and into the join, and then use a plain count(distinct):

select pairs.app1, pairs.app2,
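       -- count tright.user, not tleft.user: the left join leaves it NULL
       -- for users of app1 who do not use app2, so they are not counted
       COUNT(distinct tright.user) as NumCommonUsers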
from (select t1.app as app1, t2.app as app2
      from (select distinct app
            from t
           ) t1 cross join
           (select distinct app
            from t
           ) t2
      where t1.app <= t2.app
     ) pairs left outer join
     t tleft
     on tleft.app = pairs.app1 left outer join
     t tright
     on tright.app = pairs.app2 and
        tright.user = tleft.user
group by pairs.app1, pairs.app2

I prefer the first approach because it states more explicitly what is being counted.

This is standard SQL, so it should work on Vertica.

answered 2013-04-18T21:41:31.327

This works on Vertica 6:

with tab as
    ( select 1 as user, 'A' as App
      union select 1 as user, 'B' as App
      union select 2 as user, 'A' as App
      union select 2 as user, 'B' as App
      union select 3 as user, 'A' as App
    )
    , apps as
    ( select distinct App from tab )
-- for each pair with APP1 <= APP2, this counts the distinct users of APP2
select apps.app as APP1, tab.app as APP2, count(distinct tab.user)
from tab, apps
where tab.app >= apps.app
group by 1, 2
order by 1
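
One thing worth checking with the query above: it never compares users across the two apps, so its count equals the overlap only when every user of APP2 also uses APP1, which happens to hold for the sample data. The sketch below, run with Python's sqlite3 module as a stand-in engine (an assumption), adds a hypothetical user 4 who uses only B and compares that query shape with a self-join on user. The self-join is an alternative technique, not something taken from the answers, and group by uses explicit column names instead of ordinals.

```python
import sqlite3

rows = [(1, "A"), (1, "B"), (2, "A"), (2, "B"), (3, "A"),
        (4, "B")]  # hypothetical extra user who uses only B (not in the question)

conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE tab ("user" INTEGER, app TEXT)')
conn.executemany("INSERT INTO tab VALUES (?, ?)", rows)

# Same shape as the query above: counts the users of app2 for each pair
cross_query = """
select apps.app as app1, tab.app as app2, count(distinct tab.user) as n
from tab, (select distinct app from tab) apps
where tab.app >= apps.app
group by apps.app, tab.app
order by apps.app, tab.app
"""

# Self-join on user: a row pair exists only if the same user has both apps
self_join_query = """
select l.app as app1, r.app as app2, count(distinct l.user) as n
from tab l join tab r
     on r.user = l.user and l.app <= r.app
group by l.app, r.app
order by l.app, r.app
"""

cross_result = conn.execute(cross_query).fetchall()
self_join_result = conn.execute(self_join_query).fetchall()
print(cross_result)      # [('A', 'A', 3), ('A', 'B', 3), ('B', 'B', 3)]
print(self_join_result)  # [('A', 'A', 3), ('A', 'B', 2), ('B', 'B', 3)]
```

With user 4 present, only the self-join reports 2 for the pair (A, B). Unlike the left-join approach in the other answer, the self-join returns no row for app pairs that share no users.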
answered 2013-08-14T19:53:33.413