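My solution first generates all possible pairs of apps of interest. This is the driver
subquery.
It then joins in the raw data for each app in the pair.
Finally, it uses count(distinct)
to count the distinct users that match between the two lists.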
select pairs.app1, pairs.app2,
       COUNT(distinct case when tleft.user = tright.user then tleft.user end) as NumCommonUsers
from (select t1.app as app1, t2.app as app2
      from (select distinct app
            from t
           ) t1 cross join
           (select distinct app
            from t
           ) t2
      where t1.app <= t2.app
     ) pairs left outer join
     t tleft
     on tleft.app = pairs.app1 left outer join
     t tright
     on tright.app = pairs.app2
group by pairs.app1, pairs.app2
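To try the query without a Vertica instance, here is a minimal sketch that runs it against SQLite through Python's standard `sqlite3` module. The table `t` and its `app`/`user` columns come from the query; the sample rows and the helper name `common_user_counts` are hypothetical, and the `order by` is added only to make the output deterministic.

```python
import sqlite3

# The first query from above, plus an ORDER BY (added only for stable output).
QUERY = """
select pairs.app1, pairs.app2,
       count(distinct case when tleft.user = tright.user then tleft.user end) as NumCommonUsers
from (select t1.app as app1, t2.app as app2
      from (select distinct app from t) t1 cross join
           (select distinct app from t) t2
      where t1.app <= t2.app
     ) pairs left outer join
     t tleft
     on tleft.app = pairs.app1 left outer join
     t tright
     on tright.app = pairs.app2
group by pairs.app1, pairs.app2
order by pairs.app1, pairs.app2
"""

def common_user_counts(rows):
    """Load hypothetical (app, user) rows into an in-memory table t and run QUERY."""
    conn = sqlite3.connect(":memory:")
    conn.execute("create table t (app text, user text)")
    conn.executemany("insert into t (app, user) values (?, ?)", rows)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result

# Hypothetical sample data: (app, user) pairs.
SAMPLE = [
    ("a", "u1"), ("a", "u2"), ("a", "u3"),
    ("b", "u2"), ("b", "u3"),
    ("c", "u4"),
]

if __name__ == "__main__":
    for app1, app2, n in common_user_counts(SAMPLE):
        print(app1, app2, n)
```

With this sample, the pair (a, b) shares u2 and u3, so it gets 2, while pairs with no overlap, such as (a, c), still appear with a count of 0 because the driver subquery produces every pair.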
You can also move the comparison out of the conditional expression inside count
and into the join, and then use a plain count(distinct)
:
select pairs.app1, pairs.app2,
       COUNT(distinct tright.user) as NumCommonUsers  -- tright.user is NULL unless the user is also in app2
from (select t1.app as app1, t2.app as app2
      from (select distinct app
            from t
           ) t1 cross join
           (select distinct app
            from t
           ) t2
      where t1.app <= t2.app
     ) pairs left outer join
     t tleft
     on tleft.app = pairs.app1 left outer join
     t tright
     on tright.app = pairs.app2 and
        tright.user = tleft.user
group by pairs.app1, pairs.app2
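A minimal sketch, assuming SQLite via Python's `sqlite3` as a stand-in for Vertica, that runs this join-condition variant and cross-checks it against a plain set-intersection count. It counts `tright.user`: rows from `tleft` survive the left outer join even when app2 has no matching user, so counting `tleft.user` would return all users of app1 rather than the common ones. The sample rows and the names `run_query` and `expected_counts` are hypothetical.

```python
import sqlite3
from itertools import combinations_with_replacement

# Join-condition variant; ORDER BY added only for stable output.
QUERY = """
select pairs.app1, pairs.app2,
       count(distinct tright.user) as NumCommonUsers
from (select t1.app as app1, t2.app as app2
      from (select distinct app from t) t1 cross join
           (select distinct app from t) t2
      where t1.app <= t2.app
     ) pairs left outer join
     t tleft
     on tleft.app = pairs.app1 left outer join
     t tright
     on tright.app = pairs.app2 and
        tright.user = tleft.user
group by pairs.app1, pairs.app2
order by pairs.app1, pairs.app2
"""

def run_query(rows):
    """Load hypothetical (app, user) rows into an in-memory table t and run QUERY."""
    conn = sqlite3.connect(":memory:")
    conn.execute("create table t (app text, user text)")
    conn.executemany("insert into t (app, user) values (?, ?)", rows)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result

def expected_counts(rows):
    """Reference answer: size of the user-set intersection for each pair app1 <= app2."""
    users = {}
    for app, user in rows:
        users.setdefault(app, set()).add(user)
    return [(a1, a2, len(users[a1] & users[a2]))
            for a1, a2 in combinations_with_replacement(sorted(users), 2)]

# Hypothetical sample data, including a duplicate row to exercise count(distinct).
SAMPLE = [
    ("a", "u1"), ("a", "u2"), ("a", "u3"), ("a", "u3"),
    ("b", "u2"), ("b", "u3"),
    ("c", "u4"),
]

if __name__ == "__main__":
    print(run_query(SAMPLE) == expected_counts(SAMPLE))
```

Because the user match is now part of the join, the `tright` side only carries rows for users present in both apps, which is why a plain count(distinct) over it is enough.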
I prefer the first approach, because it is more explicit about what is being counted.
This is standard SQL, so it should work in Vertica.