In this simplified example, I'm trying to find pairs of users who like exactly the same set of TV shows.

Suppose I have a table in which each user gets one entry for every TV show they like:

|USER | Show        |
|-----|-------------|
|001  | Lost        |
|001  | South Park  |
|002  | Lost        |
|003  | Lost        |
|003  | South Park  |
|004  | South Park  |
|005  | Lost        |
|006  | Lost        |
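
For anyone who wants to reproduce this, the sample rows above could be loaded roughly as follows. This is a minimal sketch in MySQL syntax: the table name `Shows` comes from the queries in this thread, while the column types are assumptions (a `CHAR(3)` user ID keeps the leading zeros).

```sql
-- Assumed schema: the types are guesses; only the column names come from the post.
-- `Show` is a reserved word in MySQL, so it is quoted with backticks.
CREATE TABLE Shows (
  User   CHAR(3)     NOT NULL,
  `Show` VARCHAR(50) NOT NULL
);

INSERT INTO Shows (User, `Show`) VALUES
  ('001', 'Lost'),
  ('001', 'South Park'),
  ('002', 'Lost'),
  ('003', 'Lost'),
  ('003', 'South Park'),
  ('004', 'South Park'),
  ('005', 'Lost'),
  ('006', 'Lost');
```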

I would then like a result like this:

|USER1 |USER2 |
|------|------|
|001   |003   |
|003   |001   |
|002   |005   |
|002   |006   |
|005   |002   |
|005   |006   |
|006   |002   |
|006   |005   |

Or, even better, a version that lists each pair only once:

|USER1 |USER2 |
|------|------|
|001   |003   |
|002   |005   |
|002   |006   |
|005   |006   |

This basically says: user 1 likes the same set of shows as user 3.

I've been playing around with GROUP BY and JOIN, but I still can't find the answer :(.

So far, I have found that using

SELECT s1.User AS USER1, s2.User AS USER2, s1.`Show` AS `Show`
FROM Shows s1 JOIN Shows s2
ON s1.`Show` = s2.`Show` AND s1.User != s2.User;

produces pairs of users together with each show they have in common. But I don't know where to go from here.

2 Answers

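If you can accept CSV rather than tabulated results, you could simply group the table twice:

SELECT GROUP_CONCAT(User) FROM (
  -- s is each user's sorted show list; 0x1e is the ASCII record separator
  SELECT   User, GROUP_CONCAT(DISTINCT `Show` ORDER BY `Show` SEPARATOR 0x1e) AS s
  FROM     Shows
  GROUP BY User
) t GROUP BY s

Otherwise, you can join the above subquery to itself: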

SELECT DISTINCT LEAST(t.User, u.User) AS User1,
             GREATEST(t.User, u.User) AS User2
FROM (
  SELECT   User, GROUP_CONCAT(DISTINCT `Show` ORDER BY `Show` SEPARATOR 0x1e) AS s
  FROM     Shows
  GROUP BY User
) t JOIN (
  SELECT   User, GROUP_CONCAT(DISTINCT `Show` ORDER BY `Show` SEPARATOR 0x1e) AS s
  FROM     Shows
  GROUP BY User
) u USING (s)
WHERE t.User <> u.User

See them on sqlfiddle.
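
Since each user appears exactly once in each grouped subquery, the self-join could arguably also be written with a strict `<` comparison, which keeps one row per unordered pair without needing `DISTINCT`, `LEAST()`, or `GREATEST()`. The following is a sketch of that variant rather than part of the fiddle, and it assumes the same `Shows(User, Show)` table as the question:

```sql
-- Variant sketch: t.User < u.User keeps each unordered pair once
-- and also excludes a user being paired with themselves.
SELECT t.User AS User1,
       u.User AS User2
FROM (
  SELECT   User, GROUP_CONCAT(DISTINCT `Show` ORDER BY `Show` SEPARATOR 0x1e) AS s
  FROM     Shows
  GROUP BY User
) t JOIN (
  SELECT   User, GROUP_CONCAT(DISTINCT `Show` ORDER BY `Show` SEPARATOR 0x1e) AS s
  FROM     Shows
  GROUP BY User
) u ON t.s = u.s AND t.User < u.User
ORDER BY User1, User2;
```

On the question's sample data this should return the four pairs from the "even better" table: (001, 003), (002, 005), (002, 006), and (005, 006).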

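Of course, if duplicate (User, Show) pairs are guaranteed not to exist in the Shows table, you could improve performance by removing the DISTINCT keyword from the GROUP_CONCAT() aggregation.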
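
One way to obtain that guarantee, sketched here under the assumption that you are free to alter the table, is a unique key over both columns. The index name `uq_user_show` is hypothetical.

```sql
-- uq_user_show is a hypothetical index name.
-- This statement fails if duplicate (User, Show) rows already exist.
ALTER TABLE Shows ADD UNIQUE KEY uq_user_show (User, `Show`);

-- With uniqueness enforced, DISTINCT inside GROUP_CONCAT() is redundant.
SELECT   User, GROUP_CONCAT(`Show` ORDER BY `Show` SEPARATOR 0x1e) AS s
FROM     Shows
GROUP BY User;
```

A related caveat: `GROUP_CONCAT()` output is truncated at `group_concat_max_len` (1024 by default), so two long show lists could compare equal after truncation. Raising the limit for the session, for example with `SET SESSION group_concat_max_len = 1000000;` (an arbitrary illustrative value), guards against that.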

answered 2012-10-10T16:25:22.947

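After thinking about this some more, I wondered what would happen if I grouped the groups:

select
    group_concat(
      User
      order by User
      separator ', '
      ) LikeViewers
  , Shows
from
(
-- inner query: one row per user with their sorted, quoted show list
-- (`Show` is a reserved word in MySQL, hence the backticks)
select
      User
    , group_concat(
        concat('"', `Show`, '"')
        order by `Show`
        separator ', '
        ) Shows
  from
    Viewings  -- presumably the same (User, Show) table the question calls Shows
  group by
    User
) ViewerGroups
group by
  Shows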

This produces output like this:

|LikeViewers  |Shows               |
|-------------|--------------------|
|002, 005, 006|"Lost"              |
|001, 003     |"Lost", "South Park"|
|004          |"South Park"        |

Admittedly, the result could be more reusable, but I thought it was an interesting idea.

Fiddle here.

answered 2012-10-11T08:40:39.883