mysql - Query to check commonality (intersection) between every possible pair combination

Question

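I wrote a program that generates tests made up of combinations of questions drawn from a large pool of questions. Each test has to meet a number of criteria, and the program saves a test to the database only if it meets them.

The program is designed to keep the distribution of questions as even as possible: when generating a combination of questions, the algorithm gives priority to the questions in the pool that were asked least often in previous iterations.

I created two tables: tests, which basically stores the test_id of each test, and test_questions, which uses n rows per test to store each test_id together with its corresponding question_ids (where n is the number of questions in each test).

Now that the tests are stored in the database, I want to check that the overlap of questions between different pairs of tests stays within a certain range, and I figured I should be able to do this with SQL.

Using a self-join, I was able to count the questions common to test 3 and test 5 with this query: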

-- Get the number of questions that are common to tests 3 and 5
SELECT count(tq1.question_id) AS Overlap
FROM test_questions AS tq1
JOIN test_questions AS tq2
ON tq1.question_id = tq2.question_id
WHERE tq1.test_id = 5
AND tq2.test_id = 3;

I was also able to generate every possible pair of tests from the first n (5) tests:

-- Get all combinations of pairs of tests from 1 to 5
SELECT t1.test_id AS Test1, t2.test_id AS Test2
FROM tests AS t1
JOIN tests AS t2
ON t2.test_id > t1.test_id
WHERE t1.test_id <= 5
AND t2.test_id <= 5;

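What I want to do, but have so far failed to do, is combine the two queries above to show every possible pair from the first 5 tests, along with the number of questions the two tests have in common.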

-- This doesn't work
SELECT t1.test_id AS Test1, t2.test_id AS Test2, count(tq1.question_id) AS Overlap
FROM tests AS t1
JOIN tests AS t2
ON t2.test_id > t1.test_id
JOIN test_questions AS tq1
ON t1.test_id = tq1.test_id
JOIN test_questions AS tq2
ON t2.test_id = tq2.test_id
WHERE t1.test_id <= 11
AND t2.test_id <= 11
GROUP BY t1.test_id, t2.test_id;

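I created simplified versions of the two tables (with random data) in this SQL Fiddle.

Note: I'm using MySQL as my DBMS, but the SQL should be compatible with the ANSI standard.

Edit: The program I wrote to generate the tests actually produces more tests than I need, and I only want to compare the first n tests. In the examples, I added the <= 5 WHERE conditions to ignore the extra tests.

To clarify what I'm looking for, using Thorsten Kettner's sample data: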

test 1: a, b and c
test 2: a, b and d
test 3: d, e and f

The result would be:

Test    Test    Overlap
Test1   Test2   2       (a and b in common)
Test1   Test3   0       (no questions in common)
Test2   Test3   1       (d is common to both)
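
For anyone who wants to try the queries against this small example, here is a rough sketch of how the sample data could be loaded. The column types, the primary keys, and the use of single letters as question_id values are assumptions made for illustration only; the real tables in the fiddle may be defined differently.

```sql
-- Hypothetical schema: types and keys are assumed, not taken from the question.
CREATE TABLE tests (
    test_id INT PRIMARY KEY
);

CREATE TABLE test_questions (
    test_id     INT     NOT NULL,
    question_id CHAR(1) NOT NULL,
    -- Each question appears at most once per test.
    PRIMARY KEY (test_id, question_id)
);

INSERT INTO tests (test_id) VALUES (1), (2), (3);

-- test 1: a, b, c / test 2: a, b, d / test 3: d, e, f
INSERT INTO test_questions (test_id, question_id) VALUES
    (1, 'a'), (1, 'b'), (1, 'c'),
    (2, 'a'), (2, 'b'), (2, 'd'),
    (3, 'd'), (3, 'e'), (3, 'f');
```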

score 4 · Accepted Answer

You basically just need to add a GROUP BY to your first query. I also added another condition so that the test IDs in each pair come out in order:

SELECT tq1.test_id as test_id1, tq2.test_id as test_id2, count(tq1.question_id) AS Overlap
FROM test_questions tq1 JOIN
     test_questions tq2
     ON tq1.question_id = tq2.question_id and
        tq1.test_id < tq2.test_id
GROUP BY tq1.test_id, tq2.test_id;

This is standard SQL.

If you want to get every pair of tests, even those with no questions in common, here is another approach:

SELECT t1.test_id as test_id1, t2.test_id as test_id2, count(tq2.question_id) AS Overlap
FROM tests t1 CROSS JOIN
     tests t2 LEFT JOIN
     test_questions tq1
     on t1.test_id = tq1.test_id LEFT JOIN
     test_questions tq2
     ON t2.test_id = tq2.test_id and tq1.question_id = tq2.question_id 
GROUP BY t1.test_id, t2.test_id;

This assumes you have a table with one row per test. If not, replace tests with (select distinct test_id from test_questions).
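
As a sketch of that substitution, the query below derives the list of tests from test_questions itself. It also restricts the pairs to test_id1 < test_id2, which is an addition not present in the query above, so each pair appears only once and a test is never paired with itself. It assumes each (test_id, question_id) combination appears at most once in test_questions.

```sql
SELECT t1.test_id AS test_id1,
       t2.test_id AS test_id2,
       COUNT(tq2.question_id) AS Overlap   -- counts only matched questions, so 0 when none match
FROM (SELECT DISTINCT test_id FROM test_questions) t1
JOIN (SELECT DISTINCT test_id FROM test_questions) t2
  ON t1.test_id < t2.test_id               -- added: keep each unordered pair once
LEFT JOIN test_questions tq1
  ON tq1.test_id = t1.test_id              -- all questions of the first test
LEFT JOIN test_questions tq2
  ON tq2.test_id = t2.test_id              -- same question in the second test, if any
 AND tq2.question_id = tq1.question_id
GROUP BY t1.test_id, t2.test_id;
```

With the three-test sample from the question (a, b, c / a, b, d / d, e, f), this should give an overlap of 2 for tests 1 and 2, 0 for tests 1 and 3, and 1 for tests 2 and 3.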

score 2 · Accepted Answer

I modified Gordon's answer; this query gives a list of test pairs along with their corresponding overlap (the number of questions in common):

SELECT tq1.test_id as test_id1, tq2.test_id as test_id2, count(tq1.question_id) AS Overlap
FROM test_questions tq1
JOIN test_questions tq2
ON tq1.question_id = tq2.question_id
AND tq1.test_id < tq2.test_id 
WHERE tq1.test_id <= 5
AND tq2.test_id <= 5
GROUP BY tq1.test_id, tq2.test_id;
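
Since the goal in the question is to check that the overlap stays within a certain range, one possible extension is a HAVING clause that lists only the pairs above a limit. This is just a sketch: the limit of 2 is a made-up value, not something from the question. Because this query uses an inner join, pairs with no questions in common never appear in it, so it can only be used to check an upper bound.

```sql
SELECT tq1.test_id AS test_id1,
       tq2.test_id AS test_id2,
       COUNT(*) AS Overlap
FROM test_questions tq1
JOIN test_questions tq2
  ON tq1.question_id = tq2.question_id
 AND tq1.test_id < tq2.test_id
WHERE tq1.test_id <= 5
  AND tq2.test_id <= 5
GROUP BY tq1.test_id, tq2.test_id
HAVING COUNT(*) > 2          -- hypothetical upper limit on the allowed overlap
ORDER BY Overlap DESC;
```

If no rows come back, every pair among the first 5 tests is at or below the limit.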

score 1 · Accepted Answer

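Step 1: Find all test combinations, e.g. 1-2, 1-3, 2-3.
Step 2: Join all the questions of the first test.
Step 3: Outer join the same question from the second test, if it exists.
Final step: Count the matching questions found for each test combination.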

    select test_combinations.t1_test_id, test_combinations.t2_test_id, count(q2.question_id)
    from
    (
        select t1.test_id as t1_test_id, t2.test_id as t2_test_id
        from (select test_id from tests where test_id <= 5) t1
        inner join (select test_id from tests where test_id <= 5) t2 on t2.test_id > t1.test_id
    ) test_combinations
    inner join test_questions q1 on q1.test_id = test_combinations.t1_test_id
    left join test_questions q2 on q2.test_id = test_combinations.t2_test_id and q2.question_id = q1.question_id
    group by test_combinations.t1_test_id, test_combinations.t2_test_id
    order by test_combinations.t1_test_id, test_combinations.t2_test_id;

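I added a test with no overlapping questions to your fiddle and removed the test_id <= 5 restriction, so you can see pairs of tests with zero overlapping questions: http://sqlfiddle.com/#!2/e83aa/1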
