I have a relatively complex query; here is the fiddle: http://sqlfiddle.com/#!2/65c66/12/0

SELECT p.title AS title_1,
       p2.title AS title_2,
       COUNT(DISTINCT s.signature_id) AS num_signers,
       group_concat(DISTINCT s.signature_id separator ' ') AS signers
FROM wtp_data_petitions p
JOIN wtp_data_petitions p2 ON (p.serial > p2.serial)
JOIN wtp_data_signatures s
GROUP BY s.signature_id
HAVING sum(s.petition_id=p.id)
AND sum(s.petition_id=p2.id);

Here is the EXPLAIN output (showing the row counts from my real dataset, not the sqlfiddle):

+----+-------------+-------+-------+---------------+--------------+---------+------+----------+---------------------------------+
| id | select_type | table | type  | possible_keys | key          | key_len | ref  | rows     | Extra                           |
+----+-------------+-------+-------+---------------+--------------+---------+------+----------+---------------------------------+
|  1 | SIMPLE      | p     | ALL   | PRIMARY       | NULL         | NULL    | NULL |     1727 | Using temporary; Using filesort |
|  1 | SIMPLE      | p2    | ALL   | PRIMARY       | NULL         | NULL    | NULL |     1727 | Using where; Using join buffer  |
|  1 | SIMPLE      | s     | index | NULL          | signature_id | 105     | NULL | 12943894 | Using index; Using join buffer  |
+----+-------------+-------+-------+---------------+--------------+---------+------+----------+---------------------------------+

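As it stands, the query uses too much disk space and relies on filesort, and I have yet to see it finish before it errors out. Are there any optimizations I can make so that it runs faster or more efficiently?

Thanks!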

2 Answers

Yes. One thing you can do is move the join conditions into the on clause:

SELECT p.title AS title_1,
       p2.title AS title_2,
       COUNT(DISTINCT s.signature_id) AS num_signers,
       group_concat(DISTINCT s.signature_id separator ' ') AS signers
FROM wtp_data_petitions p
JOIN wtp_data_petitions p2 ON (p.serial > p2.serial)
JOIN wtp_data_signatures s on s.petition_id=p.id or s.petition_id=p2.id
GROUP BY s.signature_id;

I also think the group by should be on p.title, p2.title:

SELECT p.title AS title_1,
       p2.title AS title_2,
       COUNT(DISTINCT s.signature_id) AS num_signers,
       group_concat(DISTINCT s.signature_id separator ' ') AS signers
FROM wtp_data_petitions p
JOIN wtp_data_petitions p2 ON (p.serial > p2.serial)
JOIN wtp_data_signatures s on s.petition_id=p.id or s.petition_id=p2.id
GROUP BY p.title, p2.title;

However, why are you doing the second join? I am not sure what the query is supposed to do.

EDIT:

I think the basic query you want is:

select s1.petition_id, s2.petition_id, count(*) as numsignatures,
       group_concat(s1.signature_id) as signatures
from wtp_data_signatures s1 join
     wtp_data_signatures s2
     on s1.signature_id = s2.signature_id and
        s1.petition_id < s2.petition_id
group by s1.petition_id, s2.petition_id;
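
To make the self-join easier to follow, here is a small sketch on toy data. The table name `demo_signatures`, its column types, and the sample rows are hypothetical (the real schema is not shown in the question); the `ORDER BY` clauses are added only so the output is deterministic.

```sql
-- Hypothetical toy table standing in for wtp_data_signatures.
CREATE TABLE demo_signatures (
    signature_id VARCHAR(32) NOT NULL,  -- identifies the signer
    petition_id  INT NOT NULL           -- petition that was signed
);

-- Signer 'a' signed petitions 1 and 2; signer 'b' signed 1, 2 and 3.
INSERT INTO demo_signatures (signature_id, petition_id) VALUES
    ('a', 1), ('a', 2),
    ('b', 1), ('b', 2), ('b', 3);

-- Pair up rows of the same signer; the "<" keeps each petition pair once.
SELECT s1.petition_id AS petition_1,
       s2.petition_id AS petition_2,
       COUNT(*) AS numsignatures,
       GROUP_CONCAT(s1.signature_id ORDER BY s1.signature_id SEPARATOR ' ') AS signatures
FROM demo_signatures s1
JOIN demo_signatures s2
  ON s1.signature_id = s2.signature_id
 AND s1.petition_id < s2.petition_id
GROUP BY s1.petition_id, s2.petition_id
ORDER BY petition_1, petition_2;

-- Result:
-- petition_1 | petition_2 | numsignatures | signatures
--          1 |          2 |             2 | a b
--          1 |          3 |             1 | b
--          2 |          3 |             1 | b
```

Only petition pairs that share at least one signer appear in the result, so the join never has to enumerate every pair of petitions against every signature row.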

You can now extend it to include the petition information:

select p1.title as title_1, p2.title as title_2,
       s1.petition_id, s2.petition_id, count(*) as numsignatures,
       group_concat(s1.signature_id) as signatures
from wtp_data_signatures s1 join
     wtp_data_signatures s2
     on s1.signature_id = s2.signature_id and
        s1.petition_id < s2.petition_id join
     wtp_data_petitions p1
     on p1.id = s1.petition_id join
     wtp_data_petitions p2
     on p2.id = s2.petition_id
group by s1.petition_id, s2.petition_id;
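
With roughly 12.9 million signature rows, the self-join will probably benefit from an index that covers both columns it touches. The following is only a sketch: the index name `idx_sig_petition` is hypothetical, and it assumes no equivalent composite index already exists (the EXPLAIN output in the question shows only an index named `signature_id`). Whether the optimizer actually picks it up depends on the data, so it is worth re-running EXPLAIN afterwards.

```sql
-- Hypothetical index name; assumes no equivalent composite index exists yet.
-- signature_id comes first because the join matches on it with equality.
CREATE INDEX idx_sig_petition
    ON wtp_data_signatures (signature_id, petition_id);

-- Re-check the plan to see whether the new index is used for the join.
EXPLAIN
SELECT s1.petition_id, s2.petition_id, COUNT(*) AS numsignatures
FROM wtp_data_signatures s1
JOIN wtp_data_signatures s2
  ON s1.signature_id = s2.signature_id
 AND s1.petition_id < s2.petition_id
GROUP BY s1.petition_id, s2.petition_id;
```
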
answered 2013-07-26T02:21:02.777

Do you have an index on serial? The self-join on p.serial > p2.serial looks like the only reason it needs to sort wtp_data_petitions. Try adding an index.
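
A minimal sketch of what that could look like. The index name `idx_petitions_serial` is made up, and this assumes `serial` is not already indexed; there is no guarantee it removes the filesort, so compare the EXPLAIN output before and after.

```sql
-- Hypothetical index name; assumes serial has no index yet.
CREATE INDEX idx_petitions_serial
    ON wtp_data_petitions (serial);

-- Check whether the range condition on serial now uses the index.
EXPLAIN
SELECT p.id, p2.id
FROM wtp_data_petitions p
JOIN wtp_data_petitions p2 ON (p.serial > p2.serial);
```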

answered 2013-07-26T02:37:17.067