
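I'm trying to find source sites that existed only before a certain timestamp. This query seems to perform very poorly for the job. Any ideas on how to optimize it, or on indexes that might help?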

select distinct sourcesite 
  from contentmeta 
  where timestamp <= '2011-03-15'
  and sourcesite not in (
    select distinct sourcesite 
      from contentmeta 
      where timestamp>'2011-03-15'
  );
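To make the intended result concrete, here is a minimal sketch that runs the same NOT IN query against an in-memory SQLite database instead of MySQL. The two-column table layout, the helper names `make_db` and `sites_only_before`, and the sample rows are all hypothetical; it checks semantics only, not performance.

```python
import sqlite3

CUTOFF = "2011-03-15"

# Hypothetical sample rows: (sourcesite, timestamp). Dates are ISO-8601 text,
# so SQLite's string comparison orders them chronologically.
ROWS = [
    ("a.example", "2011-01-10"),  # only before the cutoff -> kept
    ("a.example", "2011-03-15"),  # on the cutoff date still counts as "before"
    ("b.example", "2011-02-01"),  # before and after -> dropped
    ("b.example", "2011-04-01"),
    ("c.example", "2011-05-20"),  # only after -> dropped
    ("d.example", "2010-12-31"),  # only before -> kept
]


def make_db(rows):
    """Build a hypothetical in-memory stand-in for the contentmeta table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE contentmeta (sourcesite TEXT, timestamp TEXT)")
    conn.executemany("INSERT INTO contentmeta VALUES (?, ?)", rows)
    return conn


def sites_only_before(conn, cutoff):
    """The question's NOT IN query, with the cutoff passed as a parameter."""
    sql = """
        SELECT DISTINCT sourcesite
        FROM contentmeta
        WHERE timestamp <= ?
          AND sourcesite NOT IN (
              SELECT DISTINCT sourcesite
              FROM contentmeta
              WHERE timestamp > ?
          )
        ORDER BY sourcesite
    """
    return [row[0] for row in conn.execute(sql, (cutoff, cutoff))]


print(sites_only_before(make_db(ROWS), CUTOFF))  # ['a.example', 'd.example']
```

Sites with any row after the cutoff (b.example, c.example) are excluded; sites whose rows all fall on or before it are returned once each.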

There are indexes on sourcesite and timestamp, but the query still takes a long time.

mysql> EXPLAIN select distinct sourcesite from contentmeta where timestamp <= '2011-03-15' and sourcesite not in (select distinct sourcesite from contentmeta where timestamp>'2011-03-15');
+----+--------------------+-------------+----------------+---------------+----------+---------+------+--------+-------------------------------------------------+
| id | select_type        | table       | type           | possible_keys | key      | key_len | ref  | rows   | Extra                                           |
+----+--------------------+-------------+----------------+---------------+----------+---------+------+--------+-------------------------------------------------+
|  1 | PRIMARY            | contentmeta | index          | NULL          | sitetime | 14      | NULL | 725697 | Using where; Using index                        |
|  2 | DEPENDENT SUBQUERY | contentmeta | index_subquery | sitetime      | sitetime | 5       | func |     48 | Using index; Using where; Full scan on NULL key |
+----+--------------------+-------------+----------------+---------------+----------+---------+------+--------+-------------------------------------------------+

3 Answers


This should work:

SELECT DISTINCT c1.sourcesite
FROM contentmeta c1
LEFT JOIN contentmeta c2
  ON c2.sourcesite = c1.sourcesite
  AND c2.timestamp > '2011-03-15'
WHERE c1.timestamp <= '2011-03-15'
  AND c2.sourcesite IS NULL

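For best performance, create a multi-column index on contentmeta (sourcesite, timestamp).

In general, a join performs better than a subquery, because derived tables can't make use of indexes.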
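A minimal sketch of this LEFT JOIN anti-join, again on an in-memory SQLite database with hypothetical rows, including the suggested (sourcesite, timestamp) composite index. The index name `idx_site_time` and the helper names are made up, and SQLite's planner is not MySQL's, so this only confirms that the rewrite returns the same sites as the NOT IN version; it says nothing about speed on the real table.

```python
import sqlite3

CUTOFF = "2011-03-15"

# Hypothetical sample rows: (sourcesite, timestamp as ISO-8601 text).
ROWS = [
    ("a.example", "2011-01-10"),
    ("b.example", "2011-02-01"),
    ("b.example", "2011-04-01"),
    ("c.example", "2011-05-20"),
    ("d.example", "2010-12-31"),
]


def make_db():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE contentmeta (sourcesite TEXT, timestamp TEXT)")
    # Composite index as suggested: equality on sourcesite followed by a
    # range condition on timestamp can both be served by this one index.
    conn.execute("CREATE INDEX idx_site_time ON contentmeta (sourcesite, timestamp)")
    conn.executemany("INSERT INTO contentmeta VALUES (?, ?)", ROWS)
    return conn


def sites_only_before_join(conn, cutoff):
    # Anti-join: keep c1 rows for which no c2 row of the same site exists
    # after the cutoff (the unmatched side of the LEFT JOIN is all NULL).
    sql = """
        SELECT DISTINCT c1.sourcesite
        FROM contentmeta c1
        LEFT JOIN contentmeta c2
          ON c2.sourcesite = c1.sourcesite
         AND c2.timestamp > ?
        WHERE c1.timestamp <= ?
          AND c2.sourcesite IS NULL
        ORDER BY c1.sourcesite
    """
    return [row[0] for row in conn.execute(sql, (cutoff, cutoff))]


print(sites_only_before_join(make_db(), CUTOFF))  # ['a.example', 'd.example']
```

Note that the date condition on c2 belongs in the ON clause: moving it into WHERE would discard the NULL-extended rows and turn the query back into an inner join.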

answered 2012-05-09T18:34:49.430

The subquery doesn't need DISTINCT, and the outer query doesn't need its WHERE clause, since you're already filtering with NOT IN.

Try:

select distinct sourcesite
from contentmeta
where sourcesite not in (
    select sourcesite
    from contentmeta
    where timestamp > '2011-03-15'
);
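A small sketch, on an in-memory SQLite database with hypothetical rows, comparing this simplified query with the original one. Dropping the outer WHERE is only equivalent when timestamp is never NULL: a site whose rows all have a NULL timestamp fails `timestamp <= '2011-03-15'` in the original but passes the simplified NOT IN. The sample data includes such a site (`e.example`, hypothetical) to show the difference.

```python
import sqlite3

CUTOFF = "2011-03-15"

# The question's query.
ORIGINAL = """
    SELECT DISTINCT sourcesite FROM contentmeta
    WHERE timestamp <= ?
      AND sourcesite NOT IN (
          SELECT DISTINCT sourcesite FROM contentmeta WHERE timestamp > ?)
    ORDER BY sourcesite
"""

# The simplified form from this answer: no DISTINCT in the subquery, no outer WHERE.
SIMPLIFIED = """
    SELECT DISTINCT sourcesite FROM contentmeta
    WHERE sourcesite NOT IN (
        SELECT sourcesite FROM contentmeta WHERE timestamp > ?)
    ORDER BY sourcesite
"""

# Hypothetical sample rows: (sourcesite, timestamp as ISO-8601 text).
ROWS = [
    ("a.example", "2011-01-10"),
    ("b.example", "2011-02-01"),
    ("b.example", "2011-04-01"),
    ("c.example", "2011-05-20"),
]
# Same rows plus a hypothetical site whose only row has a NULL timestamp.
ROWS_WITH_NULL = ROWS + [("e.example", None)]


def run(rows, sql, params):
    """Load rows into an in-memory contentmeta table and run one query."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE contentmeta (sourcesite TEXT, timestamp TEXT)")
    conn.executemany("INSERT INTO contentmeta VALUES (?, ?)", rows)
    return [row[0] for row in conn.execute(sql, params)]


# With non-NULL timestamps the two queries agree.
print(run(ROWS, ORIGINAL, (CUTOFF, CUTOFF)))  # ['a.example']
print(run(ROWS, SIMPLIFIED, (CUTOFF,)))       # ['a.example']
# A site with only NULL timestamps is dropped by the original but kept here.
print(run(ROWS_WITH_NULL, ORIGINAL, (CUTOFF, CUTOFF)))  # ['a.example']
print(run(ROWS_WITH_NULL, SIMPLIFIED, (CUTOFF,)))       # ['a.example', 'e.example']
```

If the timestamp column is declared NOT NULL, the simplification is safe.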
answered 2012-05-09T18:39:17.200

I've found that NOT IN just isn't optimized well in many databases. Use a left outer join instead:

select distinct cm.sourcesite
from contentmeta cm
left outer join
(
   select distinct sourcesite
   from contentmeta
   where timestamp > '2011-03-15'
) t
  on cm.sourcesite = t.sourcesite
where cm.timestamp <= '2011-03-15' and t.sourcesite is null

This assumes sourcesite is never NULL.
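The sketch below runs this derived-table join on an in-memory SQLite database with hypothetical rows, with the select list written as cm.sourcesite because an unqualified sourcesite there is ambiguous between cm and t. It also shows what the NULL assumption guards against: a row with a NULL sourcesite before the cutoff comes back as a NULL site, because a NULL key never matches in the join and `t.sourcesite IS NULL` cannot tell "no match" apart from a NULL key. Helper names and data are made up for illustration.

```python
import sqlite3

CUTOFF = "2011-03-15"

# Derived-table anti-join from the answer, with columns qualified by cm.
DERIVED_JOIN = """
    SELECT DISTINCT cm.sourcesite
    FROM contentmeta cm
    LEFT OUTER JOIN (
        SELECT DISTINCT sourcesite
        FROM contentmeta
        WHERE timestamp > ?
    ) t
      ON cm.sourcesite = t.sourcesite
    WHERE cm.timestamp <= ? AND t.sourcesite IS NULL
"""

# Hypothetical sample rows: (sourcesite, timestamp as ISO-8601 text).
ROWS = [
    ("a.example", "2011-01-10"),
    ("b.example", "2011-02-01"),
    ("b.example", "2011-04-01"),
    ("c.example", "2011-05-20"),
]


def sites_only_before(rows, cutoff=CUTOFF):
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE contentmeta (sourcesite TEXT, timestamp TEXT)")
    conn.executemany("INSERT INTO contentmeta VALUES (?, ?)", rows)
    result = [row[0] for row in conn.execute(DERIVED_JOIN, (cutoff, cutoff))]
    # Sort in Python, placing a possible NULL (None) site last.
    return sorted(result, key=lambda s: (s is None, s or ""))


print(sites_only_before(ROWS))                           # ['a.example']
print(sites_only_before(ROWS + [(None, "2011-01-01")]))  # ['a.example', None]
```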

answered 2012-05-09T18:33:20.267