I have two tables,
packages and package_to_tag, both running on MyISAM.
The tables are structured as follows:
packages
+----------------+------------------+----------------+
| aid(primary) | source | date(index) |
+----------------+------------------+----------------+
| 1 | CA | 2013-04-05 |
+----------------+------------------+----------------+
| 2 | FL | 2013-05-05 |
+----------------+------------------+----------------+
| 3 | UT | 2012-06-13 |
+----------------+------------------+----------------+
| 4 | VT | 2011-04-29 |
+----------------+------------------+----------------+
| 5 | CT | 2013-04-10 |
+----------------+------------------+----------------+
package_to_tag (unique index on package_aid + tag; package_aid and tag are also each indexed)
+---------------+------------------+
| package_aid | tag |
+---------------+------------------+
| 2 | sports |
+---------------+------------------+
| 2 | nba |
+---------------+------------------+
| 1 | food |
+---------------+------------------+
| 1 | burrito |
+---------------+------------------+
| 4 | hockey |
+---------------+------------------+
| 4 | sports |
+---------------+------------------+
| 3 | news |
+---------------+------------------+
| 5 | sports |
+---------------+------------------+
| 5 | nba |
+---------------+------------------+
So my basic query to find which packages have both sports and nba as tags is:
SELECT package_aid FROM package_to_tag
WHERE tag IN("sports","nba")
GROUP BY package_aid
HAVING COUNT(*) = 2
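To sanity-check the base query, here is a minimal sketch that loads the sample rows above into an in-memory SQLite database through Python's standard sqlite3 module and runs the same GROUP BY ... HAVING COUNT(*) = 2 filter. SQLite is only a stand-in for MySQL here, so this says nothing about MyISAM performance. The ORDER BY package_aid is added only to make the output deterministic. The count-of-2 trick relies on the unique (package_aid, tag) index; without it, a duplicated "sports" row could also reach a count of 2.

```python
import sqlite3

# Sample rows copied from the tables above; SQLite stands in for MySQL.
PACKAGES = [(1, "CA", "2013-04-05"), (2, "FL", "2013-05-05"),
            (3, "UT", "2012-06-13"), (4, "VT", "2011-04-29"),
            (5, "CT", "2013-04-10")]
TAGS = [(2, "sports"), (2, "nba"), (1, "food"), (1, "burrito"),
        (4, "hockey"), (4, "sports"), (3, "news"), (5, "sports"), (5, "nba")]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE packages (aid INTEGER PRIMARY KEY, source TEXT, date TEXT)")
conn.execute("CREATE INDEX idx_date ON packages (date)")
conn.execute("CREATE TABLE package_to_tag (package_aid INTEGER, tag TEXT, "
             "UNIQUE (package_aid, tag))")
conn.execute("CREATE INDEX idx_tag ON package_to_tag (tag)")
conn.executemany("INSERT INTO packages VALUES (?, ?, ?)", PACKAGES)
conn.executemany("INSERT INTO package_to_tag VALUES (?, ?)", TAGS)

# A package qualifies only if both values in the IN list matched one of its rows.
rows = conn.execute(
    """
    SELECT package_aid FROM package_to_tag
    WHERE tag IN ('sports', 'nba')
    GROUP BY package_aid
    HAVING COUNT(*) = 2
    ORDER BY package_aid
    """
).fetchall()
print([aid for (aid,) in rows])  # [2, 5]
```

Package 4 is tagged sports but not nba, so its group has a count of 1 and is filtered out.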
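This works great until I try to add date sorting to the results. (Keep in mind that my packages table holds around 400k records.)
My query for getting the source of packages with matching tags is: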
SELECT package_aid, source
FROM package_to_tag
RIGHT JOIN packages ON packages.aid = package_to_tag.package_aid
AND tag IN("sports","nba")
GROUP BY package_aid
HAVING COUNT(*) = 2
ORDER BY date DESC
LIMIT 500
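One detail worth checking in this RIGHT JOIN version: because the tag filter sits in the ON clause, every package with no matching tag still produces one row with package_aid NULL, and GROUP BY puts all of those NULLs into a single group. In the sample data, packages 1 and 3 match neither tag, so that NULL group also has COUNT(*) = 2 and passes the HAVING. The minimal sketch below reproduces this with Python's sqlite3 on the sample rows. SQLite only gained RIGHT JOIN in 3.39.0, so it writes the equivalent packages LEFT JOIN package_to_tag, and compares the result with a plain inner join.

```python
import sqlite3

# Sample rows copied from the tables above; SQLite stands in for MySQL.
PACKAGES = [(1, "CA", "2013-04-05"), (2, "FL", "2013-05-05"),
            (3, "UT", "2012-06-13"), (4, "VT", "2011-04-29"),
            (5, "CT", "2013-04-10")]
TAGS = [(2, "sports"), (2, "nba"), (1, "food"), (1, "burrito"),
        (4, "hockey"), (4, "sports"), (3, "news"), (5, "sports"), (5, "nba")]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE packages (aid INTEGER PRIMARY KEY, source TEXT, date TEXT)")
conn.execute("CREATE TABLE package_to_tag (package_aid INTEGER, tag TEXT, "
             "UNIQUE (package_aid, tag))")
conn.executemany("INSERT INTO packages VALUES (?, ?, ?)", PACKAGES)
conn.executemany("INSERT INTO package_to_tag VALUES (?, ?)", TAGS)

# Same shape as the RIGHT JOIN query: every package yields at least one row.
outer = conn.execute(
    """
    SELECT package_to_tag.package_aid
    FROM packages
    LEFT JOIN package_to_tag
      ON packages.aid = package_to_tag.package_aid
     AND tag IN ('sports', 'nba')
    GROUP BY package_to_tag.package_aid
    HAVING COUNT(*) = 2
    """
).fetchall()

# Inner join: only packages that actually matched a tag can form a group.
inner = conn.execute(
    """
    SELECT packages.aid
    FROM packages
    JOIN package_to_tag
      ON packages.aid = package_to_tag.package_aid
     AND tag IN ('sports', 'nba')
    GROUP BY packages.aid
    HAVING COUNT(*) = 2
    """
).fetchall()

outer_ids = {aid for (aid,) in outer}  # None (the unmatched packages) plus 2 and 5
inner_ids = {aid for (aid,) in inner}  # 2 and 5 only
print(outer_ids, inner_ids)
```

With the real 400k rows, the NULL group would only appear when exactly two packages lack both tags, but the outer join still has to produce a row for every package before grouping, whereas the inner join only touches rows that matched a tag.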
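Against the 400k records, the RIGHT JOIN query takes up to 5 seconds, unless I remove the ORDER BY date; then it finishes in under a second. Since I've always had good success with IN statements, I tried narrowing my initial result set like this: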
SELECT aid,source FROM packages
WHERE aid IN(
SELECT package_aid FROM package_to_tag
WHERE tag IN("sports","nba")
GROUP BY package_aid
HAVING COUNT(*) = 2
)
ORDER BY date DESC
LIMIT 500
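I figured the sort would then only be applied to about 8-10k records instead of the whole record set.
However, this just pinned the database at 100% utilization and I had to restart it, even when I used extra tags to narrow the inner select down to 80 records or fewer in total.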
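For comparison, the same filter can be written with the tag subquery as a derived table that packages is joined against, so the subquery is evaluated once rather than as a per-row IN check. The MySQL 5.5 manual's "Restrictions on Subqueries" section documents that the optimizer rewrites an uncorrelated IN (SELECT ...) as a correlated subquery; if the server is older than 5.6, that would fit the symptom here, and EXPLAIN on the IN version showing DEPENDENT SUBQUERY would confirm it. The minimal sketch below again uses Python's sqlite3 with the sample rows, so it only checks that the two forms return the same rows, not how they perform on MySQL.

```python
import sqlite3

# Sample rows copied from the tables above; SQLite stands in for MySQL.
PACKAGES = [(1, "CA", "2013-04-05"), (2, "FL", "2013-05-05"),
            (3, "UT", "2012-06-13"), (4, "VT", "2011-04-29"),
            (5, "CT", "2013-04-10")]
TAGS = [(2, "sports"), (2, "nba"), (1, "food"), (1, "burrito"),
        (4, "hockey"), (4, "sports"), (3, "news"), (5, "sports"), (5, "nba")]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE packages (aid INTEGER PRIMARY KEY, source TEXT, date TEXT)")
conn.execute("CREATE TABLE package_to_tag (package_aid INTEGER, tag TEXT, "
             "UNIQUE (package_aid, tag))")
conn.executemany("INSERT INTO packages VALUES (?, ?, ?)", PACKAGES)
conn.executemany("INSERT INTO package_to_tag VALUES (?, ?)", TAGS)

# The question's IN (subquery) form.
in_version = conn.execute(
    """
    SELECT aid, source FROM packages
    WHERE aid IN (
        SELECT package_aid FROM package_to_tag
        WHERE tag IN ('sports', 'nba')
        GROUP BY package_aid
        HAVING COUNT(*) = 2
    )
    ORDER BY date DESC
    LIMIT 500
    """
).fetchall()

# Derived-table form: compute the matching aids once, then join and sort.
join_version = conn.execute(
    """
    SELECT p.aid, p.source
    FROM packages AS p
    JOIN (
        SELECT package_aid FROM package_to_tag
        WHERE tag IN ('sports', 'nba')
        GROUP BY package_aid
        HAVING COUNT(*) = 2
    ) AS matched ON matched.package_aid = p.aid
    ORDER BY p.date DESC
    LIMIT 500
    """
).fetchall()

print(join_version)  # [(2, 'FL'), (5, 'CT')]
```

Both forms return package 2 (2013-05-05) ahead of package 5 (2013-04-10); the only difference is where the optimizer is free to evaluate the grouped subquery.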
I tried running just the inner query:
SELECT package_aid FROM package_to_tag
WHERE tag IN("sports","nba")
GROUP BY package_aid
HAVING COUNT(*) = 2
This returns 8-10k records in under a second.
What am I missing?