Consider the following table:
_____________________
|   sentence_word   |
|---------|---------|
| sent_id | word_id |
|---------|---------|
| 1       | 1       |
| 1       | 2       |
| ...     | ...     |
| 2       | 4       |
| 2       | 1       |
| ...     | ...     |
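With this table structure I want to store the words of each sentence. Now I want to find out which words occur in the same sentence as a specific word. The result should look like this: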
_____________________
| word_id | counted |
|---------|---------|
| 5       | 1000    |
| 7       | 800     |
| 3       | 600     |
| 1       | 400     |
| 2       | 100     |
| ...     | ...     |
The query looks like this:
SELECT
    word_id,
    COUNT(*) AS counted
FROM sentence_word
WHERE sentence_word.sent_id IN (
        SELECT sent_id
        FROM sentence_word
        WHERE word_id = [desired word]
    )
    AND word_id != [desired word]
GROUP BY word_id
ORDER BY counted DESC;
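The query works, but it always scans the whole table. I have created an index on sent_id and word_id. Do you have any ideas for optimizing it so that it does not have to scan the whole table every time?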
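One direction that may be worth trying, as a minimal sketch and not a confirmed fix: rewrite the IN subquery as a self-join and give the planner one composite index per lookup direction. The index names idx_word_sent and idx_sent_word, the sample rows, and the helper co_occurring are hypothetical. SQLite is used here only so that the example runs on its own, and whether your database actually picks the indexes depends on the engine and on the data distribution. The join returns the same counts as the original query only under the assumption that each (sent_id, word_id) pair is stored at most once. The extra ORDER BY key is added only to make ties deterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sentence_word (sent_id INTEGER, word_id INTEGER)")

# Hypothetical composite indexes, one per lookup direction:
# (word_id, sent_id) to find the sentences that contain the desired word,
# (sent_id, word_id) to find the other words in those sentences.
conn.execute("CREATE INDEX idx_word_sent ON sentence_word (word_id, sent_id)")
conn.execute("CREATE INDEX idx_sent_word ON sentence_word (sent_id, word_id)")

# Made-up sample data: sentence 1 = {1, 2, 3}, sentence 2 = {1, 3}, sentence 3 = {2, 4}.
rows = [(1, 1), (1, 2), (1, 3), (2, 1), (2, 3), (3, 2), (3, 4)]
conn.executemany("INSERT INTO sentence_word VALUES (?, ?)", rows)

# Self-join form of the original query. Assumes each (sent_id, word_id)
# pair occurs at most once; otherwise the join would multiply the counts.
JOIN_QUERY = """
SELECT other.word_id, COUNT(*) AS counted
FROM sentence_word AS target
JOIN sentence_word AS other ON other.sent_id = target.sent_id
WHERE target.word_id = ?
  AND other.word_id != ?
GROUP BY other.word_id
ORDER BY counted DESC, other.word_id
"""


def co_occurring(conn, desired_word):
    """Return (word_id, counted) rows for words sharing a sentence with desired_word."""
    return conn.execute(JOIN_QUERY, (desired_word, desired_word)).fetchall()


result = co_occurring(conn, 1)
print(result)  # [(3, 2), (2, 1)]
```

To check whether the indexes are used, the engine's plan output (EXPLAIN in MySQL or PostgreSQL, EXPLAIN QUERY PLAN in SQLite) can be compared for the original query and the join form.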