
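I have a crawler that scans all the words on a web page. It inserts each word, together with the URL it came from, into a MySQL database. Searches are then ranked by how many times the word is found in each document. The problem is: how do I add multi-term queries to my existing query?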

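It works well for single-word queries. For several words, though, I would like the query to look for pages that contain the words together. If no page contains them together, it should fall back to returning results for the individual words as usual.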

My query looks like this:

         // Cast to int: addslashes() does not make a value safe to use after LIMIT
         $results = (int) $_POST['results'];

         $query = "SELECT p.page_url AS url,
                          COUNT(*) AS occurrences
                   FROM page p, word w, occurrence o
                   WHERE p.page_id = o.page_id AND
                         w.word_id = o.word_id AND
                         w.word_word = '$keyword'
                   GROUP BY p.page_id
                   ORDER BY occurrences DESC
                   LIMIT $results";
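For reference, here is a minimal runnable sketch of the setup the question describes. It assumes the table and column names from the query (page, word, occurrence) and one occurrence row per word token on a page. SQLite's in-memory database stands in for MySQL, and the sample pages and the helper names build_index and search_one are hypothetical. The single-word query has the same shape as above, but it uses bound parameters instead of interpolated strings.

```python
import sqlite3

# Hypothetical toy pages standing in for crawled data.
PAGES = {
    "http://a.example": "foo foo bar",
    "http://b.example": "foo foo foo",
    "http://c.example": "bar baz",
}


def build_index(pages):
    """Load pages into an in-memory SQLite copy of the assumed schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE page (page_id INTEGER PRIMARY KEY, page_url TEXT);
        CREATE TABLE word (word_id INTEGER PRIMARY KEY, word_word TEXT UNIQUE);
        CREATE TABLE occurrence (page_id INTEGER, word_id INTEGER);
    """)
    for url, text in pages.items():
        page_id = conn.execute(
            "INSERT INTO page (page_url) VALUES (?)", (url,)).lastrowid
        for token in text.split():
            conn.execute(
                "INSERT OR IGNORE INTO word (word_word) VALUES (?)", (token,))
            (word_id,) = conn.execute(
                "SELECT word_id FROM word WHERE word_word = ?", (token,)).fetchone()
            # One occurrence row per token, so COUNT(*) counts repetitions.
            conn.execute(
                "INSERT INTO occurrence (page_id, word_id) VALUES (?, ?)",
                (page_id, word_id))
    return conn


def search_one(conn, keyword, limit):
    """Rank pages by how often a single keyword occurs on them."""
    return conn.execute("""
        SELECT p.page_url AS url, COUNT(*) AS occurrences
        FROM page p
        JOIN occurrence o ON p.page_id = o.page_id
        JOIN word w ON w.word_id = o.word_id
        WHERE w.word_word = ?
        GROUP BY p.page_id
        ORDER BY occurrences DESC
        LIMIT ?
    """, (keyword, limit)).fetchall()
```

With the sample data, search_one(build_index(PAGES), "foo", 10) ranks b.example (3 occurrences) above a.example (2).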

2 Answers


If your database engine supports it, you can use subqueries. For example:

SELECT 
  url, 
  (select count(*) from table where conditions1) as count1, 
  (select count(*) from table where conditions2) as count2 
 FROM table
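In the example above, table and conditions1/conditions2 are placeholders. The following sketch fills them in under some assumptions. It uses the schema from the question, SQLite in memory as a stand-in for MySQL, made-up pages, and hypothetical helper names. For each search term it adds one correlated subquery that counts that term's occurrences on the page of the outer row.

```python
import sqlite3

# Hypothetical toy pages standing in for crawled data.
PAGES = {
    "http://a.example": "foo foo bar",
    "http://b.example": "foo foo foo",
    "http://c.example": "bar baz",
}


def build_index(pages):
    """Load pages into an in-memory SQLite copy of the assumed schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE page (page_id INTEGER PRIMARY KEY, page_url TEXT);
        CREATE TABLE word (word_id INTEGER PRIMARY KEY, word_word TEXT UNIQUE);
        CREATE TABLE occurrence (page_id INTEGER, word_id INTEGER);
    """)
    for url, text in pages.items():
        page_id = conn.execute(
            "INSERT INTO page (page_url) VALUES (?)", (url,)).lastrowid
        for token in text.split():
            conn.execute(
                "INSERT OR IGNORE INTO word (word_word) VALUES (?)", (token,))
            (word_id,) = conn.execute(
                "SELECT word_id FROM word WHERE word_word = ?", (token,)).fetchone()
            conn.execute(
                "INSERT INTO occurrence (page_id, word_id) VALUES (?, ?)",
                (page_id, word_id))
    return conn


# Correlated subquery: occurrences of one term on the outer query's page.
TERM_COUNT = """(SELECT COUNT(*) FROM occurrence o
                 JOIN word w ON w.word_id = o.word_id
                 WHERE o.page_id = p.page_id AND w.word_word = ?)"""


def term_counts(conn, terms):
    """Return (url, count1, count2, ...) for every page, one column per term."""
    if not terms:
        raise ValueError("need at least one search term")
    cols = ", ".join(
        f"{TERM_COUNT} AS count{i}" for i in range(1, len(terms) + 1))
    sql = f"SELECT p.page_url, {cols} FROM page p ORDER BY p.page_id"
    return conn.execute(sql, tuple(terms)).fetchall()
```

A page whose count columns are all nonzero contains every term. With the sample data, term_counts(conn, ["foo", "bar"]) gives (2, 1) for a.example, (3, 0) for b.example and (0, 1) for c.example. Because each subquery runs once per page, this approach may be slower on large tables than the single JOIN with GROUP BY in the other answer.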
answered 2012-11-24T17:00:12.393

Use COUNT(DISTINCT ...) to count how many different search words were found on each page, and use IN to match any of the words in the list:

SELECT
    p.page_url AS url,
    COUNT(DISTINCT w.word_word) AS words_found,
    COUNT(*) AS occurrences 
FROM page p
JOIN occurrence o ON p.page_id = o.page_id
JOIN word w ON w.word_id = o.word_id
WHERE w.word_word IN ('foo', 'bar')
GROUP BY p.page_id
ORDER BY occurrences DESC
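Here is a runnable sketch of this query. Like the earlier sketches, it uses SQLite in memory as a stand-in for MySQL, made-up pages, the schema assumed from the question, and hypothetical helper names. The IN list gets one ? placeholder per term. Unlike the answer, the sketch orders by words_found first, which is an editorial choice. Pages that contain all the terms come first, and pages with only some of them still follow, which gives the fallback the question asks for.

```python
import sqlite3

# Hypothetical toy pages standing in for crawled data.
PAGES = {
    "http://a.example": "foo foo bar",
    "http://b.example": "foo foo foo",
    "http://c.example": "bar baz",
}


def build_index(pages):
    """Load pages into an in-memory SQLite copy of the assumed schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE page (page_id INTEGER PRIMARY KEY, page_url TEXT);
        CREATE TABLE word (word_id INTEGER PRIMARY KEY, word_word TEXT UNIQUE);
        CREATE TABLE occurrence (page_id INTEGER, word_id INTEGER);
    """)
    for url, text in pages.items():
        page_id = conn.execute(
            "INSERT INTO page (page_url) VALUES (?)", (url,)).lastrowid
        for token in text.split():
            conn.execute(
                "INSERT OR IGNORE INTO word (word_word) VALUES (?)", (token,))
            (word_id,) = conn.execute(
                "SELECT word_id FROM word WHERE word_word = ?", (token,)).fetchone()
            conn.execute(
                "INSERT INTO occurrence (page_id, word_id) VALUES (?, ?)",
                (page_id, word_id))
    return conn


def search_many(conn, terms, limit):
    """Rank pages by distinct terms matched, then by total occurrences."""
    if not terms:  # MySQL rejects an empty IN () list
        return []
    placeholders = ", ".join("?" for _ in terms)
    sql = f"""
        SELECT p.page_url AS url,
               COUNT(DISTINCT w.word_word) AS words_found,
               COUNT(*) AS occurrences
        FROM page p
        JOIN occurrence o ON p.page_id = o.page_id
        JOIN word w ON w.word_id = o.word_id
        WHERE w.word_word IN ({placeholders})
        GROUP BY p.page_id
        ORDER BY words_found DESC, occurrences DESC
        LIMIT ?
    """
    return conn.execute(sql, (*terms, limit)).fetchall()
```

With the sample data, a.example and b.example both have 3 matching occurrences for ["foo", "bar"], so ORDER BY occurrences alone would tie them. Sorting on words_found first puts a.example, which contains both words, on top.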

To require that at least n of the search terms appear on a page, add a HAVING clause (here n = 2):

GROUP BY p.page_id
HAVING COUNT(DISTINCT w.word_word) >= 2
ORDER BY occurrences DESC
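The HAVING threshold can also be bound as a parameter. That allows a two-step search: first ask for pages that contain every term (n equal to the number of terms), and if that returns nothing, repeat with n = 1. This is a minimal sketch under the same assumptions as above: SQLite in memory stands in for MySQL, and the pages and the helpers search_at_least and search_with_fallback are hypothetical.

```python
import sqlite3

# Hypothetical toy pages standing in for crawled data.
PAGES = {
    "http://a.example": "foo foo bar",
    "http://b.example": "foo foo foo",
    "http://c.example": "bar baz",
}


def build_index(pages):
    """Load pages into an in-memory SQLite copy of the assumed schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE page (page_id INTEGER PRIMARY KEY, page_url TEXT);
        CREATE TABLE word (word_id INTEGER PRIMARY KEY, word_word TEXT UNIQUE);
        CREATE TABLE occurrence (page_id INTEGER, word_id INTEGER);
    """)
    for url, text in pages.items():
        page_id = conn.execute(
            "INSERT INTO page (page_url) VALUES (?)", (url,)).lastrowid
        for token in text.split():
            conn.execute(
                "INSERT OR IGNORE INTO word (word_word) VALUES (?)", (token,))
            (word_id,) = conn.execute(
                "SELECT word_id FROM word WHERE word_word = ?", (token,)).fetchone()
            conn.execute(
                "INSERT INTO occurrence (page_id, word_id) VALUES (?, ?)",
                (page_id, word_id))
    return conn


def search_at_least(conn, terms, n):
    """Pages containing at least n distinct terms, ranked by occurrences."""
    placeholders = ", ".join("?" for _ in terms)
    sql = f"""
        SELECT p.page_url AS url, COUNT(*) AS occurrences
        FROM page p
        JOIN occurrence o ON p.page_id = o.page_id
        JOIN word w ON w.word_id = o.word_id
        WHERE w.word_word IN ({placeholders})
        GROUP BY p.page_id
        HAVING COUNT(DISTINCT w.word_word) >= ?
        ORDER BY occurrences DESC
    """
    return conn.execute(sql, (*terms, n)).fetchall()


def search_with_fallback(conn, terms):
    """Prefer pages with all terms; otherwise return pages with any term."""
    rows = search_at_least(conn, terms, len(terms))
    return rows if rows else search_at_least(conn, terms, 1)
```

With the sample data, ["bar", "baz"] finds c.example, which has both words. No page contains both "foo" and "baz", so that search falls back to the individual-word results.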
answered 2012-11-24T16:59:09.013