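I have a search engine that scans every word on a given web page and records each occurrence. It then ranks pages by the number of times a word appears in the document. However, it returns nothing for queries with more than one term.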

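Below is my SQL query. I would like it to check every word that was entered and then rank the pages by how often those words appear in the document. At the moment it only works for single-term queries.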

         $result = mysql_query(" SELECT p.page_url AS url,
                       COUNT(*) AS occurrences 
                       FROM page p, word w, occurrence o
                       WHERE p.page_id = o.page_id AND
                       w.word_id = o.word_id AND
                       w.word_word = \"$keyword\"
                       GROUP BY p.page_id
                       ORDER BY occurrences DESC
                       LIMIT $results" );

2 Answers

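I would use MATCH ... AGAINST, which should be better suited to optimized searching in MySQL for something like a search engine. Have a look at full-text search: http://dev.mysql.com/doc/refman/5.5/en//fulltext-search.html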

Note: in the MySQL table, the keyword column should have a FULLTEXT index. This gives better search performance.
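
As a minimal sketch of the note above, the index could be created with a statement like the one this helper produces. The helper name `fulltext_index_ddl` and the index name `ft_word_word` are assumptions for illustration; the table and column names come from the question. Keep in mind that on MySQL 5.5 a FULLTEXT index requires a MyISAM table (InnoDB support arrived in 5.6).

```php
<?php
// Hypothetical helper: returns the DDL that adds a FULLTEXT index.
function fulltext_index_ddl(string $table, string $column): string
{
    // Identifiers cannot be bound as query parameters, so accept
    // simple names only before placing them in the statement.
    foreach (array($table, $column) as $name) {
        if (!preg_match('/^[A-Za-z_][A-Za-z0-9_]*$/', $name)) {
            throw new InvalidArgumentException("Invalid identifier: {$name}");
        }
    }
    // The index name "ft_<column>" is an assumed naming convention.
    return "ALTER TABLE `{$table}` ADD FULLTEXT INDEX `ft_{$column}` (`{$column}`)";
}

// Table and column names as used in the question.
$ddl = fulltext_index_ddl('word', 'word_word');
```

The resulting string would be run once against the database, not on every search.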

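Example:

Example keyword input (in boolean mode a leading + marks a word as required, and terms are separated by spaces):

$keywords = '+Word +Word2 +Word3';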

SELECT p.page_url AS url,
COUNT(*) AS occurrences,
MATCH(w.word_word) AGAINST ('{$keywords}') AS keyword
FROM page p, occurrence o, word w
WHERE MATCH(w.word_word) AGAINST ('{$keywords}' IN BOOLEAN MODE)
AND p.page_id = o.page_id AND w.word_id = o.word_id
GROUP BY p.page_id
ORDER BY occurrences DESC
LIMIT $results
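
The search string in the query above has to be built from whatever the user typed. The following is only a sketch of one way to do that; `build_boolean_query` is a hypothetical helper, not part of MySQL or PHP. It leaves terms optional by default: if `word_word` holds a single word per row, as the question's schema suggests, marking every term as required with + would probably match no row at all.

```php
<?php
// Hypothetical helper: turns raw user input into a boolean-mode search string.
function build_boolean_query(string $input, bool $requireAll = false): string
{
    // Split on whitespace and commas, dropping empty pieces.
    $terms = preg_split('/[\s,]+/', $input, -1, PREG_SPLIT_NO_EMPTY);
    $clean = array();
    foreach ($terms as $term) {
        // Keep letters, digits and underscores only, so characters that act
        // as boolean-mode operators or quotes cannot leak into the query.
        $term = preg_replace('/[^\p{L}\p{N}_]/u', '', $term);
        if ($term !== '') {
            // Prefix with "+" only when every term must be present.
            $clean[] = ($requireAll ? '+' : '') . $term;
        }
    }
    return implode(' ', $clean);
}

$keywords = build_boolean_query('honda  jazz, manual');
```

The returned string should still be escaped or bound as a parameter before it is placed in the SQL text.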

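If your query is not optimized (too many groupings, WHERE clauses and conditions), you risk degrading server performance. As an alternative, you can use a regular expression in MySQL, for example:

REGEXP "(honda)|(jazz)|(manual)"

This can perform acceptably with regular expressions on small tables (it is not recommended for large databases, because REGEXP cannot use an index):

Build the conditions in a loop instead of writing the REGEXP by hand: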

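$keywords = "keyword1,keyword2,keyword3";

$expl = explode(",", $keywords);

// Build one word-boundary REGEXP condition per keyword.
$conditions = array();
foreach ($expl as $keyone)
{
    // Escape the keyword before embedding it in the SQL string.
    $keyone = mysql_real_escape_string(trim($keyone));
    $conditions[] = "w.word_word REGEXP '[[:<:]]{$keyone}[[:>:]]'";
}

// Join the conditions with OR; this also covers the single-keyword case.
$all = '(' . implode(' OR ', $conditions) . ')';

// Double quotes are needed so that $all and $results are interpolated.
$sql = "SELECT p.page_url AS url,
COUNT(*) AS occurrences
FROM page p, word w, occurrence o
WHERE p.page_id = o.page_id AND
w.word_id = o.word_id AND
$all
GROUP BY p.page_id
ORDER BY occurrences DESC
LIMIT $results";

$result_query = mysql_query($sql);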
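
The loop above and the earlier alternation example can be combined into a single pattern. This is a rough sketch, assuming a MySQL version that supports the `[[:<:]]` and `[[:>:]]` word-boundary markers used in this answer; `build_word_regexp` is a hypothetical helper name.

```php
<?php
// Hypothetical helper: builds one word-boundary pattern that matches any keyword.
function build_word_regexp(array $keywords): string
{
    $safe = array();
    foreach ($keywords as $keyword) {
        $keyword = trim($keyword);
        // Accept plain word characters only, so that no regex or SQL
        // metacharacters can reach the pattern.
        if (preg_match('/^\w+$/', $keyword)) {
            $safe[] = $keyword;
        }
    }
    if (count($safe) === 0) {
        // Nothing usable was entered; the caller should skip the search.
        return '';
    }
    return '[[:<:]](' . implode('|', $safe) . ')[[:>:]]';
}

$pattern = build_word_regexp(explode(',', 'honda,jazz,manual'));
// $pattern can then be used as: w.word_word REGEXP '$pattern'
```

A single pattern keeps the WHERE clause short, at the cost of silently dropping any keyword that contains characters other than letters, digits or underscores.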
answered 2012-10-16T20:41:37.807

If you want to match all the words, this join condition will not let you do that:

w.word_word = \"$keyword\"

Your query can be written as follows:

// Table name "occurrence" spelled as in the question.
$sql = "SELECT p.page_url AS url, COUNT(*) AS occurrences "
     . "FROM page p "
     . "INNER JOIN occurrence o ON p.page_id = o.page_id "
     . "INNER JOIN word w ON w.word_id = o.word_id "
     . "GROUP BY p.page_id "
     . "ORDER BY occurrences DESC "
     . "LIMIT {$results}";
$result = mysql_query($sql);

This fetches every word in the word table, which (as far as I understand) gives you the result you need.

If you are only interested in a few words, you can use the IN clause (as Dev suggested in the comments), and your query becomes:

$my_keywords = array('apple', 'banana');
// This produces: "apple","banana" and assumes that all of your
// keywords are in lower case. If not, you can transform them to lower
// case or, if you don't want that, remove the LOWER() function below
// from the WHERE clause.
$keywords    = '"' . implode('","', $my_keywords) . '"';
$sql = "SELECT p.page_url AS url, COUNT(*) AS occurrences "
     . "FROM page p "
     . "INNER JOIN occurrence o ON p.page_id = o.page_id "
     . "INNER JOIN word w ON w.word_id = o.word_id "
     . "WHERE LOWER(w.word_word) IN ({$keywords}) "
     . "GROUP BY p.page_id "
     . "ORDER BY occurrences DESC "
     . "LIMIT {$results}";
$result = mysql_query($sql);

Finally, try to use mysqli or PDO instead of the mysql extension.
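
To illustrate that last suggestion, here is a hedged sketch of the IN query written for PDO with bound parameters instead of string concatenation. The function names `build_search_query` and `search_pages` are made up for this example, the table and column names come from the question, and creating the `$pdo` connection is left to you.

```php
<?php
// Builds the SQL text and the parameter list for a multi-keyword search.
function build_search_query(array $keywords, int $limit): array
{
    // Lower-case, de-duplicate and re-index the keywords.
    $params = array_values(array_unique(array_map('strtolower', $keywords)));
    if (count($params) === 0) {
        throw new InvalidArgumentException('At least one keyword is required.');
    }
    // One "?" placeholder per keyword, e.g. "?, ?, ?".
    $placeholders = implode(', ', array_fill(0, count($params), '?'));
    $sql = "SELECT p.page_url AS url, COUNT(*) AS occurrences "
         . "FROM page p "
         . "INNER JOIN occurrence o ON p.page_id = o.page_id "
         . "INNER JOIN word w ON w.word_id = o.word_id "
         . "WHERE LOWER(w.word_word) IN ({$placeholders}) "
         . "GROUP BY p.page_id, p.page_url "
         . "ORDER BY occurrences DESC "
         . "LIMIT " . $limit; // $limit is typed as int, so it is safe to append
    return array($sql, $params);
}

// Runs the search on an existing PDO connection and returns all rows.
function search_pages(PDO $pdo, array $keywords, int $limit): array
{
    list($sql, $params) = build_search_query($keywords, $limit);
    $stmt = $pdo->prepare($sql);
    $stmt->execute($params);
    return $stmt->fetchAll(PDO::FETCH_ASSOC);
}

list($sql, $params) = build_search_query(array('Apple', 'banana', 'APPLE'), 10);
```

Because the keywords travel as bound parameters, they never need manual quoting or escaping; only the number of placeholders depends on the input.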

HTH

answered 2012-10-16T20:30:29.867