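How do I get the most popular words from multiple content tables in PHP/MySQL?
For example, I have a table forum_post that holds forum posts; it contains a subject and content. Besides this, I have several other tables with different fields that can also contain content to be analyzed.
I would probably fetch all the content myself, strip (possibly) HTML, explode the string on spaces, remove quotes, commas and so on, and then count the words that are not common by keeping an array while running through all the words.
My main question is whether anyone knows of a method that might be easier or faster.
I couldn't seem to find any helpful answers, though I may have been using the wrong search terms.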
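The magic you are looking for is a PHP function called str_word_count().
In the example code below, if you get a lot of extraneous words out of it, you'll need to write custom stripping to remove them. You will also want to strip all HTML tags and other stray characters from the text.
I use something similar to this for keyword generation (obviously that code is proprietary). In short, we take the provided text, count how often each word occurs, and sort the words in an array by that count, so the most frequent words come first in the output. We do not count words that occur only once.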
<?php
$text = "your text.";

// Set up the array for storing word counts
$freqData = array();
foreach (str_word_count($text, 1) as $word) {
    // Start at 1 the first time a word is seen, otherwise add one to its count
    $freqData[$word] = isset($freqData[$word]) ? $freqData[$word] + 1 : 1;
}

// Sort by count, highest first, keeping the words as keys
arsort($freqData);

$list = '';
foreach ($freqData as $word => $count) {
    // Skip words that occur only once
    if ($count > 1) {
        $list .= "$word ";
    }
}
if (empty($list)) {
    $list = "Not enough duplicate words for popularity contest.";
}
echo $list;
?>
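The custom stripping mentioned above could look roughly like this. It is only a sketch: the function names `cleanText` and `topWords` and the two-word stop list are made up for illustration, and it assumes mostly English text, since as far as I know str_word_count() is not multibyte-aware.

```php
<?php
// Hypothetical helper: turn raw (possibly HTML) content into plain lowercase text.
function cleanText(string $raw): string
{
    $text = strip_tags($raw);                               // drop HTML tags
    $text = html_entity_decode($text, ENT_QUOTES, 'UTF-8'); // &amp; becomes &, so "amp" is not counted as a word
    return strtolower($text);                               // count "The" and "the" together
}

// Hypothetical helper: words occurring at least $minCount times, most frequent first.
function topWords(string $raw, array $stopWords = [], int $minCount = 2): array
{
    $counts = array_count_values(str_word_count(cleanText($raw), 1));
    // Remove words the caller considers noise (assumed to be given in lowercase).
    $counts = array_diff_key($counts, array_flip($stopWords));
    // Keep only the words that occur often enough.
    $counts = array_filter($counts, function ($n) use ($minCount) {
        return $n >= $minCount;
    });
    arsort($counts);
    return $counts;
}

$html = '<p>The cat and the dog.</p> <b>Cat</b> &amp; dog &amp; bird';
print_r(topWords($html, ['the', 'and']));
```

A real stop list would be much longer; keeping it as a parameter means each content table can use its own.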
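I see you have already accepted an answer, but I want to give you an alternative that might be more flexible in a sense (decide for yourself :-)). I haven't tested the code, but I think you get the idea. $dbh is a PDO connection object. It is then up to you what you do with the resulting $words array.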
<?php
$words = array('word' => array(), 'wordcount' => array());

$tableName = 'party'; // The name of the table
countWordsFromTable($dbh, $words, $tableName);
$tableName = 'party2'; // The name of the table
countWordsFromTable($dbh, $words, $tableName);

// Example output array: 'word' holds the text of one field,
// 'wordcount' the number of space-separated words in that text.
/*
$words['word'][0] = 'happy birthday to you'; // From table party
$words['wordcount'][0] = 4;
$words['word'][1] = 'bulldog'; // From table party2
$words['wordcount'][1] = 1;
$words['word'][2] = 'pokerface'; // From table party2
$words['wordcount'][2] = 1;
*/

if (!empty($words['wordcount'])) {
    // Get all indexes that share the highest word count
    $maxValues = array_keys($words['wordcount'], max($words['wordcount']));
    $popularIndex = $maxValues[0]; // Take only the first one
    $mostPopularWord = $words['word'][$popularIndex]; // The entry with the highest count
}

function countWordsFromTable(PDO $dbh, array &$words, $tableName) {
    // Table and column names cannot be bound as placeholders,
    // so validate the table name and quote identifiers with backticks instead
    if (!preg_match('/^[A-Za-z0-9_]+$/', $tableName)) {
        throw new InvalidArgumentException("Invalid table name: $tableName");
    }

    // Get all field names from the table
    $q = $dbh->query("DESCRIBE `$tableName`");
    $tableFields = $q->fetchAll(PDO::FETCH_COLUMN);

    // Go through all fields and store each value together with its word count in $words
    foreach ($tableFields as $dbCol) {
        $col = '`' . str_replace('`', '``', $dbCol) . '`';
        // Word count = number of spaces in the value + 1
        $wordCountQuery = "SELECT $col AS word, LENGTH($col) - LENGTH(REPLACE($col, ' ', '')) + 1 AS wordcount FROM `$tableName`";
        $q = $dbh->query($wordCountQuery);
        $wrds = $q->fetchAll(PDO::FETCH_ASSOC);

        // Add the result to the $words array
        foreach ($wrds as $w) {
            $words['word'][] = $w['word'];
            $words['wordcount'][] = (int) $w['wordcount'];
        }
    }
}
?>
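If what you want in the end is how often each individual word occurs across all tables, one option is to fetch the text columns with PDO and do the counting in PHP. The following is a sketch under assumptions, not tested against a real database: the helper name `addRowsToCounts` is made up, the column names in the commented query are guesses based on the question, and the rows are expected in the shape that fetchAll(PDO::FETCH_ASSOC) returns.

```php
<?php
// Hypothetical helper: add the words of every text field of every row to a running tally.
function addRowsToCounts(array &$counts, iterable $rows): void
{
    foreach ($rows as $row) {
        foreach ($row as $value) {
            if (!is_string($value)) {
                continue; // skip NULLs and non-text values
            }
            $text = strtolower(strip_tags($value));
            foreach (str_word_count($text, 1) as $word) {
                $counts[$word] = ($counts[$word] ?? 0) + 1;
            }
        }
    }
}

$counts = [];
// With a real connection this could be called once per table, for example
// (table and column names are assumptions):
// addRowsToCounts($counts, $dbh->query('SELECT subject, content FROM forum_post')->fetchAll(PDO::FETCH_ASSOC));

// Stand-in rows so the sketch runs without a database:
addRowsToCounts($counts, [['subject' => 'Happy bulldog', 'content' => '<p>A happy day</p>']]);
addRowsToCounts($counts, [['title' => 'Bulldog party', 'body' => null]]);

arsort($counts);                          // most frequent first
$top = array_slice($counts, 0, 10, true); // keep the ten most frequent words
print_r($top);
```

Because the tally is passed by reference, the same array can be fed from any number of tables, whatever their field names are.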