I'll try to answer my own question based on Broken Link's comment (thanks for that):
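You've extracted phrases consisting of 1 to 3 words from your database of documents. Among these extracted phrases are spelling variants of the same phrase, for example "Half-Blood Prince", "Half Blood Prince" and "halfblood prince".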
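For each phrase, you strip all special characters and spaces and make the string lowercase:
$phrase = 'Half Blood Prince';
$phrase = preg_replace('/[^a-z]/i', '', $phrase); // keep letters a-z only
$phrase = strtolower($phrase); // result is "halfbloodprince"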
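Once you've done this, all 3 phrases (see above) have one spelling in common:
- Half-Blood Prince => halfbloodprince
- Half Blood Prince => halfbloodprince
- halfblood prince => halfbloodprince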
So "halfbloodprince" is the parent phrase. You insert both the normal phrase and the parent phrase into the database.
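The normalization step can be sketched as a small helper. This is only an illustration under assumptions: the function names `parentPhrase` and `buildRows` are hypothetical, and the rows are kept in a plain array rather than written to the `phrases` table.

```php
<?php
// Hypothetical helper: derive the parent phrase from a raw phrase.
function parentPhrase(string $phrase): string
{
    // Drop everything that is not an ASCII letter, then lowercase.
    return strtolower((string) preg_replace('/[^a-z]/i', '', $phrase));
}

// Hypothetical helper: build the (phrase, parentPhrase) pairs
// that would be inserted into the phrases table.
function buildRows(array $phrases): array
{
    $rows = [];
    foreach ($phrases as $phrase) {
        $rows[] = [
            'phrase'       => $phrase,
            'parentPhrase' => parentPhrase($phrase),
        ];
    }
    return $rows;
}

$rows = buildRows(['Half-Blood Prince', 'Half Blood Prince', 'halfblood prince']);
foreach ($rows as $row) {
    echo $row['phrase'], ' => ', $row['parentPhrase'], PHP_EOL;
}
```

One thing to keep in mind: because the pattern keeps only the letters a-z, digits and non-Latin characters are dropped too, so phrases such as "Windows 7" and "Windows 8" would end up with the same parent phrase.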
To show a "Trending Topics Admin" like Twitter's, you do the following:
// first select the top 10 parent phrases
// (note: the mysql_* functions only exist in PHP < 7; they were removed in PHP 7.0)
$sql1 = "SELECT parentPhrase, COUNT(*) AS cnt FROM phrases GROUP BY parentPhrase ORDER BY cnt DESC LIMIT 0, 10";
$sql2 = mysql_query($sql1);
while ($sql3 = mysql_fetch_assoc($sql2)) {
    $parentPhrase = $sql3['parentPhrase'];
    $childPhrases = array(); // set up an array for the child phrases
    $fifthPart = round($sql3['cnt'] * 0.2); // 20% of the parent phrase's occurrences
    // now select all child phrases which make up 20% of the parent phrase's occurrences or more
    $sql4 = "SELECT phrase FROM phrases WHERE parentPhrase = '".mysql_real_escape_string($parentPhrase)."' GROUP BY phrase HAVING COUNT(*) >= ".$fifthPart;
    $sql5 = mysql_query($sql4);
    while ($sql6 = mysql_fetch_assoc($sql5)) {
        $childPhrases[] = $sql6['phrase']; // read from the inner result row ($sql6), not $sql3
    }
    // now you have the parent phrase which is on the left side of the arrow in $parentPhrase
    // and all child phrases which are on the right side of the arrow in $childPhrases
}
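To check the selection logic without a database, the same two steps can be sketched over a plain array. This is a rough equivalent under assumptions, not the query code itself: the function name `trendingTopics` and its `$limit` and `$share` parameters are hypothetical, and `$rows` stands in for the `phrases` table with one entry per occurrence.

```php
<?php
// Hypothetical in-memory version of the two queries.
function trendingTopics(array $rows, int $limit = 10, float $share = 0.2): array
{
    $parentCounts = []; // parentPhrase => total occurrences
    $childCounts  = []; // parentPhrase => [phrase => occurrences]
    foreach ($rows as $row) {
        $parent = $row['parentPhrase'];
        $phrase = $row['phrase'];
        $parentCounts[$parent] = ($parentCounts[$parent] ?? 0) + 1;
        $childCounts[$parent][$phrase] = ($childCounts[$parent][$phrase] ?? 0) + 1;
    }

    arsort($parentCounts); // most frequent parent phrases first

    $topics = [];
    foreach (array_slice($parentCounts, 0, $limit, true) as $parent => $cnt) {
        $threshold = round($cnt * $share); // same rounding as $fifthPart
        $children = [];
        foreach ($childCounts[$parent] as $phrase => $n) {
            if ($n >= $threshold) {
                $children[] = (string) $phrase;
            }
        }
        $topics[$parent] = $children;
    }
    return $topics;
}

// Sample data: 10 occurrences of one topic, 2 of another.
$rows = array_merge(
    array_fill(0, 6, ['phrase' => 'Half-Blood Prince', 'parentPhrase' => 'halfbloodprince']),
    array_fill(0, 3, ['phrase' => 'Half Blood Prince', 'parentPhrase' => 'halfbloodprince']),
    array_fill(0, 1, ['phrase' => 'halfblood prince', 'parentPhrase' => 'halfbloodprince']),
    array_fill(0, 2, ['phrase' => 'Quidditch', 'parentPhrase' => 'quidditch'])
);
$topics = trendingTopics($rows);
print_r($topics);
```

With this sample data the threshold for "halfbloodprince" is 2, so the variant that occurs only once is left out of its child phrases. For rarely seen parent phrases the threshold rounds down to 0 and every variant is kept.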
Is this what you had in mind, Broken Link? Would this work?