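I want to calculate IDF. The formula is IDF = log(D/df), where D is the total number of data items and df is the number of data items that contain the search term. The data comes from this table:
1. tb_stemming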
===========================================================================
|stem_id | stem_before | stem_after | stem_freq | sentence_id |document_id|
===========================================================================
| 1 | Data | Data | 1 | 0 | 1 |
| 2 | Discuss | Discuss | 1 | 1 | 1 |
| 3 | Mining | Min | 1 | 0 | 2 |
===========================================================================
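To make the formula concrete, here is a minimal sketch in plain PHP that applies IDF = log(D/df) to the three sample rows above, without a database. The function name `idfFromRows` is hypothetical, and it assumes that one "data item" means one document_id; if a data item is meant to be a sentence, the grouping key would change accordingly.

```php
<?php
// Hypothetical helper (not part of the original code): computes
// IDF = log(D / df) for every stem in an in-memory copy of tb_stemming rows.
// Assumption: one "data item" is one document_id.
function idfFromRows(array $rows): array
{
    $docsPerStem = []; // stem => set of document ids that contain it
    $allDocs = [];     // set of all document ids seen
    foreach ($rows as $row) {
        $docsPerStem[$row['stem_after']][$row['document_id']] = true;
        $allDocs[$row['document_id']] = true;
    }

    $totalDocs = count($allDocs); // D
    $idf = [];
    foreach ($docsPerStem as $stem => $docs) {
        $idf[$stem] = log($totalDocs / count($docs)); // natural logarithm
    }
    return $idf;
}

// The three sample rows from tb_stemming.
$rows = [
    ['stem_after' => 'Data',    'sentence_id' => 0, 'document_id' => 1],
    ['stem_after' => 'Discuss', 'sentence_id' => 1, 'document_id' => 1],
    ['stem_after' => 'Min',     'sentence_id' => 0, 'document_id' => 2],
];
$idf = idfFromRows($rows);
print_r($idf);
```

With these rows D is 2 and every stem appears in exactly one document, so each IDF is log(2/1).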
Here is the code:
countIDF($total_sentence,$doc_id);
where $total_sentence is
Array ( [0] => 644 [1] => 79 [2] => 264 [3] => 441 [4] => 502 [5] => 18 [6] => 352 [7] => 219 [8] => 219 )
function countIDF($total_sentence, $doc_id) {
    foreach ($total_sentence as $doc_id => $total_sentences){
        $idf = 0;
        $query1 = mysql_query("SELECT document_id, DISTINCT(stem_after) AS unique_token FROM tb_stemming group by stem_after where document_id='$doc_id' ' ");
        while ($row = mysql_fetch_array($query)) {
            $token = $row['unique_token'];
            $doc_id = $row['document_id'];
            $ndw = countNDW($token);
            $idf = log($total_sentences / $ndw)+1;
            $q = mysql_query("INSERT INTO tb_idf VALUES ('','$doc_id','$token','$ndw','$idf') ");
        }
    }
}
The countNDW function is:
function countNDW ($word) {
    $query = mysql_query("SELECT stem_after, COUNT( DISTINCT sentence_id ) AS ndw FROM `tb_stemming` WHERE stem_after = '$word' GROUP BY stem_after");
    while ($row = mysql_fetch_array($query)) {
        $ndw = $row['ndw'];
    }
    return $ndw;
}
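It does not work well, especially the part that queries the database. All I need is to calculate the IDF for each document_id. How can I define that in my code? Please help me. Thank you very much :)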
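One possible direction, as a rough sketch rather than a definitive fix: run one query per document_id that counts, for each stem, the distinct sentences containing it within that document, then apply the same log(total / ndw) + 1 formula. The sketch makes several assumptions that are not in the code above. It uses PDO with a prepared statement instead of the mysql_* functions, it takes the connection as a parameter, it returns the rows instead of inserting them into tb_idf, and it treats the keys of $total_sentence as document ids. The demo at the bottom uses an in-memory SQLite database only so that the example runs on its own (this needs the pdo_sqlite extension); a MySQL DSN would be used against the real table.

```php
<?php
// Sketch only. Assumptions: PDO replaces mysql_*, $totalSentences maps
// document_id => number of sentences in that document, and df (ndw) is
// counted per document as distinct sentence_id values within one document_id.
function countIDF(PDO $pdo, array $totalSentences): array
{
    // WHERE comes before GROUP BY; the document id is bound, not interpolated.
    $stmt = $pdo->prepare(
        'SELECT stem_after, COUNT(DISTINCT sentence_id) AS ndw
           FROM tb_stemming
          WHERE document_id = :doc_id
          GROUP BY stem_after'
    );

    $result = [];
    foreach ($totalSentences as $docId => $total) {
        $stmt->execute([':doc_id' => $docId]);
        foreach ($stmt->fetchAll(PDO::FETCH_ASSOC) as $row) {
            $ndw = (int) $row['ndw']; // always >= 1 for a stem that was found
            $result[] = [
                'document_id' => $docId,
                'token'       => $row['stem_after'],
                'ndw'         => $ndw,
                'idf'         => log($total / $ndw) + 1, // same formula as the original code
            ];
        }
    }
    return $result;
}

// Demo on an in-memory copy of the three sample rows (SQLite, for illustration only).
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$pdo->exec('CREATE TABLE tb_stemming (stem_id INTEGER, stem_before TEXT, stem_after TEXT,
            stem_freq INTEGER, sentence_id INTEGER, document_id INTEGER)');
$pdo->exec("INSERT INTO tb_stemming VALUES
            (1, 'Data', 'Data', 1, 0, 1),
            (2, 'Discuss', 'Discuss', 1, 1, 1),
            (3, 'Mining', 'Min', 1, 0, 2)");

// Document 1 has 2 sentences, document 2 has 1 (made-up totals matching the sample rows).
$idfRows = countIDF($pdo, [1 => 2, 2 => 1]);
print_r($idfRows);
```

Each returned row could then be written to tb_idf with a prepared INSERT. Grouping by document inside the query also avoids mixing sentence_id values from different documents, which the separate countNDW lookup does because it filters only on stem_after.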