1

我想计算 idf,公式是IDF=log(D/df)D数据,df是包含搜索词的许多数据。从表中:1。tb_stemming

 ===========================================================================
 |stem_id | stem_before | stem_after | stem_freq | sentence_id |document_id|
 ===========================================================================
 |    1   |    Data     |    Data    |     1      |      0     |     1     |
 |    2   |   Discuss   |   Discuss  |     1      |      1     |     1     |
 |    3   |   Mining    |    Min     |     1      |      0     |     2     |
 ===========================================================================

这是代码:

countIDF($total_sentence,$doc_id);

$total_sentence就是 Array ( [0] => 644 [1] => 79 [2] => 264 [3] => 441 [4] => 502 [5] => 18 [6] => 352 [7] => 219 [8] => 219 )

function countIDF($total_sentence, $doc_id) {
    foreach ($total_sentence as $doc_id => $total_sentences){
       $idf = 0;
       $query1 = mysql_query("SELECT document_id, DISTINCT(stem_after) AS unique_token FROM tb_stemming group by stem_after where document_id='$doc_id'  ' ");
       while ($row = mysql_fetch_array($query)) {
           $token  = $row['unique_token'];
           $doc_id = $row['document_id'];
           $ndw    = countNDW($token);

           $idf = log($total_sentences / $ndw)+1;
           $q   = mysql_query("INSERT INTO tb_idf VALUES ('','$doc_id','$token','$ndw','$idf') ");
        }
     }
}

countNDW 的功能是:

function countNDW ($word) {
    $query = mysql_query("SELECT stem_after, COUNT( DISTINCT sentence_id ) AS ndw FROM `tb_stemming` WHERE stem_after = '$word' GROUP BY stem_after");
    while ($row = mysql_fetch_array($query)) {
        $ndw = $row['ndw'];
    }
    return $ndw;
}

它不能很好地工作,尤其是在从数据库调用时。我需要的只是计算每个document_id. 如何在我的代码中定义它?请帮帮我..非常感谢你:)

4

0 回答 0