<?php

$wordFrequencyArray = array();

function countWordsfrequency($filename) {
    global $wordFrequencyArray;

    $contentoffile = file_get_contents($filename);

    // Split on anything that is not a letter or digit and drop empty pieces.
    $wordArray = preg_split('/[^a-zA-Z0-9]/', $contentoffile, -1, PREG_SPLIT_NO_EMPTY);

    foreach (array_count_values($wordArray) as $word => $count) {
        if (!isset($wordFrequencyArray[$word])) {
            $wordFrequencyArray[$word] = 0;
        }
        $wordFrequencyArray[$word] += $count;
    }
}


$filenames = array('file1.txt', 'file2.txt','file3.txt','file4.txt');
foreach ($filenames as $filename) {
    countWordsfrequency($filename);
}



print_r($wordFrequencyArray);

?>

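This is my code to find the frequency of each word across multiple files and print the results. What I want to do now is find out which files each word appears in. For example, for the word "stack", I want to print the files it appears in, along with the frequency, which I think I have already calculated.

The final result should be the frequency of the word in each file it appears in.

How should I proceed? Should I check this inside the foreach loop of the countWordsfrequency function itself?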


1 Answer


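You will have to store more information: for each word, a total count plus a count per file. I would avoid using classes, since it doesn't look like you need anything that robust.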

<?php
$wordFrequencies = array();

function countWordsFrequency($filename) {
  global $wordFrequencies;
  $contentoffile = file_get_contents($filename);
  $wordArray = preg_split('/[^a-zA-Z0-9]/', $contentoffile, -1, PREG_SPLIT_NO_EMPTY);

  foreach (array_count_values($wordArray) as $word => $count) {
    // First time this word is seen in any file: create its entry.
    if (!isset($wordFrequencies[$word])) {
      $wordFrequencies[$word] = array('total' => 0, 'files' => array());
    }

    // If this is the first occurrence of this word in this file, start a count.
    if (!isset($wordFrequencies[$word]['files'][$filename])) {
      $wordFrequencies[$word]['files'][$filename] = 0;
    }

    // Increment counts for both the total and the file.
    // Write to $wordFrequencies directly: PHP arrays are copied on assignment,
    // so changes made to a local copy would be lost.
    $wordFrequencies[$word]['total'] += $count;
    $wordFrequencies[$word]['files'][$filename] += $count;
  }
}

$filenames = array('file1.txt', 'file2.txt','file3.txt','file4.txt');
foreach ($filenames as $filename) {
  countWordsFrequency($filename);
}

print_r($wordFrequencies);
?>
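
To answer the "which files contain this word" part, you can read the 'files' entry back out of that structure. Here is a minimal sketch; filesContainingWord() is a hypothetical helper name, and $sample is made-up data in the shape the code above builds (a 'total' plus a 'files' array keyed by filename), not real output.

```php
<?php
// Hypothetical helper: returns the per-file counts for one word,
// or an empty array if the word was never seen in any file.
function filesContainingWord(array $wordFrequencies, $word) {
    if (!isset($wordFrequencies[$word])) {
        return array();
    }
    return $wordFrequencies[$word]['files'];
}

// Made-up sample data, shaped like the array built by countWordsFrequency().
$sample = array(
    'stack' => array(
        'total' => 5,
        'files' => array('file1.txt' => 3, 'file3.txt' => 2),
    ),
);

// Print each file that contains the word, with its count in that file.
foreach (filesContainingWord($sample, 'stack') as $file => $count) {
    echo "stack: $file => $count\n";
}
```

Note that the lookup is case-sensitive, because the counting code does not lowercase the words before counting them.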
answered 2012-08-16T07:22:50.377