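I'm looping through a file that has 41 paragraphs. For each paragraph, I first break the string into an array of words and then compute the word frequency for that paragraph. I then want to combine the data from all the paragraphs to get the word frequency for the whole document.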

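I can get an array of words and their frequencies for any given paragraph, but I can't merge the per-paragraph results to get the word frequency for the whole document. Here is what I have: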

function sectionWordFrequency($sectionFS)
{
    $section_frequency = array();
    $filename = $sectionFS . ".xml";
    $xmldoc = simplexml_load_file('../../editedtranscriptions/' . $filename);
    $xmldoc->registerXPathNamespace("tei", "http://www.tei-c.org/ns/1.0");
    $paraArray = $xmldoc->xpath("//tei:p");

    foreach ($paraArray as $p)
    {
        $para_frequency = (array_count_values(str_word_count(strtolower($p), 1)));
        $section_frequency[] = $para_frequency;
    }

    return array_merge($section_frequency);
}

/// now I call the function, sort it, and try to display it
$section_frequency = sectionWordFrequency($fs);
ksort($section_frequency);

foreach ($section_frequency as $word=>$frequency)
{
    echo $word . ": " . $frequency . "</br>";
}

Right now the result I get is:

1: Array 2: Array 3: Array 4: Array

Any help is greatly appreciated.

1 Answer

Try replacing this line

$section_frequency[] = $para_frequency;

with this

$section_frequency = array_merge($section_frequency, $para_frequency);

and then change the return statement at the end of the function to

return $section_frequency;
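
One caveat: when both arrays share a string key, array_merge keeps the later value instead of adding the two, so a word that occurs in several paragraphs would end up with only the count from the last paragraph that contains it. If the counts need to be summed, something like the following might work. It is only a minimal sketch: mergeWordCounts is a hypothetical helper (not a PHP built-in), and the two sample strings stand in for the text of the //tei:p nodes.

```php
<?php
// Hypothetical helper (not a built-in): adds one paragraph's counts
// to the running totals, summing the counts of words already seen.
function mergeWordCounts(array $totals, array $counts)
{
    foreach ($counts as $word => $count) {
        // A word not seen before starts at 0.
        if (!isset($totals[$word])) {
            $totals[$word] = 0;
        }
        $totals[$word] += $count;
    }
    return $totals;
}

// Sample paragraphs (assumed data) standing in for the //tei:p nodes.
$paragraphs = array("The cat sat", "the cat ran");

$section_frequency = array();
foreach ($paragraphs as $p) {
    $para_frequency = array_count_values(str_word_count(strtolower($p), 1));
    $section_frequency = mergeWordCounts($section_frequency, $para_frequency);
}

ksort($section_frequency);

foreach ($section_frequency as $word => $frequency) {
    echo $word . ": " . $frequency . "\n";
}
```

In the original function, the same call would replace the array_merge line inside the foreach loop, with $p coming from $paraArray.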
answered 2011-10-02T00:55:11.620