
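I know how to get the frequency of a single word in a text using explode and some array functions, but what I really want is the frequency of phrases of 2 or more words. For example, in this text:
"This is a sample text. It is a sample text made for educational purposes."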

I need code that outputs something like this:
is a (2)
sample text (2)
a sample (2)
.... and so on

Thanks in advance.


2 Answers


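Some code to get you started:

$sentence = 'This is a sample text. It is a sample text made for educational purposes.';
$frequencies = array();
$words = explode(' ', $sentence); // split the sentence on spaces
foreach ($words as $word) {
    // trim whitespace and punctuation, then convert to lower case
    $sanitized = strtolower(trim($word, " \t\n\r\0\x0B.,;:!?"));
    if ($sanitized === '') {
        continue; // skip empty strings left by repeated spaces
    }
    $frequencies[$sanitized] = isset($frequencies[$sanitized]) ? $frequencies[$sanitized] + 1 : 1;
}

The $frequencies array now contains the number of times each word occurs in the sentence.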
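The loop above counts single words. A minimal sketch of how the same idea extends to phrases of 2 or more words, as the question asks: split the sanitized text into words, then slide a window of $n words across the list and count each joined window. countPhrases() is a hypothetical helper name, and treating only letters (\pL) as word characters is an assumption about what counts as a word.

```php
<?php
// Hypothetical helper: counts every run of $n consecutive words (n-grams).
function countPhrases(string $text, int $n): array
{
    // Keep letters only, then lowercase (strtolower() folds ASCII letters only).
    $clean = strtolower(preg_replace('/[^\pL\s]+/u', ' ', $text));
    $words = preg_split('/\s+/', $clean, -1, PREG_SPLIT_NO_EMPTY);

    $frequencies = array();
    // Slide a window of $n words across the list, one word at a time.
    for ($i = 0; $i + $n <= count($words); $i++) {
        $phrase = implode(' ', array_slice($words, $i, $n));
        $frequencies[$phrase] = ($frequencies[$phrase] ?? 0) + 1;
    }
    arsort($frequencies); // most frequent phrases first
    return $frequencies;
}

$text = 'This is a sample text. It is a sample text made for educational purposes.';
print_r(countPhrases($text, 2));
print_r(countPhrases($text, 3));
```

With $n = 2, the phrases "is a", "a sample" and "sample text" each get a count of 2 for the question's text, matching the expected output. With $n = 3, three-word phrases such as "a sample text" are counted instead.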

answered 2013-06-18T08:13:38.740

The following code counts every pair of 2 consecutive words:

$string = 'This is a sample text. It is a sample text made for educational purposes. This is a sample text. It is a sample text made for educational purposes.';

$sanitized = $even = preg_replace(array('#[^\pL\s]#', '#\s+#'), array(' ', ' '), $string); // sanitize: only letters, replace multiple whitespaces with 1
$odd = preg_replace('#^\s*\S+#', '', $sanitized); // Remove the first word

preg_match_all('#\S+\s\S+#', $even, $m1); // Get 2 words
preg_match_all('#\S+\s\S+#', $odd, $m2); // Get 2 words

$results = array_count_values(array_merge($m1[0], $m2[0])); // Merge results and count
print_r($results); // printing

Output:

Array
(
    [This is] => 2
    [a sample] => 4
    [text It] => 2
    [is a] => 4
    [sample text] => 4
    [made for] => 2
    [educational purposes] => 2
    [It is] => 2
    [text made] => 2
    [for educational] => 2
    [purposes This] => 1
)
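The code above needs the $even/$odd pair because preg_match_all() resumes searching after the end of each match. A single pass therefore returns only non-overlapping pairs ("This is", "a sample", "text It", ...) and misses the pairs that start at odd word positions. One alternative, which is not part of the original answer, is to capture the pair inside a zero-width lookahead. Because the lookahead consumes nothing, every word start produces a match. This sketch assumes UTF-8 input, so it adds the u modifier to each pattern.

```php
<?php
$string = 'This is a sample text. It is a sample text made for educational purposes. This is a sample text. It is a sample text made for educational purposes.';

// Same sanitizing step as the answer: letters only, single spaces.
$sanitized = preg_replace(array('#[^\pL\s]#u', '#\s+#u'), array(' ', ' '), $string);

// (?<!\S) anchors each match at the start of a word; the lookahead captures
// "word word" without consuming it, so consecutive matches can overlap.
preg_match_all('#(?<!\S)(?=(\S+\s\S+))#u', $sanitized, $m);

$results = array_count_values($m[1]); // $m[1] holds the captured word pairs
print_r($results);
```

This gives the same counts as the output above, although array_count_values() may list the pairs in a different order. For three-word phrases, the lookahead would become (?=(\S+(?:\s\S+){2})).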

One possible improvement would be to convert the string to lowercase first.
I'll leave the rest for you to figure out :-)
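Here is a minimal sketch of the lowercase improvement, applied to the answer's own even/odd approach. It rests on three assumptions. First, the input is UTF-8, so the u modifier is added to each pattern. Second, mb_strtolower() from the mbstring extension is used when it is available, with a fallback to strtolower(), which only folds ASCII letters. Third, the short test string is made up for illustration.

```php
<?php
$string = 'This is a sample text. this is a sample text.';

// Lowercase first so "This is" and "this is" count as the same phrase.
// Assumption: prefer mb_strtolower() (mbstring) for non-ASCII letters.
$lower = function_exists('mb_strtolower') ? mb_strtolower($string, 'UTF-8') : strtolower($string);

// The rest is the answer's even/odd approach, unchanged apart from the u modifier.
$sanitized = $even = preg_replace(array('#[^\pL\s]#u', '#\s+#u'), array(' ', ' '), $lower);
$odd = preg_replace('#^\s*\S+#u', '', $sanitized); // remove the first word

preg_match_all('#\S+\s\S+#u', $even, $m1); // pairs starting at even word positions
preg_match_all('#\S+\s\S+#u', $odd, $m2);  // pairs starting at odd word positions

$results = array_count_values(array_merge($m1[0], $m2[0]));
print_r($results);
```

Without the lowercasing step, "This is" and "this is" would each be counted once as separate entries.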

answered 2013-06-18T08:47:06.813