
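There are plenty of resources online showing how to count the frequency of single words, like this, this, this and others...
But I couldn't find a concrete example of counting the frequency of two-word pairs.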

I have a CSV file that contains some strings.
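Here is a minimal sketch of pulling those strings out of the CSV with the standard `csv` module. It assumes every non-empty cell holds text to analyse. The file name `strings.csv` and the helper `read_csv_strings` are hypothetical. The example reads from an in-memory `io.StringIO`, so it runs without a file.

```python
import csv
import io

def read_csv_strings(file_obj):
    """Return every non-empty cell of the CSV as a separate string (hypothetical helper)."""
    return [cell for row in csv.reader(file_obj) for cell in row if cell.strip()]

# In-memory stand-in for the real file; on disk you would instead pass
# open("strings.csv", newline="") (hypothetical file name).
sample = io.StringIO(
    '"I love TV show makes me happy, I love also comedy show makes me feel like flying"\n'
)
strings = read_csv_strings(sample)
print(strings)
```

Keeping the cells separate lets you count pairs per cell. That way, no pair is formed from the last word of one cell and the first word of the next.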

FileList = "I love TV show makes me happy, I love also comedy show makes me feel like flying"

So I would like the output to be:

wordscount = {"I love": 2, "show makes": 2, "makes me": 2}

Of course, I have to strip out all the commas, question marks, and so on: {!, , ", ', ?, ., (,), [, ], ^, %, #, @, &, *, -, _, ;, /, \, |, }
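One way to strip exactly the characters listed above is with `str.maketrans` and `str.translate`. In this sketch, each unwanted character is mapped to a space rather than deleted. That is an assumption, chosen so that text such as `show,makes` still splits into two words. `clean_words` is a hypothetical helper name, and any character not in the list (a colon, for example) survives unless you add it.

```python
# The characters listed in the question, written as one Python string.
PUNCTUATION = '!,"\'?.()[]^%#@&*-_;/\\|'

# Map each unwanted character to a space so neighbouring words stay separate.
_STRIP_TABLE = str.maketrans(PUNCTUATION, " " * len(PUNCTUATION))

def clean_words(text):
    """Replace the listed punctuation with spaces, then split on whitespace."""
    return text.translate(_STRIP_TABLE).split()

words = clean_words("I love TV show makes me happy, I love also comedy show makes me feel like flying")
print(words[:7])
```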

I will also remove some stop words, which I found here, to get more specific data out of the text.
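The sketch below shows one way to fold stop-word removal into the pair count. The `STOP_WORDS` set is a hypothetical placeholder, not the list linked in the question. This version drops any pair that contains a stop word instead of deleting stop words first. Deleting first would count words as a pair even though they were not adjacent in the text; for example, `makes me happy` would yield the pair `makes happy`. The regex `[^\W_]+` matches runs of letters and digits but not underscores, since `_` is on the list of characters to remove.

```python
import re
from collections import Counter

# Hypothetical stop-word list; substitute whichever list you actually use.
STOP_WORDS = frozenset({"i", "me", "also", "like"})

def bigram_counts(text, stop_words=STOP_WORDS, min_count=2):
    """Count adjacent word pairs, skipping any pair that contains a stop word."""
    words = re.findall(r"[^\W_]+", text)
    pairs = (
        f"{a} {b}"
        for a, b in zip(words, words[1:])
        # Case-insensitive check: "I" is treated the same as "i".
        if a.lower() not in stop_words and b.lower() not in stop_words
    )
    counts = Counter(pairs)
    return {pair: n for pair, n in counts.most_common() if n >= min_count}

text = "I love TV show makes me happy, I love also comedy show makes me feel like flying"
print(bigram_counts(text))
```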

How can I achieve this in Python?

Thanks!


1 Answer

```python
>>> from collections import Counter
>>> import re
>>>
>>> sentence = "I love TV show makes me happy, I love also comedy show makes me feel like flying"
>>> # Split into words; \w+ drops the punctuation between them.
>>> words = re.findall(r'\w+', sentence)
>>> # Pair each word with the one that follows it.
>>> two_words = [' '.join(ws) for ws in zip(words, words[1:])]
>>> # Keep only the pairs that occur more than once.
>>> wordscount = {w: f for w, f in Counter(two_words).most_common() if f > 1}
>>> wordscount
{'I love': 2, 'show makes': 2, 'makes me': 2}
```
answered 2013-09-23T06:28:58.370
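On Python 3.10 and later, the standard library's `itertools.pairwise` produces the same adjacent pairs as `zip(words, words[1:])`. The small sketch below wraps the answer's approach in a function, with a fallback for older interpreters. The name `two_word_counts` and its `min_count` parameter are hypothetical.

```python
import re
from collections import Counter

try:
    from itertools import pairwise  # Python 3.10+
except ImportError:
    def pairwise(iterable):
        # Fallback giving the same pairs as the zip(words, words[1:]) approach.
        items = list(iterable)
        return zip(items, items[1:])

def two_word_counts(text, min_count=2):
    """Count adjacent word pairs and keep those seen at least min_count times."""
    words = re.findall(r"\w+", text)
    counts = Counter(" ".join(pair) for pair in pairwise(words))
    # most_common() sorts by count; ties keep the order in which pairs first appeared.
    return {pair: n for pair, n in counts.most_common() if n >= min_count}

sentence = "I love TV show makes me happy, I love also comedy show makes me feel like flying"
print(two_word_counts(sentence))
```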