python - Counting word occurrences in a list of strings

Question

How can I count the number of times a word appears in a list of strings?

For example:

['This is a sentence', 'This is another sentence']

The result for the word "sentence" should be 2.

score 12 · Accepted Answer

Use a collections.Counter() object and split your sentences into words on whitespace. You probably also want to lowercase the words and strip punctuation:

from collections import Counter

counts = Counter()

for sentence in sequence_of_sentences:
    counts.update(word.strip('.,?!"\'').lower() for word in sentence.split())
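A self-contained version of the split-and-strip loop, wrapped in a function so it can be run directly. The helper name count_words and the sample sentences are illustrative assumptions. Note that str.strip() only trims characters from the ends of each token, so interior punctuation such as the apostrophe in "isn't" is kept.

```python
from collections import Counter

# Characters trimmed from both ends of each whitespace-separated token
PUNCTUATION = '.,?!"\''

def count_words(sentences):
    """Hypothetical helper: count lowercased, punctuation-trimmed words."""
    counts = Counter()
    for sentence in sentences:
        counts.update(word.strip(PUNCTUATION).lower() for word in sentence.split())
    return counts

counts = count_words(['This is a sentence.', 'Is this "another" sentence?', "It isn't."])
print(counts['sentence'])  # 2
print(counts['this'])      # 2
print(counts["isn't"])     # 1, the interior apostrophe survives strip()
```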

Or perhaps use a regular expression that matches only word characters:

from collections import Counter
import re

counts = Counter()
words = re.compile(r'\w+')

for sentence in sequence_of_sentences:
    counts.update(words.findall(sentence.lower()))
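A runnable sketch of the regular-expression variant; the helper name count_words_regex and the sample sentences are assumptions. It shows that \w+ drops punctuation without a hand-written strip list, but also that it splits contractions, so "isn't" becomes the two tokens "isn" and "t".

```python
import re
from collections import Counter

WORD = re.compile(r'\w+')  # one or more word characters (letters, digits, underscore)

def count_words_regex(sentences):
    """Hypothetical helper: count lowercased runs of word characters."""
    counts = Counter()
    for sentence in sentences:
        counts.update(WORD.findall(sentence.lower()))
    return counts

counts = count_words_regex(['This is a sentence.', "It isn't another sentence!"])
print(counts['sentence'])          # 2
print(counts['isn'], counts['t'])  # 1 1, the contraction was split in two
```

If contractions matter, a pattern such as r"[\w']+" keeps the apostrophe inside the token, at the cost of also keeping stray leading or trailing quotes.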

Now counts is a dictionary (a Counter) mapping each word to the number of times it occurs.
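Because Counter is a dict subclass, the result can be queried like any dictionary, with two conveniences: looking up a word that never appeared returns 0 instead of raising KeyError, and most_common() lists the highest counts first. The sentences reuse the question's example; "missing" is just an illustrative absent key.

```python
from collections import Counter

sentences = ['This is a sentence', 'This is another sentence']
counts = Counter()
for sentence in sentences:
    counts.update(word.lower() for word in sentence.split())

print(counts['sentence'])   # 2
print(counts['missing'])    # 0, absent keys do not raise KeyError
print('missing' in counts)  # False, the lookup above did not insert the key
print(counts.most_common(2))  # the two most frequent words with their counts
```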

Demo:

>>> sequence_of_sentences = ['This is a sentence', 'This is another sentence']
>>> from collections import Counter
>>> counts = Counter()
>>> for sentence in sequence_of_sentences:
...     counts.update(word.strip('.,?!"\'').lower() for word in sentence.split())
... 
>>> counts
Counter({'this': 2, 'is': 2, 'sentence': 2, 'a': 1, 'another': 1})
>>> counts['sentence']
2
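If you only ever need the count for a single word, an alternative (not part of the answer above) is to skip the Counter and sum list.count() over each sentence's words. The helper name count_word is hypothetical, and like a plain split() it does not strip punctuation.

```python
def count_word(sentences, target):
    """Hypothetical helper: occurrences of one word across all sentences (case-insensitive)."""
    target = target.lower()
    # list.count() counts exact token matches within each sentence's word list
    return sum(sentence.lower().split().count(target) for sentence in sentences)

sentences = ['This is a sentence', 'This is another sentence']
print(count_word(sentences, 'sentence'))  # 2
print(count_word(sentences, 'This'))      # 2
```

This rescans every sentence per query, so building a Counter once is the better choice when you need counts for many words.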

score 3 · Accepted Answer

You can do this easily with a little regular expression and a dictionary.

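import re

word_counts = {}  # renamed from dict to avoid shadowing the built-in
sentence_list = ['This is a sentence', 'This is a sentence']
for sentence in sentence_list:
    for word in re.split(r'\s+', sentence):  # split on runs of whitespace (raw string avoids an invalid-escape warning)
        try:
            word_counts[word] += 1
        except KeyError:
            word_counts[word] = 1  # first time this word is seen
print(word_counts)  # Python 3 print function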
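The try/except KeyError pattern works; a common equivalent (a substitute technique, not the answerer's code) is dict.get() with a default of 0. The helper name count_with_get is hypothetical. Like the loop above it is case-sensitive, so "This" and "this" would be counted separately.

```python
import re

def count_with_get(sentences):
    """Hypothetical helper: same counting as the try/except loop, using dict.get()."""
    word_counts = {}
    for sentence in sentences:
        # strip() avoids empty tokens from leading/trailing whitespace
        for word in re.split(r'\s+', sentence.strip()):
            if word:  # an empty sentence would otherwise produce ''
                word_counts[word] = word_counts.get(word, 0) + 1
    return word_counts

print(count_with_get(['This is a sentence', 'This is another sentence']))
# {'This': 2, 'is': 2, 'a': 1, 'sentence': 2, 'another': 1}
```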
