python - Counting how many times each word repeats in a list

Question

I have code like this:

s = "hello this is hello this is baby baby baby baby hello"
slist = s.split()
finallist = []
for word in slist:
    if len(word) >= 4:
        finallist.append(word)

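Basically, the code above takes the list of words and keeps only the ones that are at least 4 characters long.

From this list, I want to count how many times each word occurs and save the counts to a new list. So the result would look like [3,2,4], where 3 is the count for hello, 2 for this, and 4 for baby.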

score 3 · Accepted Answer

from collections import Counter
import re

reg = re.compile(r'\S{4,}')  # runs of 4 or more non-whitespace characters

s = "hello this is hello this is baby baby baby baby hello"
c = Counter(ma.group() for ma in reg.finditer(s))
print(c)

Result

Counter({'baby': 4, 'hello': 3, 'this': 2})
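The pattern \S{4,} matches runs of four or more non-whitespace characters, so for this input it should pick out the same words as splitting on whitespace and filtering by length. A quick sketch to check that (the names by_regex and by_split are just for illustration):

```python
import re

s = "hello this is hello this is baby baby baby baby hello"

# Words found by the regex: runs of 4 or more non-whitespace characters.
by_regex = re.findall(r'\S{4,}', s)

# Words found by splitting on whitespace and filtering by length.
by_split = [w for w in s.split() if len(w) >= 4]

print(by_regex == by_split)
```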

Also:

from collections import defaultdict
d = defaultdict(int)

s = "hello this is hello this is baby baby baby baby hello"

for w in s.split():
    if len(w)>=4:
        d[w] += 1

print(d)

score 3 · Accepted Answer

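collections.Counter is clearly your friend (unless you need the output in a specific sort order). Combine it with a generator expression that yields all the words of length 4 or more, and you're set.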

from collections import Counter

Counter(w for w in s.split() if len(w) >= 4)
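For reference, here is a self-contained sketch of that one-liner, with s taken from the question. If sorted output is what you need, most_common() is one option: it returns the (word, count) pairs ordered from highest count to lowest. The names counts and ranked are only illustrative.

```python
from collections import Counter

s = "hello this is hello this is baby baby baby baby hello"

# Count only the words that are at least 4 characters long.
counts = Counter(w for w in s.split() if len(w) >= 4)

# most_common() returns (word, count) pairs, highest count first.
ranked = counts.most_common()
print(ranked)  # [('baby', 4), ('hello', 3), ('this', 2)]
```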

If you need the elements in order of first appearance, use an ordered dictionary:

from collections import OrderedDict

wc = OrderedDict()
for w in s.split():
    if len(w) >= 4:
        wc[w] = wc.get(w, 0) + 1
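To get the plain list of counts the question asks for, you can take the values of the ordered dictionary. A minimal sketch, again using s from the question:

```python
from collections import OrderedDict

s = "hello this is hello this is baby baby baby baby hello"

wc = OrderedDict()
for w in s.split():
    if len(w) >= 4:
        wc[w] = wc.get(w, 0) + 1

# Values come back in first-appearance order: hello, this, baby.
finallist = list(wc.values())
print(finallist)  # [3, 2, 4]
```

On Python 3.7 and later, a plain dict (and therefore Counter) also preserves insertion order, so the OrderedDict is mainly needed on older versions.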

score 1 · Accepted Answer

All you have to do is use the count method of list.

I think you get better control by using a dict:

s = "hello this is hello this is baby baby baby baby hello"
slist = s.split()
finaldict = {}
for word in slist:
    # Count each qualifying word only once, the first time it is seen.
    if len(word) >= 4 and word not in finaldict:
        finaldict[word] = slist.count(word)

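Now, if you want the list of values, just do: finallist = list(finaldict.values()) (the list() call is needed on Python 3, where values() returns a view rather than a list).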
