python - Combining and summing similar CSV entries

Question

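Suppose my CSV file looks like this:

love, like, 200
love, like, 50
say, claim, 30

where the numbers are counts of how often those words occur together in different contexts.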

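I want to combine the counts of matching word pairs, so I'd like output like this:

love, like, 250
say, claim, 30

I've been searching around, but I seem to be stuck on this simple problem.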

score 1 · Accepted Answer

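Without seeing the exact CSV, it's hard to know what's appropriate. The code below assumes the last token is a count and groups on everything before the last comma.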

# Replace the string below with code that reads your actual file
data = """love, like, 200
love, like, 50
love, 20
say, claim, 30"""
lines = data.split("\n")

words = {}
for line in lines:
    # Note: str.rsplit(), NOT str.split(). Only the LAST comma is split on,
    # so "love, like" stays together as one key, while "love" is a separate key
    word, count = line.rsplit(",", 1)
    words[word] = words.get(word, 0) + int(count)  # int() ignores the leading space
for word in words:
    print(word, ": ", words[word])

This outputs (Python 3.7+, where dicts keep insertion order):

love, like :  250
love :  20
say, claim :  30
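If you want the totals back in the question's `word, word, count` layout rather than `word :  count`, here is one possible variant. It is a sketch, not part of the original answer. It uses the standard `csv` module with `skipinitialspace=True` so the space after each comma is dropped, keys the totals on a tuple of every field except the last, and joins the fields back with `", "`. The function name `merge_counts` is hypothetical, and `io.StringIO` stands in for the real file.

```python
import csv
import io

def merge_counts(source):
    """Sum the last column of each row, grouping by all preceding columns."""
    totals = {}
    # skipinitialspace=True drops the blank after each comma:
    # "love, like, 200" -> ["love", "like", "200"]
    for row in csv.reader(source, skipinitialspace=True):
        if not row:
            continue  # csv.reader yields [] for blank lines
        key = tuple(row[:-1])
        totals[key] = totals.get(key, 0) + int(row[-1])
    return totals

# io.StringIO stands in for open("your_file.csv", newline="")
sample = io.StringIO("love, like, 200\nlove, like, 50\nsay, claim, 30\n")
merged = merge_counts(sample)
for key, count in merged.items():
    print(", ".join(key + (str(count),)))  # e.g. "love, like, 250"
```

Grouping on a tuple rather than on the raw text before the last comma means stray spacing differences such as `love,like` vs `love, like` still land in the same bucket.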

score 1 · Accepted Answer

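Depending on what exactly your application is, I'd actually suggest using a Counter here. Counter is a class in Python's collections module that lets you keep a count of anything. In your case, for example, you can update a Counter object as you iterate over the file.

For example: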

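from collections import Counter

counter = Counter()
# Open in text mode ("r"): in Python 3, "rb" yields bytes, and
# calling bytes.rsplit() with the str "," raises TypeError
with open("your_file.txt", "r") as source:
    for line in source:
        if not line.strip():
            continue  # skip blank lines
        entry, count = line.rsplit(",", 1)  # split on the last comma only
        counter[entry] += int(count)  # int() tolerates the space and trailing newline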

At this point you can either write the data back out as CSV or keep working with it.
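As one way to write the totals back out, here is a minimal sketch. The helper name `write_counts` and the output filename are hypothetical, and an in-memory list of lines replaces the real file. Because each key is the raw text before the last comma (e.g. `love, like`), writing `entry, total` reproduces the original layout. `csv.writer` would instead quote such keys, since they contain the delimiter.

```python
import io
from collections import Counter

def write_counts(counter, dest):
    """Write each merged entry back as an 'entry, total' line."""
    # Counter is a dict subclass, so items() follows insertion order (3.7+);
    # use counter.most_common() instead to sort by total, highest first
    for entry, total in counter.items():
        dest.write(f"{entry}, {total}\n")

counter = Counter()
for line in ["love, like, 200", "love, like, 50", "say, claim, 30"]:
    entry, count = line.rsplit(",", 1)  # split on the last comma only
    counter[entry] += int(count)

out = io.StringIO()  # stand-in for open("merged.csv", "w")
write_counts(counter, out)
print(out.getvalue(), end="")
```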
