我想在此处修改帖子(解析 CSV 文件并聚合值)以汇总多列而不是仅一列。
所以对于这些数据:
CITY,AMOUNT,AMOUNT2,AMOUNTn
London,20,21,22
Tokyo,45,46,47
London,55,56,57
New York,25,26,27
我怎样才能得到这个:
CITY,AMOUNT,AMOUNT2,AMOUNTn
London,75,77,79
Tokyo,45,46,47
New York,25,26,27
我最终会有几千列,不幸的是我不能使用 pandas 包来完成这个任务。这是我刚刚将所有三个 AMOUNT cols 聚合为一个的代码,这不是我想要的
from __future__ import division
import csv
from collections import defaultdict
def default_factory():
return [0, None, None, 0]
reader = csv.DictReader(open('test_in.txt'))
cities = defaultdict(default_factory)
for row in reader:
headers = [r for r in row.keys()]
headers.remove('CITY')
for i in headers:
amount = int(row[i])
cities[row["CITY"]][0] += amount
max = cities[row["CITY"]][1]
cities[row["CITY"]][1] = amount if max is None else amount if amount > max else max
min = cities[row["CITY"]][2]
cities[row["CITY"]][2] = amount if min is None else amount if amount < min else min
cities[row["CITY"]][3] += 1
for city in cities:
cities[city][3] = cities[city][0]/cities[city][3] # calculate mean
with open('test_out.txt', 'wb') as myfile:
writer = csv.writer(myfile, delimiter="\t")
writer.writerow(["CITY", "AMOUNT", "AMOUNT2", "AMOUNTn ,"max", "min", "mean"])
writer.writerows([city] + cities[city] for city in cities)
感谢您的任何帮助