I have a csv file with repeated values in its first column, for example:

mg,known,127
mg,unknown,142
pnt,known,37
pnt,unknown,0
lmo,known,75
lmo,unknown,3
sl,known,197
sl,unknown,21
oc,unknown,32
oc,known,163
sv,known,368
sv,unknown,308
az,unknown,6
az,known,241
bug,unknown,1
bug,known,167
li,unknown,15
li,known,174
lg,known,3

What I want to do is build a new csv file, like this:

header1, known, unknown
mg, 127, 142
pnt, 37, 0

I'm trying to figure out how I can actually build the rows:

def read_stats(path):
    has_seen = set()
    with open(writepath, 'wb') as write_csv:
        with open(path, 'r') as csv_file:
            data_reader = csv.reader(csv_file, delimiter=',')
            for line in data_reader:
                if line[0] in has_seen:

This is where I'm currently stuck. Do I have to keep a pointer to the next line?

1 Answer

Here is one way to accumulate the results in an OrderedDict:

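>>> import csv
>>> import collections

>>> d = collections.OrderedDict()   # remembers the order in which keys first appear
>>> # datafile is assumed to be an already-open file object for the input csv
>>> for header1, category, value in csv.reader(datafile):
...     d.setdefault(header1, {})[category] = value

>>> for header1, m in d.items():
...     if 'known' in m and 'unknown' in m:   # skip keys with only one row, such as lg
...         print(', '.join([header1, m['known'], m['unknown']]))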

mg, 127, 142
pnt, 37, 0
lmo, 75, 3
sl, 197, 21
oc, 163, 32
sv, 368, 308
az, 241, 6
bug, 167, 1
li, 174, 15
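
Since the question asks for a new csv file rather than printed lines, the same accumulation could be wrapped in a function that writes its result with `csv.writer`. The following is a minimal sketch for Python 3, not part of the original answer: the name `pivot_stats`, its parameters, the skipping of malformed rows, and the choice to write a missing category as an empty field are all assumptions made for illustration.

```python
import csv
from collections import OrderedDict


def pivot_stats(in_path, out_path, categories=("known", "unknown")):
    """Collapse (key, category, value) rows into one row per key."""
    table = OrderedDict()  # preserves the first-seen order of keys
    with open(in_path, newline="") as src:
        for row in csv.reader(src):
            if len(row) != 3:
                continue  # assumption: skip blank or malformed rows
            key, category, value = row
            table.setdefault(key, {})[category] = value

    with open(out_path, "w", newline="") as dst:
        writer = csv.writer(dst)
        writer.writerow(["header1"] + list(categories))
        for key, values in table.items():
            # assumption: a missing category is written as an empty field
            writer.writerow([key] + [values.get(c, "") for c in categories])
    return table
```

A call such as `pivot_stats('stats.csv', 'pivoted.csv')` (both paths hypothetical) would then produce the header row followed by one row per key. Because the whole table is held in memory before writing, the order of the known and unknown rows in the input does not matter.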

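If you can assume that the rows always come in consecutive pairs with the known row first, you can build an intermediate result for the known row and emit a complete row when the unknown row arrives. Note that in the sample data above, oc, az, bug and li list the unknown row first, and lg has no unknown row at all, so the input would need to be reordered for this assumption to hold: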

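>>> # data is assumed to be an open csv file whose pairs list the known row first
>>> for header1, category, value in csv.reader(data):
...     if category == 'known':
...         result = [header1, value]      # start a new output row
...     else:
...         result += [value]              # complete the row with the unknown count
...         print(', '.join(result))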

mg, 127, 142
pnt, 37, 0
lmo, 75, 3
sl, 197, 21
oc, 163, 32
sv, 368, 308
az, 241, 6
bug, 167, 1
li, 174, 15
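
The pairing idea can also be made independent of which row of a pair comes first, while still reading the input as a stream. The sketch below assumes only that rows sharing a key are adjacent and that every row has exactly three fields. It swaps the if/else above for `itertools.groupby`, which is a substitution rather than the answer's own method, and `paired_rows` is a hypothetical name.

```python
import csv
import io
from itertools import groupby
from operator import itemgetter


def paired_rows(rows, categories=("known", "unknown")):
    """Yield [key, known, unknown] for each run of adjacent rows sharing a key."""
    for key, group in groupby(rows, key=itemgetter(0)):
        # collect the categories seen in this run, in whatever order they arrived
        values = {category: value for _, category, value in group}
        # assumption: a category missing from the run becomes an empty string
        yield [key] + [values.get(c, "") for c in categories]


# A few rows from the question, including a pair with unknown first
sample = io.StringIO(
    "mg,known,127\nmg,unknown,142\noc,unknown,32\noc,known,163\nlg,known,3\n"
)
for row in paired_rows(csv.reader(sample)):
    print(", ".join(row))
```

Only one run of rows is held in memory at a time, so this keeps the streaming character of the pairing approach. The trade-off is that a key whose rows are not adjacent would be emitted more than once.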
answered 2013-04-17T03:21:25.103