I have a CSV file like this:

student | score
John    |  A
John    |  C
Mary    |  B
Mary    |  D
Kim     |  B
Kim     |  A

Each student has multiple scores, and I want to merge the rows so that each student appears only once, with their highest score.

I want the resulting table to look like this:

student | score
John    | A
Mary    | B
Kim     | A

I searched for an existing post about this but could not find one. Is there a way to do this using only the standard library?

2 Answers

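Use itertools.groupby to group the rows by student name. Note that groupby only groups consecutive rows, so each student's rows must be adjacent, as they are in the sample.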

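import csv
import itertools
import operator

# Assumes the fields are not padded: the header must read exactly 'student|score'.
# newline='' lets the csv module handle line endings itself (Python 3).
with open('1.csv', newline='') as f, open('2.csv', 'w', newline='') as fout:
    reader = csv.DictReader(f, delimiter='|')
    writer = csv.DictWriter(fout, fieldnames=reader.fieldnames, delimiter='|')
    writer.writeheader()
    # groupby only merges consecutive rows with the same key.
    for student, group in itertools.groupby(reader, key=operator.itemgetter('student')):
        # Scores are letter grades, so the best grade ('A') is the smallest string.
        best_score = min(map(operator.itemgetter('score'), group))
        writer.writerow({'student': student, 'score': best_score})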
Answered 2013-07-01T07:55:25.093
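
The groupby approach depends on two things: each student's rows being adjacent, and the field names matching exactly. The following is only a minimal sketch of one way to relax both, assuming the input is padded with spaces around `|` as in the question's sample and may list a student's rows out of order. The helper name `best_scores` and the in-memory `sample` text are hypothetical and used purely for illustration.

```python
import csv
import io
import itertools
import operator


def best_scores(text):
    """Return [(student, best_grade), ...] from '|'-delimited text.

    Assumes a header row and letter grades where 'A' is the best.
    """
    reader = csv.reader(io.StringIO(text), delimiter='|')
    next(reader)  # skip the header row
    # Strip the padding around each field and drop blank lines.
    rows = [[field.strip() for field in row] for row in reader if row]
    # groupby only merges consecutive items, so sort by student first.
    rows.sort(key=operator.itemgetter(0))
    result = []
    for student, group in itertools.groupby(rows, key=operator.itemgetter(0)):
        # The best letter grade is the lexicographically smallest one.
        result.append((student, min(score for _, score in group)))
    return result


sample = """student | score
John    |  A
Kim     |  B
John    |  C
Kim     |  A
"""
print(best_scores(sample))
```

Sorting makes the grouping safe for unsorted input, at the cost of returning students in alphabetical order rather than in the order they appear in the file.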

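Use a dictionary and store only the highest score found so far for each student. Because the scores are letter grades, the highest score is the lexicographically "lowest" letter: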

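import csv

# Placeholder file names; adjust to your own paths.
inputcsvfile = 'scores.csv'
outputcsvfile = 'best_scores.csv'

students = {}

# newline='' lets the csv module handle line endings itself (Python 3).
with open(inputcsvfile, newline='') as scoressource:
    reader = csv.reader(scoressource)
    header = next(reader)  # keep the header row out of the data
    for name, score in reader:
        # 'Z' sorts after every grade, so the first score seen is always stored.
        if score < students.get(name, 'Z'):
            students[name] = score

with open(outputcsvfile, 'w', newline='') as scoresdest:
    writer = csv.writer(scoresdest)
    writer.writerow(header)
    for name, score in students.items():
        writer.writerow([name, score])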
Answered 2013-07-01T07:48:12.037
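
To see the dictionary approach in isolation, here is a small sketch that runs on in-memory rows instead of files. The function name `highest_scores` and the `rows` list are hypothetical; the rows mirror the question's sample. It assumes Python 3.7 or later, where dictionaries keep insertion order.

```python
def highest_scores(rows):
    """Map each student to their best letter grade, in first-seen order."""
    best = {}
    for name, score in rows:
        # 'Z' is a sentinel that sorts after every real grade.
        if score < best.get(name, 'Z'):
            best[name] = score
    return best


rows = [('John', 'A'), ('John', 'C'), ('Mary', 'B'),
        ('Mary', 'D'), ('Kim', 'B'), ('Kim', 'A')]
print(highest_scores(rows))
```

Unlike groupby, this does not need a student's rows to be adjacent, and students come out in the order they first appear in the input.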