python - Python: how to group rows by one column and pick one row by another column?

Question

I have a CSV file like this:

student | score
John    |  A
John    |  C
Mary    |  B
Mary    |  D
Kim     |  B
Kim     |  A

Each student has multiple scores, and I want to keep one row per student, showing that student's highest score.

I want the result to be a table like this:

student | score
John    | A
Mary    | B
Kim     | A

I tried to find an existing post about this but couldn't. Is there a way to do this using only the built-in libraries?

score 2 · Accepted Answer

Use itertools.groupby to group the rows by student name.

import csv
import itertools
import operator

# The csv module recommends opening files with newline=''.
with open('1.csv', newline='') as f, open('2.csv', 'w', newline='') as fout:
    reader = csv.DictReader(f, delimiter='|')
    writer = csv.DictWriter(fout, fieldnames=reader.fieldnames, delimiter='|')
    writer.writeheader()
    for student, group in itertools.groupby(reader, key=operator.itemgetter('student')):
        # Grades are letters and 'A' < 'B', so min() picks the highest grade.
        best_score = min(map(operator.itemgetter('score'), group))
        writer.writerow({'student': student, 'score': best_score})
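
One caveat: itertools.groupby only merges consecutive rows that share a key, so this works as-is only when each student's rows are adjacent, as they are in the sample. If they might not be, sorting by student first is one option. A minimal sketch, assuming the rows fit in memory (the sample rows below are made up for illustration):

```python
import itertools
import operator

# Hypothetical sample rows; note that each student's rows are NOT adjacent.
rows = [
    {'student': 'John', 'score': 'A'},
    {'student': 'Mary', 'score': 'B'},
    {'student': 'John', 'score': 'C'},
    {'student': 'Mary', 'score': 'D'},
]

get_student = operator.itemgetter('student')
get_score = operator.itemgetter('score')

# groupby only groups consecutive equal keys, so sort by the grouping key first.
rows.sort(key=get_student)

best = {
    student: min(map(get_score, group))  # 'A' < 'B', so min() is the best grade
    for student, group in itertools.groupby(rows, key=get_student)
}
print(best)  # {'John': 'A', 'Mary': 'B'}
```

Sorting changes the order of students in the output; if the original order matters, a dictionary-based pass avoids the sort.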

score 1 · Accepted Answer

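Use a dictionary and store only the highest score found so far for each student. Because the scores are given as letters, this means you need to find the lexicographically "lowest" letter: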

import csv

students = {}

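# inputcsvfile and outputcsvfile stand for your own file paths.
# On Python 3, open csv files in text mode with newline=''.
with open(inputcsvfile, newline='') as scoressource:
    reader = csv.reader(scoressource)
    for name, score in reader:
        # 'A' < 'B' < ..., so the smallest letter is the best grade.
        # A lowercase header such as 'score' sorts after 'Z' and is never stored.
        if score < students.get(name, 'Z'):
            students[name] = score

with open(outputcsvfile, 'w', newline='') as scoresdest:
    writer = csv.writer(scoresdest)
    for name, score in students.items():
        writer.writerow([name, score])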
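
If the file really looks like the table in the question, with " | " separators and padding, the fields need stripping before they are compared. Here is a rough sketch of the same dictionary idea under that assumption. The best_scores helper and the in-memory sample are hypothetical, used only so the example runs without touching the filesystem:

```python
import csv
import io

# Hypothetical input mirroring the table in the question, padding included.
raw = """student | score
John    |  A
John    |  C
Mary    |  B
Mary    |  D
Kim     |  B
Kim     |  A
"""

def best_scores(lines):
    """Return {student: best grade}, where 'A' beats 'B', and so on."""
    reader = csv.reader(lines, delimiter='|')
    next(reader, None)  # skip the header row
    best = {}
    for row in reader:
        if len(row) != 2:
            continue  # ignore blank or malformed lines
        name, score = (field.strip() for field in row)
        # Keep the lexicographically smallest letter seen for each student.
        if name not in best or score < best[name]:
            best[name] = score
    return best

result = best_scores(io.StringIO(raw))
print(result)  # {'John': 'A', 'Mary': 'B', 'Kim': 'A'}
```

The dictionary keeps students in first-seen order on Python 3.7+, and unlike groupby it does not need each student's rows to be adjacent.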
