I have the following data:

Name Year  score
A    1996  84
A    1997  65
A    1996  76
A    1998  78
A    1998  65
B    1998  53
B    1996  98
B    1996  83
B    1996  54

I want the output to look like this:

Name Year  max_score
A    1996  84
B    1996  98

How can I write Python map-reduce code for this job?

I could combine NAME and YEAR into a single key and use the score as the value.

But is there another way to handle this?

2 Answers

Assuming all of your years and scores are positive numbers:

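from collections import defaultdict

datafile = "scores.txt"  # assumed path; point this at your own input file

# name -> (year, best score); the (0, 0) default is why scores must be positive
mapping = defaultdict(lambda: (0, 0))
with open(datafile) as f:
    for line in f:
        try:
            name, year, score = line.split()
            year = int(year)
            score = int(score)
        except ValueError:
            # skips the header, blank lines, and malformed rows
            continue

        if score > mapping[name][1]:
            mapping[name] = year, score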

Or slightly more concise, but less robust to errors:

from collections import defaultdict

mapping = defaultdict(lambda: (0, 0))
with open(datafile) as f:
    f.readline()  # skip the header line; it is not needed
    for line in f:
        name, year, score = line.split()
        if int(score) > mapping[name][1]:
            mapping[name] = int(year), int(score)
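
To get from the resulting dictionary to the table asked for in the question, the entries can be printed one per name. The following is only a minimal sketch: it reads the sample rows from an in-memory string instead of a file, and `SAMPLE` and `best_by_name` are hypothetical names introduced here for illustration.

```python
from collections import defaultdict

# Sample rows copied from the question (header included).
SAMPLE = """\
Name Year  score
A    1996  84
A    1997  65
A    1996  76
A    1998  78
A    1998  65
B    1998  53
B    1996  98
B    1996  83
B    1996  54
"""


def best_by_name(lines):
    """Return {name: (year, score)} holding each name's highest score."""
    best = defaultdict(lambda: (0, 0))  # assumes all scores are positive
    for line in lines:
        try:
            name, year, score = line.split()
            year, score = int(year), int(score)
        except ValueError:
            continue  # header, blank, or malformed line
        if score > best[name][1]:
            best[name] = (year, score)
    return dict(best)


result = best_by_name(SAMPLE.splitlines())

# Print in the layout shown in the question.
print("Name Year  max_score")
for name in sorted(result):
    year, score = result[name]
    print(f"{name:<4} {year:<5} {score}")
```

For the sample data this gives one row per name: A with 1996 and 84, B with 1996 and 98.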
answered 2012-09-26T12:51:33.440

Is this what you're after?

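import operator

def mapper(key, value):
    # value is one input line: "name year score"
    name, year, score = value.split()
    if score.isdigit():  # skips the header row
        # emit the score as an int so max() compares numerically, not as text
        yield name, (year, int(score))

def reducer(name, values):
    # keep the (year, score) pair with the highest score for this name
    yield name, max(values, key=operator.itemgetter(1))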
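
To try a mapper and reducer of this shape without a cluster, the shuffle step can be simulated locally by sorting the mapped pairs and grouping them by key. This is only a sketch under assumptions: it assumes a framework that calls `mapper(key, value)` once per input line and `reducer(key, values)` once per distinct key, it casts the score to `int` so the comparison is numeric, and `run_local` is a hypothetical helper, not part of any map-reduce library.

```python
import operator
from itertools import groupby


def mapper(key, value):
    # value is one input line: "name year score"
    name, year, score = value.split()
    if score.isdigit():  # skips the header row
        yield name, (year, int(score))


def reducer(name, values):
    # keep the (year, score) pair with the highest score
    yield name, max(values, key=operator.itemgetter(1))


def run_local(lines):
    """Hypothetical local driver: map, then sort and group by key, then reduce."""
    # Map phase: the line index stands in for the framework-supplied key.
    pairs = [kv for i, line in enumerate(lines) for kv in mapper(i, line)]
    # Shuffle phase: bring all values for the same name together.
    pairs.sort(key=operator.itemgetter(0))
    results = []
    for name, group in groupby(pairs, key=operator.itemgetter(0)):
        # Reduce phase: one call per distinct name.
        results.extend(reducer(name, (value for _, value in group)))
    return results


lines = [
    "Name Year  score",
    "A    1996  84",
    "A    1997  65",
    "A    1996  76",
    "A    1998  78",
    "A    1998  65",
    "B    1998  53",
    "B    1996  98",
    "B    1996  83",
    "B    1996  54",
]
output = run_local(lines)
for name, (year, score) in output:
    print(name, year, score)
```

Keying on the name alone and carrying the year inside the value is what lets a single reducer call see every score for that name; a combined (name, year) key would instead produce one maximum per name and year.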
answered 2012-09-26T14:58:02.143