How about this, using pandas:
>>> updated = orig.append(new).groupby('Name').last().fillna(0)
>>> updated
      Day1  Day2  Day3  Day4
Name
Abe      1     2     4     0
Ben      2     2     1     0
Cat      2     3     2     0
Dan      0     0     6     3
First, read in the data (the details will depend on the file format):
>>> orig = pd.read_csv("days1.txt", delim_whitespace=True)
>>> new = pd.read_csv("days2up.txt", delim_whitespace=True)
>>> orig
  Name  Day1  Day2  Day3
0  Abe     1     2     3
1  Ben     2     2     1
2  Cat     2     3     2
>>> new
  Name  Day3  Day4
0  Abe     4     0
1  Dan     6     3
Then append the new data, which automatically expands the columns:
>>> orig.append(new)
   Day1  Day2  Day3  Day4 Name
0     1     2     3   NaN  Abe
1     2     2     1   NaN  Ben
2     2     3     2   NaN  Cat
0   NaN   NaN     4     0  Abe
1   NaN   NaN     6     3  Dan
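On pandas 2.0 and later, DataFrame.append is no longer available, so the append step has to be written differently there. One way is pd.concat. The following is a minimal sketch, assuming the same orig and new tables as above, built inline here instead of being read from the files:

```python
import pandas as pd

# Same data as orig and new above, built inline so the sketch is self-contained.
orig = pd.DataFrame({"Name": ["Abe", "Ben", "Cat"],
                     "Day1": [1, 2, 2], "Day2": [2, 2, 3], "Day3": [3, 1, 2]})
new = pd.DataFrame({"Name": ["Abe", "Dan"], "Day3": [4, 6], "Day4": [0, 3]})

# pd.concat stacks the rows and takes the union of the columns,
# filling cells that one of the frames does not have with NaN.
combined = pd.concat([orig, new])

# The remaining steps are unchanged: last valid value per Name, then NaN -> 0.
updated = combined.groupby("Name").last().fillna(0)
print(updated)
```

The column order of the intermediate frame may differ from the output shown above, but the grouped result has the same rows and columns.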
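Group by the Name column and take the last valid value in each group (at first I was worried this would lose Abe's Day1 and Day2 values, but it doesn't, because last() skips NaN):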
>>> orig.append(new).groupby("Name").last()
      Day1  Day2  Day3  Day4
Name
Abe      1     2     4     0
Ben      2     2     1   NaN
Cat      2     3     2   NaN
Dan    NaN   NaN     6     3
Replace the missing values with 0:
>>> orig.append(new).groupby("Name").last().fillna(0)
      Day1  Day2  Day3  Day4
Name
Abe      1     2     4     0
Ben      2     2     1     0
Cat      2     3     2     0
Dan      0     0     6     3
Finally, write it out:
>>> updated = orig.append(new).groupby("Name").last().fillna(0)
>>> updated.to_csv("updated.csv")
>>> !cat updated.csv
Name,Day1,Day2,Day3,Day4
Abe,1.0,2.0,4,0.0
Ben,2.0,2.0,1,0.0
Cat,2.0,3.0,2,0.0
Dan,0.0,0.0,6,3.0
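The 1.0 and 0.0 values in the file appear because the columns that held NaN were converted to floats, while Day3 never had a missing value and stayed an integer. If whole numbers are wanted in the output, one option is to cast back to int after fillna. This is a sketch, assuming every value is integral; it uses a hand-built stand-in for updated and an in-memory buffer in place of updated.csv:

```python
import io
import pandas as pd

# Stand-in for the result of groupby("Name").last().fillna(0) above:
# the columns that once held NaN are floats, Day3 is still int.
updated = pd.DataFrame(
    {"Day1": [1.0, 2.0, 2.0, 0.0], "Day2": [2.0, 2.0, 3.0, 0.0],
     "Day3": [4, 1, 2, 6], "Day4": [0.0, 0.0, 0.0, 3.0]},
    index=pd.Index(["Abe", "Ben", "Cat", "Dan"], name="Name"),
)

# Safe only because every value is a whole number once NaN has been filled.
as_int = updated.astype(int)

buffer = io.StringIO()  # stands in for a file path such as "updated.csv"
as_int.to_csv(buffer)
csv_text = buffer.getvalue()
print(csv_text)
```

If some columns could legitimately hold fractional values, casting only the affected columns would be the safer choice.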