Using pandas:
import pandas as pd
df1 = pd.read_csv("coord1.csv")
df2 = pd.read_csv("coord2.csv")
combined = df1.merge(df2, how='outer').fillna(0)
combined.sort_values(list(combined.columns[:2]), inplace=True)
combined.to_csv("coord_merged.csv",index=False)
First, read in the original data:
>>> import pandas as pd
>>> df1 = pd.read_csv("coord1.csv")
>>> df2 = pd.read_csv("coord2.csv")
>>> df1
x-coordinate y-coordinate data 1 data 2
0 1 10 20 0
1 5 15 1 2
>>> df2
x-coordinate y-coordinate data 3 data 4
0 1 10 7 8
1 3 25 1 2
Merge them, and fill the missing data with zeros:
>>> combined = df1.merge(df2, how='outer')
>>> combined
x-coordinate y-coordinate data 1 data 2 data 3 data 4
0 1 10 20 0 7 8
1 5 15 1 2 NaN NaN
2 3 25 NaN NaN 1 2
>>> combined = df1.merge(df2, how='outer').fillna(0)
>>> combined
x-coordinate y-coordinate data 1 data 2 data 3 data 4
0 1 10 20 0 7 8
1 5 15 1 2 0 0
2 3 25 0 0 1 2
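The merge above relies on pandas joining on every column name the two frames share, which here happens to be the two coordinate columns. A minimal sketch that names the keys explicitly with `on=`, assuming the same sample data built in memory rather than read from the CSV files (the `keys` variable is just an illustrative name):

```python
import pandas as pd

# Same sample rows as shown above, constructed in memory.
df1 = pd.DataFrame({
    "x-coordinate": [1, 5],
    "y-coordinate": [10, 15],
    "data 1": [20, 1],
    "data 2": [0, 2],
})
df2 = pd.DataFrame({
    "x-coordinate": [1, 3],
    "y-coordinate": [10, 25],
    "data 3": [7, 1],
    "data 4": [8, 2],
})

# Name the join keys explicitly instead of relying on shared column names.
keys = ["x-coordinate", "y-coordinate"]
combined = df1.merge(df2, how="outer", on=keys).fillna(0)
print(combined)
```

Spelling out the keys should keep the join from silently changing if the two files later gain another column with the same name. The row order of an outer merge is not something to rely on, which is why an explicit sort follows.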
Sort:
>>> combined.sort_values(list(combined.columns[:2]), inplace=True)
>>> combined
x-coordinate y-coordinate data 1 data 2 data 3 data 4
0 1 10 20 0 7 8
2 3 25 0 0 1 2
1 5 15 1 2 0 0
Finally, write it out:
>>> combined.to_csv("coord_merged.csv",index=False)
>>> !cat coord_merged.csv
x-coordinate, y-coordinate, data 1, data 2, data 3, data 4
1.0,10.0,20.0,0.0,7.0,8.0
3.0,25.0,0.0,0.0,1.0,2.0
5.0,15.0,1.0,2.0,0.0,0.0
If keeping the integer format is important, then
>>> combined.astype(int).to_csv("coord_merged.csv",index=False)
>>> !cat coord_merged.csv
x-coordinate, y-coordinate, data 1, data 2, data 3, data 4
1,10,20,0,7,8
3,25,0,0,1,2
5,15,1,2,0,0
will do.
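For reference, the whole pipeline can be sketched end to end without touching the disk. This assumes the two input files contain exactly the sample rows shown above; the `io.StringIO` objects stand in for `coord1.csv` and `coord2.csv`, and `csv_text` is an illustrative name. Note that `astype(int)` is only safe here because every value is a whole number after filling with zeros.

```python
import io

import pandas as pd

# In-memory stand-ins for coord1.csv and coord2.csv (sample data only).
coord1 = io.StringIO("x-coordinate,y-coordinate,data 1,data 2\n1,10,20,0\n5,15,1,2\n")
coord2 = io.StringIO("x-coordinate,y-coordinate,data 3,data 4\n1,10,7,8\n3,25,1,2\n")

df1 = pd.read_csv(coord1)
df2 = pd.read_csv(coord2)

# Outer merge on the shared coordinate columns, zero-fill the gaps.
combined = df1.merge(df2, how="outer").fillna(0)

# Sort by the first two columns (the coordinates).
combined = combined.sort_values(list(combined.columns[:2]))

# Cast back to integers and render the CSV as a string instead of a file.
csv_text = combined.astype(int).to_csv(index=False)
print(csv_text)
```

Passing a filename to `to_csv` instead of omitting it writes the same content to disk, as in the steps above.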