这是一些示例数据
劳动报告.csv
DJR Number,Date,Foreman,Labour Name,Start,Finish,Hours Worked
1,1/01/2020,Emersen,Tai,6:30,15:00,8:30
1,1/01/2020,Emersen,Rachel,6:30,15:00,8:30
1,1/01/2020,Emersen,Andrew,6:30,15:00,8:30
1,1/01/2020,Emersen,Liam,6:30,15:00,8:30
2,2/01/2020,Scott,Rachel,6:30,15:00,8:30
3,3/01/2020,Michael,Andrew,6:30,15:00,8:30
植物报告.csv
DJR Number,Date,Foreman,Plant,Plant Hours
1,1/01/2020,Emersen,1,5
2,1/01/2020,Scott,2,5
3,1/01/2020,Michael,2,5
可以使用下面的代码。逻辑是添加一个额外的列来标记该行是否需要连接。然后我们可以对 pandas 进行左合并,然后在合并后删除这个额外的列
import pandas as pd
import numpy as np
labour_report = pd.read_csv('labour-report.csv')
plant_report = pd.read_csv('plant-report.csv')
should_join_row_lbr=[]
foremen = {}
for index, labour_report_row in labour_report.iterrows():
foreman = labour_report_row['Foreman']
if foreman in foremen:
should_join_row_lbr.append(0) # Dont join this row
else:
foremen[foreman]=1
# Join this row as this is the first record for the foreman
should_join_row_lbr.append(1)
labour_report['Should Join Row']=should_join_row_lbr
should_join_row_plnt = [1]*plant_report['DJR Number'].count()
plant_report['Should Join Row']=should_join_row_plnt
# Do a left join with Should Join column as well,
# Only the records that have value 1 will be joined from labour_report
combined_report = pd.merge(labour_report,plant_report,on=['Foreman','Should Join Row'],how='left')
combined_report = combined_report.drop(columns=['Should Join Row'])
combined_report = combined_report.replace(np.nan, '', regex=True)
print(combined_report)
combined_report.to_csv('combined_report.csv')
应该产生结果
DJR Number_x Date_x Foreman Labour Name Start Finish Hours Worked DJR Number_y Date_y Plant Plant Hours
0 1 1/01/2020 Emersen Tai 6:30 15:00 8:30 1 1/01/2020 1 5
1 1 1/01/2020 Emersen Rachel 6:30 15:00 8:30
2 1 1/01/2020 Emersen Andrew 6:30 15:00 8:30
3 1 1/01/2020 Emersen Liam 6:30 15:00 8:30
4 2 2/01/2020 Scott Rachel 6:30 15:00 8:30 2 1/01/2020 2 5
5 3 3/01/2020 Michael Andrew 6:30 15:00 8:30 3 1/01/2020 2 5