python - Compare two CSV files on key fields: find modifications, new records and deletions with Python

Question

I have two CSV files with the same fields, say:

ID,NAME,SURNAME,HOME_ADDRESS,NUMBER_OF_PHONE_LINES,PREFIX,PHONE_NUMBER,EMAIL

I want to compare the two CSV files and find:

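Records that are in file A but not in B (comparing on three fields only: ID, PREFIX and PHONE_NUMBER)
Records that are in file B but not in A (always based on the fields above)
Records that have the same ID, PREFIX and PHONE_NUMBER but different information in the other fields, for example a different EMAIL, or a different EMAIL and HOME_ADDRESS.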

Finally, I want to split this information into three separate files.

Does anyone know how to do this?

score 0 · Accepted Answer

I'll try to give the overall idea. This is not tested; it only describes the idea, not a finished solution!

# open files... ('a.csv' and 'b.csv' are placeholder names)
with open('a.csv') as f1, open('b.csv') as f2:
    csv_a = f1.read().splitlines()
    csv_b = f2.read().splitlines()
# naive split: assumes no field contains a quoted comma
box_a = [x.split(',') for x in csv_a if x]
box_b = [x.split(',') for x in csv_b if x]
key = (0, 5, 6)  # indexes of ID, PREFIX and PHONE_NUMBER
box_in_a_not_b = []
box_match_not_perfect = []
matched_b = set()  # positions in box_b that have a partner in box_a
for line in box_a:
    found = False
    for j, bline in enumerate(box_b):
        if all(line[k] == bline[k] for k in key):
            found = True
            matched_b.add(j)
            if line != bline:  # same key, but some other field differs
                box_match_not_perfect.append(line)
                box_match_not_perfect.append(bline)  # keeps both instances in the third file
    if not found:  # no row of B shares this key
        box_in_a_not_b.append(line)
# rows of B whose key never appeared in A
box_in_b_not_a = [bline for j, bline in enumerate(box_b) if j not in matched_b]
# join each row back into a CSV line and save the three lists in
# corresponding files (the output names are placeholders too)
for name, box in (('in_a_not_b.csv', box_in_a_not_b),
                  ('in_b_not_a.csv', box_in_b_not_a),
                  ('match_not_perfect.csv', box_match_not_perfect)):
    with open(name, 'w') as out:
        out.write('\n'.join(','.join(z) for z in box))
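
For larger files, the same idea can be sketched with dictionaries instead of nested loops, so each key lookup does not have to scan the other file. This is only a sketch under some assumptions: both files start with the header row from the question, the (ID, PREFIX, PHONE_NUMBER) combination is unique within each file, and the helper names (index_rows, compare, write_rows) and the sample rows are made up for illustration. It uses the standard csv module, which also copes with quoted commas.

```python
import csv
import io

KEY_FIELDS = ("ID", "PREFIX", "PHONE_NUMBER")

def index_rows(lines):
    """Map (ID, PREFIX, PHONE_NUMBER) -> row dict for an iterable of CSV lines.

    Assumes the first line is the header and that keys are unique;
    a repeated key would silently overwrite the earlier row.
    """
    reader = csv.DictReader(lines)
    return {tuple(row[k] for k in KEY_FIELDS): row for row in reader}

def compare(rows_a, rows_b):
    """Return (only in A, only in B, same key but different content)."""
    only_a = [rows_a[k] for k in rows_a if k not in rows_b]
    only_b = [rows_b[k] for k in rows_b if k not in rows_a]
    changed = [(rows_a[k], rows_b[k])
               for k in rows_a
               if k in rows_b and rows_a[k] != rows_b[k]]
    return only_a, only_b, changed

def write_rows(path, rows, fieldnames):
    """Write a list of row dicts to a CSV file with a header row."""
    with open(path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)

# Made-up sample data standing in for the two real files.
HEADER = "ID,NAME,SURNAME,HOME_ADDRESS,NUMBER_OF_PHONE_LINES,PREFIX,PHONE_NUMBER,EMAIL\n"
FILE_A = HEADER + (
    "1,Ann,Lee,1 Main St,1,044,555001,ann@example.com\n"
    "2,Bob,Ray,2 Oak Ave,1,044,555002,bob@example.com\n"
    "3,Cy,Fox,3 Elm Rd,2,033,555003,cy@example.com\n"
)
FILE_B = HEADER + (
    "1,Ann,Lee,1 Main St,1,044,555001,ann@example.com\n"
    "2,Bob,Ray,2 Oak Ave,1,044,555002,robert@example.com\n"
    "4,Di,Kim,4 Fir Ln,1,033,555004,di@example.com\n"
)

rows_a = index_rows(io.StringIO(FILE_A))
rows_b = index_rows(io.StringIO(FILE_B))
only_a, only_b, changed = compare(rows_a, rows_b)
```

With real files, pass open(path, newline='') to index_rows instead of the StringIO objects, then call write_rows once per result list. For the third file you would need to flatten the (row from A, row from B) pairs first, for example by writing both rows one after the other, as the loop version above does.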
