python - How do I compare two CSV files that have the same columns but different values?

Question

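Here is my problem: I need to compare two procmon scans that I converted to CSV files.

The two files have the same column names, but obviously different contents. I need to check the "Path" (column 5) of the first file against the second file and, wherever there is a match, print the whole row from the second file into a third CSV.

I have been googling for quite a while and can't seem to get it to work the way I want. Any help is appreciated!

I have tried many online tools and other Python scripts, to no avail.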

score 2 · Accepted Answer

Just write your own code for this kind of thing. It may be easier than you expect.

#!/usr/bin/env python

import pandas as pd

# read the csv files
csv1 = pd.read_csv('<first_filename>')
csv2 = pd.read_csv('<second_filename>')

# build a boolean mask: True where a csv2 'Path' also occurs in csv1
# (a row-by-row == comparison raises when the files differ in length)
ismatch = csv2['Path'].isin(csv1['Path'])

# keep the rows of csv2 where the mask is True
csv3 = csv2[ismatch]

# write to a new csv file, without the extra index column
csv3.to_csv('<new_filename>', index=False)

score 2 · Accepted Answer

Have you tried using pandas together with numpy?

It would look something like this:

import pandas as pd
import numpy as np

# load the second file as a DataFrame, since the whole rows are needed later
file2 = pd.read_csv("file2.csv")

# get the columns to compare
file1Column5 = pd.read_csv("file1.csv")["name of column 5"]
file2Column5 = file2["name of column 5"]

# add a column marked 'True' where the file2 value also occurs in file1, else 'False'
# (isin is used because == raises when the two columns differ in length)
file2["ColumnsMatch"] = np.where(file2Column5.isin(file1Column5), 'True', 'False')

# filter rows on that column, then drop the helper column
file2 = file2[file2['ColumnsMatch'] == 'True'].drop(columns='ColumnsMatch')

# write to a new file, without the extra index column
file2.to_csv('file3.csv', index=False)
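
If installing pandas is not an option, the same idea can be sketched with only the standard library `csv` module. This is a minimal sketch under assumptions: the function names `rows_with_matching_paths` and `write_rows` are made up for illustration, the column is assumed to be called `Path`, and the sample rows are invented stand-ins for the two procmon exports.

```python
import csv
import io


def rows_with_matching_paths(first_csv, second_csv, column="Path"):
    """Return (fieldnames, rows) for rows of second_csv whose `column`
    value also appears somewhere in first_csv."""
    # Collect every value of the column from the first file into a set
    # so that each membership check is fast.
    first_paths = {row[column] for row in csv.DictReader(first_csv)}

    reader = csv.DictReader(second_csv)
    matches = [row for row in reader if row[column] in first_paths]
    return reader.fieldnames, matches


def write_rows(out_file, fieldnames, rows):
    """Write the header and the matched rows as CSV."""
    writer = csv.DictWriter(out_file, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(rows)


# Tiny in-memory stand-ins for the two exports (invented data).
first = io.StringIO("Process Name,Path\na.exe,C:\\one\nb.exe,C:\\two\n")
second = io.StringIO(
    "Process Name,Path\nc.exe,C:\\two\nd.exe,C:\\three\ne.exe,C:\\one\n"
)

fieldnames, matched = rows_with_matching_paths(first, second)

out = io.StringIO()
write_rows(out, fieldnames, matched)
print(out.getvalue())
```

With real files, replace the `io.StringIO` objects with `open(path, newline="")` handles. If the export starts with a byte-order mark, opening it with `encoding="utf-8-sig"` strips the mark so the first column name is read cleanly.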
