python - Remove duplicate lines that share a specific key

Question

I have about 500k lines in a text file (sample excerpt below) that look like this:

1,Party-120273.gif,16256,23ss423
2,Party-120275.gif,16456,23423
3,Party-120273.gif,12656,232423
4,Party-120273.gif,165236,2312423
5,Party-120276.gif,165236,2312423

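How can I remove lines from the file that repeat a value in the second column? For example, in the lines above, the repeated lines containing Party-120273.gif should be removed. The first occurrence should be kept. So the output should be: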

1,Party-120273.gif,16256,23ss423
2,Party-120275.gif,16456,23423
5,Party-120276.gif,165236,2312423

I have to do this for the whole file, removing every line whose second-column value duplicates an earlier one. How would I do this in Python?

score 4 · Accepted Answer

Does it have to be Python? Why not use sort(1):

sort --field-separator=, --key=2,2 --unique < file

If you still want to do it in Python, take a look at the csv module for parsing the lines:

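import csv

seenKeys = set()  # second-column values already printed
with open('file', newline='') as f:  # 'file' is the input path, as in the sort example
    for row in csv.reader(f):
        if row[1] in seenKeys:
            continue  # a line with this key was already kept; skip the repeat
        seenKeys.add(row[1])
        print(','.join(row))  # no space after the comma, to match the input format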
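
The same seen-set idea can be wrapped in a small generator, which makes it easy to try against the sample lines from the question. This is only a sketch: the name dedupe_by_column and its key_index parameter are made up for illustration, and skipping blank or short lines is an assumption, not something the question specifies.

```python
import csv
import io


def dedupe_by_column(lines, key_index=1):
    """Yield parsed rows, keeping only the first row for each key value."""
    seen = set()
    for row in csv.reader(lines):
        if len(row) <= key_index:
            continue  # assumption: silently skip blank or too-short lines
        key = row[key_index]
        if key in seen:
            continue  # this key was already emitted; drop the repeat
        seen.add(key)
        yield row


# Sample data copied from the question.
sample = """1,Party-120273.gif,16256,23ss423
2,Party-120275.gif,16456,23423
3,Party-120273.gif,12656,232423
4,Party-120273.gif,165236,2312423
5,Party-120276.gif,165236,2312423
"""

out = io.StringIO()
writer = csv.writer(out, lineterminator="\n")
writer.writerows(dedupe_by_column(io.StringIO(sample)))
print(out.getvalue(), end="")
```

For a real file, pass a file object opened with newline='' instead of the io.StringIO wrapper. Unlike the sort approach, this keeps the surviving lines in their original order, and only the set of distinct keys is held in memory, so 500k lines should be no problem.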
