python - Trying to compare two text files with different formats

Question

File 1 is formatted like this:

1111111111
2222222222

File 2 is formatted like this:

3333333333:4444444444
1111111111:2222222222

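I am trying to find a way to take each value in file 1 and check whether it matches only the part to the right of the colon in file 2. The end goal is to delete the FULL line from file 2 whenever there is a match.

I know I could cut file 2 with standard commands so that both files have exactly the same format. The problem is that I need the finished file in 88888:99999 format, and splitting the lines apart only to put them back together in the right order seems too complicated.

I have tried nested loops, regular expressions, sets, and lists, and my head is spinning.

I hope this makes sense. Thanks in advance.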

Traceback (most recent call last):
 File "test.py", line 17, in <module>
   if line.split(":")[1] in keys:
IndexError: list index out of range

score 3 · Accepted Answer

Assuming you want to delete a line from file 2 when the second part of that line matches any value in file 1, you would do something like this:

# Warning: untested code ahead
with open("file1", "r") as f1:
    # First, build the set of all the values in file 1.
    # Sets use hash tables under the covers, so lookups should
    # be fast enough for our use case (assuming the file fits in
    # the memory available on the system).
    keys = set(f1.read().splitlines())

# Since we can't write back into the same file while reading through it,
# we write the lines we want to keep into a new file.
with open("file2", "r") as f2, open("filtered_file", "w") as dest:
    for line in f2:
        line = line.strip()  # Remove the trailing newline
        parts = line.split(":")
        # A line without a colon (e.g. a blank line) has no parts[1] and
        # would raise IndexError, so such lines are kept unchanged.
        if len(parts) > 1 and parts[1] in keys:
            continue  # Match: drop the full line
        dest.write(line + "\n")  # File objects have write(), not writeline()
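
If the filtered result should end up back in file 2 itself, one option is to write to a scratch file and then swap it into place. The following is a minimal sketch under stated assumptions, not the answer author's method: the helper name `remove_matching_lines`, its parameters, and the `.tmp` suffix are all hypothetical, and each line is assumed to contain at most one colon. It uses `str.partition`, which does not raise on lines without a colon, so such lines are kept unchanged.

```python
import os
import tempfile


def remove_matching_lines(keys_path, data_path):
    """Delete lines of data_path whose right-of-colon part appears in keys_path.

    Hypothetical helper; the name and parameters are illustrative only.
    """
    with open(keys_path) as f1:
        keys = set(f1.read().split())

    tmp_path = data_path + ".tmp"  # assumed scratch name next to the original
    with open(data_path) as src, open(tmp_path, "w") as dest:
        for line in src:
            # partition never raises; a line without a colon gives sep == ""
            _left, sep, right = line.strip().partition(":")
            if sep and right in keys:
                continue  # match: drop the full line
            dest.write(line)
    os.replace(tmp_path, data_path)  # swap the filtered copy into place


# Usage with the sample data from the question, in a throwaway directory
workdir = tempfile.mkdtemp()
file1 = os.path.join(workdir, "file1")
file2 = os.path.join(workdir, "file2")
with open(file1, "w") as f:
    f.write("1111111111\n2222222222\n")
with open(file2, "w") as f:
    f.write("3333333333:4444444444\n1111111111:2222222222\n")

remove_matching_lines(file1, file2)
with open(file2) as f:
    remaining = f.read()
print(remaining)
```

With the sample data, only the `3333333333:4444444444` line remains, because `2222222222` appears in file 1.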

score 0 · Accepted Answer

This is how you get the elements to the right of the colon in file 2. Maybe not the cleanest, but you get the idea.

 # Read file 2 and collect the part right of the colon from each "a:b" line
 with open("file2") as f2:
     str2 = f2.read()
 righttocolon = [s.split(":")[1] for s in str2.split("\n") if len(s.split(":")) == 2]
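
The extraction above only yields the right-hand parts; to finish the task, the same test can be used to decide which full lines to keep. A rough sketch, assuming the sample data from the question held in in-memory strings (`file1_text` and `file2_text` are hypothetical stand-ins for the file contents) and that lines without exactly one colon are kept as-is:

```python
# Hypothetical stand-ins for the contents of file 1 and file 2
file1_text = "1111111111\n2222222222\n"
file2_text = "3333333333:4444444444\n1111111111:2222222222\n"

keys = set(file1_text.splitlines())
file2_lines = file2_text.splitlines()

# Same extraction as in the answer: the part right of the colon
right_of_colon = [ln.split(":")[1] for ln in file2_lines if ln.count(":") == 1]

# Keep a full line unless its right-hand part is one of the file 1 values
kept = [
    ln for ln in file2_lines
    if not (ln.count(":") == 1 and ln.split(":")[1] in keys)
]
print(right_of_colon)  # ['4444444444', '2222222222']
print(kept)            # ['3333333333:4444444444']
```

Joining `kept` with newlines gives the finished file, still in the original `left:right` format, so nothing has to be split apart and reassembled.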
