performance - Slow pairwise comparison

Question

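I have code that opens two files, saves their contents into sets (set1 and set2), and writes the results of a pairwise comparison between these sets to an output file. Both files are very large (more than 100K lines each), and the code takes a very long time to produce output (more than 10 hours).

Is there a way to optimize its performance?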

def matches2smiles():
    # Read each file into a set of lines, without trailing newlines.
    with open('file1.txt') as f:
        set1 = {a.rstrip('\n') for a in f}

    with open('file2.txt') as g:
        set2 = {b.replace('\n', '') for b in g}

    with open('output.txt', 'w') as h:
        # Nested loop over every (a, b) pair; `a in b` is a substring test.
        r = [
            h.write(b + '\n')
            for a in set1
            for b in set2
            if a in b
            ]

matches2smiles()

score 0 · Accepted Answer

First of all, your code is wrong; it should be:

    r = [                                                                    
        h.write(a + '\n')
        for a in set1
        if a in set2
        ]

In any case, use set1.intersection(set2): it will probably be faster, and the code will be clearer.
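A minimal sketch of the intersection approach, assuming you want lines that appear identically in both files (it does not reproduce the substring test `a in b` from the question). The helper names `read_lines` and `write_common_lines` are made up for this example:

```python
def read_lines(path):
    """Return the set of lines in `path`, without trailing newlines."""
    with open(path) as f:
        return {line.rstrip('\n') for line in f}


def write_common_lines(path1, path2, out_path):
    """Write every line that appears in both input files to `out_path`."""
    # Assumption: an exact, whole-line match is what is wanted.
    common = read_lines(path1).intersection(read_lines(path2))
    with open(out_path, 'w') as out:
        # Sorting is optional; it only makes the output order reproducible.
        out.writelines(line + '\n' for line in sorted(common))
    return common
```

You would call it as `write_common_lines('file1.txt', 'file2.txt', 'output.txt')`. Set membership is a hash lookup, so this avoids the nested loop over every pair of lines.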
