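I have code that opens two files, stores their contents in sets (set1 and set2), and writes the results of a pairwise comparison between these sets to an output file. Both files are very large (more than 100K lines each), and the code takes a very long time to finish (more than 10 hours).
Is there a way to optimize its performance?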
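def matches2smiles():
    # Load each file into a set of lines with the trailing newline removed.
    with open('file1.txt') as f:
        set1 = {a.rstrip('\n') for a in f}
    with open('file2.txt') as g:
        set2 = {b.rstrip('\n') for b in g}
    # Compare every pair and write b whenever a occurs in it as a substring.
    with open('output.txt', 'w') as h:
        for a in set1:
            for b in set2:
                if a in b:
                    h.write(b + '\n')

matches2smiles()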
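
For reference, here is a minimal sketch of one possible direction, not a tested solution. It assumes that each matching line from file2.txt only needs to be written once (the code above writes it once per matching line of file1.txt), and that the lines are fairly short. The names find_matches, contains_any and matches2smiles_fast are hypothetical. Instead of testing every pair, it groups the file1.txt entries by length and looks up each substring of a file2.txt line in a set, so the work per line depends on the line length and the number of distinct entry lengths rather than on the number of entries.

```python
from collections import defaultdict


def contains_any(text, by_length):
    """Return True if any pattern occurs in text as a substring."""
    for length, group in by_length.items():
        # Slide a window of this length over text and look it up in the set.
        for start in range(len(text) - length + 1):
            if text[start:start + length] in group:
                return True
    return False


def find_matches(patterns, texts):
    """Yield each text that contains at least one pattern as a substring."""
    # Group patterns by length so only substrings of those lengths are tested.
    by_length = defaultdict(set)
    for p in patterns:
        by_length[len(p)].add(p)
    for text in texts:
        if contains_any(text, by_length):
            yield text


def matches2smiles_fast(path1='file1.txt', path2='file2.txt', out='output.txt'):
    # Hypothetical wrapper using the same file names as the code above.
    with open(path1) as f:
        set1 = {a.rstrip('\n') for a in f}
    with open(path2) as g:
        set2 = {b.rstrip('\n') for b in g}
    with open(out, 'w') as h:
        for b in find_matches(set1, set2):
            h.write(b + '\n')
```

If the lines are long or the entries have many different lengths, this may not help much, and a multi-pattern string matching algorithm such as Aho-Corasick could be a better fit.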