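I feel this Python code could be shortened considerably, but I almost always fall back to writing a C-style layout. What do you think is the best way to shorten it? Readability is a bonus, not a requirement.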

def compfiles(file1, file2):
    linecnt = 0
    for line1 in open(file1):
        line1 = line1.strip()
        hit = False
        for line2 in open(file2):
            line2 = line2.strip()
            if line2 == line1:
                hit = True
                break
        if not hit:
            print("Miss: file %s contains '%s', but file %s does not!" % (file1, line1, file2))
        linecnt += 1
    print("%i lines compared between %s and %s." % (linecnt, file1, file2))

fn = ["file1.txt", "file2.txt"]
compfiles(fn[0], fn[1])
compfiles(fn[1], fn[0])

2 Answers

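Your code is very inefficient, because the open call for the second file sits inside the loop over the first file, so the second file is re-read for every line of the first. Just read the second file into a list (or, better, a set, which gives you average O(1) lookup time) and use the in operator. Also, your linecnt variable only counts the lines in file1; you can read those lines into a list and call len on that list to get the same number: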

def compfiles(file1, file2):
    # Read each file once; iterating the file object avoids the spurious
    # empty last entry that read().split("\n") yields for a trailing newline.
    with open(file1) as f1, open(file2) as f2:
        lines1 = [l.strip() for l in f1]
        lines2 = set(l.strip() for l in f2)  # set: average O(1) membership test
    for line in lines1:
        if line not in lines2:
            print("Miss: file %s contains '%s', but file %s does not!" % (file1, line, file2))
    print("%i lines compared between %s and %s." % (len(lines1), file1, file2))
answered 2013-06-19T12:39:24.850
def compfiles(file1, file2):
    with open(file1) as fin:
        set1 = set(fin)
    with open(file2) as fin:
        set2 = set(fin)
    ... # do some set operations
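
As one possible reading of "some set operations" above, here is a minimal sketch that uses set difference to find the lines unique to each file. The names read_lines and compfiles_sets are made up for this illustration, and stripping whitespace is an assumption carried over from the question's code. Note that sets discard both duplicates and line order.

```python
def read_lines(path):
    """Return the set of whitespace-stripped lines in the file at path.

    Hypothetical helper, not part of the original answer.
    """
    with open(path) as fin:
        return {line.strip() for line in fin}


def compfiles_sets(file1, file2):
    """Return (only_in_file1, only_in_file2) as two sets of lines."""
    set1 = read_lines(file1)
    set2 = read_lines(file2)
    # a - b keeps the elements of a that do not occur in b
    return set1 - set2, set2 - set1
```

Both directions come out of a single call, so the two compfiles calls in the question would collapse into one, e.g. only1, only2 = compfiles_sets("file1.txt", "file2.txt").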

If the files have duplicate lines or the order matters, iterate over file1 instead:

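def compfiles(file1, file2):
    with open(file2) as fin:
        # Strip so a missing final newline cannot cause a false miss
        set2 = set(line.strip() for line in fin)
    linecnt = 0  # stays 0 if file1 is empty
    with open(file1) as fin:
        for linecnt, line in enumerate(fin, 1):
            line = line.strip()
            if line not in set2:
                print("Miss: file %s contains '%s', but file %s does not!" % (file1, line, file2))
    print("%i lines compared between %s and %s." % (linecnt, file1, file2))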
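
To illustrate why iterating matters when there are duplicates or when order counts, here is a small sketch that works on in-memory lists rather than files. The function name missing_lines and the sample data are invented for this example.

```python
def missing_lines(lines1, lines2):
    """Return the entries of lines1 that are absent from lines2.

    Order and duplicates from lines1 are preserved, because lines1 is
    iterated as a list; only lines2 is turned into a set for lookups.
    """
    lookup = set(lines2)
    return [line for line in lines1 if line not in lookup]


a = ["x", "y", "x", "z"]
b = ["y"]

print(missing_lines(a, b))      # ['x', 'x', 'z']
print(sorted(set(a) - set(b)))  # ['x', 'z']
```

The plain set difference reports "x" only once and has no inherent order, while the iterating version reports every miss in the order it appears in the first input.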
answered 2013-06-19T12:42:31.783