python - Speed test causing weird behavior: time spent is multiplied by 100 in one case, but only by 10 in another

Question

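I am running a speed test with three functions: readFile, prepDict, and test. test is just prepDict(readFile). I then run each of them many times with the timeit module.

When I increase the number of loops by a factor of 10, prepDict takes about 100 times longer, but test, which uses prepDict, only takes about 10 times longer.

Here are the functions and the tests.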

def readFile(filepath):
    tempDict = {}
    file = open(filepath,'rb')
    for line in file:
        split = line.split('\t')
        tempDict[split[1]] = split[2]
    return tempDict

def prepDict(tempDict):
    for key in tempDict.keys():
        tempDict[key+'a'] = tempDict[key].upper()
        del tempDict[key]
    return tempDict

def test():
    prepDict(readFile('two.txt'))

if __name__=='__main__':
    from timeit import Timer
    t = Timer(lambda: readFile('two.txt'))
    print 'readFile(10000): ' + str(t.timeit(number=10000))

    tempDict = readFile('two.txt')
    t = Timer(lambda: prepDict(tempDict))
    print 'prepDict (10000): ' + str(t.timeit(number=10000))

    t = Timer(lambda: test())
    print 'prepDict(readFile) (10000): ' + str(t.timeit(number=10000))

    t = Timer(lambda: readFile('two.txt'))
    print 'readFile(100000): ' + str(t.timeit(number=100000))

    tempDict = readFile('two.txt')
    t = Timer(lambda: prepDict(tempDict))
    print 'prepDict (100000): ' + str(t.timeit(number=100000))

    t = Timer(lambda: test())
    print 'prepDict(readFile) (100000): ' + str(t.timeit(number=100000))

The results I get are as follows:

readFile(10000): 0.61602914474
prepDict (10000): 0.200615847469
prepDict(readFile) (10000): 0.609288647286
readFile(100000): 5.91858320729
prepDict (100000): 18.8842101717
prepDict(readFile) (100000): 6.45040039665

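I get similar results if I run it multiple times. Why does prepDict increase by about 100x while prepDict(readFile) only increases by 10x, even though it uses the prepDict function?

two.txt is a tab-delimited file with these data points: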

Item    Title   Hello2
Item    Desc    Testing1232
Item    Release 2011-02-03

score 3 · Accepted Answer

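The problem here is that your prepDict function expands its input. Each time you call it on the same dictionary, it has more data to deal with. That data grows linearly, so the 10000th run takes about 10000 times as long as the first.*

When you call test, it creates a new dictionary each time, so the time per call is constant.

You can see this easily by changing the prepDict test to run on a fresh copy of the dict each time: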

t = Timer(lambda: prepDict(tempDict.copy()))

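Incidentally, your prepDict isn't actually growing exponentially** with number, just quadratically. In general, when something grows super-linearly and you want to estimate the algorithmic cost, you really need more than two data points.

* That's not quite true: the per-call time only starts growing linearly once the time taken by the string and hashing operations (which grow linearly) begins to swamp the time taken by all the other operations (which are constant).

** You didn't mention exponential growth here, but you did in your previous question, so you may have made the same unwarranted assumption in your real problem.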
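To illustrate the point about needing more than two data points, here is a minimal sketch that times the shared-dictionary case at several loop counts and fits a slope on a log-log scale. The names prep_dict, estimate_exponent, and time_shared_dict are made up for this example, the literal dict is an assumption standing in for the contents of two.txt, and list(d.keys()) is used only so the loop also runs on Python 3. A slope near 1 suggests linear growth and a slope near 2 suggests quadratic growth; at small loop counts the constant overhead mentioned in the first footnote will pull the estimate down.

```python
import math
import timeit

def prep_dict(d):
    # Same transformation as prepDict; list() snapshots the keys so the
    # dict can be modified inside the loop on Python 3 as well.
    for key in list(d.keys()):
        d[key + 'a'] = d[key].upper()
        del d[key]
    return d

def estimate_exponent(ns, times):
    """Least-squares slope of log(time) against log(n)."""
    xs = [math.log(n) for n in ns]
    ys = [math.log(t) for t in times]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    num = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    den = sum((x - mean_x) ** 2 for x in xs)
    return num / den

def time_shared_dict(n):
    # Assumed stand-in for readFile('two.txt'); the same dict is reused
    # by every call, which is what makes the keys keep growing.
    d = {'Title': 'Hello2', 'Desc': 'Testing1232', 'Release': '2011-02-03'}
    return timeit.timeit(lambda: prep_dict(d), number=n)

if __name__ == '__main__':
    ns = [2000, 4000, 8000, 16000, 32000]
    times = [time_shared_dict(n) for n in ns]
    for n, t in zip(ns, times):
        print('%6d calls: %.4f s' % (n, t))
    print('estimated exponent: %.2f' % estimate_exponent(ns, times))
```

The actual timings depend on the machine, so no expected output is shown here.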

score 1 · Accepted Answer

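Your calls to prepDict do not happen in isolation. Each call to prepDict modifies tempDict: the keys get a little longer every time. So after 10**5 calls to prepDict, the keys passed to prepDict are rather large strings. You can see this if you put a print statement into prepDict: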

def prepDict(tempDict):
    for key in tempDict.keys():
        tempDict[key+'a'] = tempDict[key].upper()
        del tempDict[key]
    print(tempDict)
    return tempDict

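The way to fix this is to make sure that each call to prepDict (or, more generally, the statement you are timing) does not affect the next call (or statement) you are timing. abarnert has already shown the solution: prepDict(tempDict.copy()).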
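One way to check whether a timed statement leaks state into the next run is to compare the input before and after a call. The following is only a sketch: mutates_input is a hypothetical helper written for this example, and prep_dict is a copy of prepDict that iterates over list(d.keys()) so it also runs on Python 3.

```python
import copy

def prep_dict(d):
    # Same transformation as prepDict; list() snapshots the keys first.
    for key in list(d.keys()):
        d[key + 'a'] = d[key].upper()
        del d[key]
    return d

def mutates_input(func, arg):
    """Return True if calling func(arg) changes arg (hypothetical helper)."""
    before = copy.deepcopy(arg)
    func(arg)
    return arg != before

if __name__ == '__main__':
    data = {'Title': 'Hello2'}
    # Passing the dict directly: the call rewrites the keys in place.
    print(mutates_input(prep_dict, data))
    # Passing a copy: the original dict is left as it was.
    print(mutates_input(lambda d: prep_dict(d.copy()), data))
```

If the helper reports a mutation, each timed iteration starts from whatever the previous one left behind, so the measurement is not of a repeated, identical workload.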

By the way, you could use a for-loop to cut down on the code duplication:

import timeit
import collections    

if __name__=='__main__':
    Ns = [10**4, 10**5]
    timing = collections.defaultdict(list)
    for N in Ns:
        timing['readFile'].append(timeit.timeit(
            "readFile('two.txt')",
            "from __main__ import readFile",
            number = N))
        timing['prepDict'].append(timeit.timeit(
            "prepDict(tempDict.copy())",
            "from __main__ import readFile, prepDict; tempDict = readFile('two.txt')",
            number = N))
        timing['test'].append(timeit.timeit(
            "test()",
            "from __main__ import test",
            number = N))

    print('{k:10}: {N[0]:7} {N[1]:7} {r}'.format(k='key', N=Ns, r='ratio'))
    for key, t in timing.iteritems():
        print('{k:10}: {t[0]:0.5f} {t[1]:0.5f} {r:>5.2f}'.format(k=key, t=t, r=t[1]/t[0]))

which yields timings such as

key       :   10000  100000 ratio
test      : 0.11320 1.12601  9.95
prepDict  : 0.01604 0.16167 10.08
readFile  : 0.08977 0.91053 10.14

score 0 · Accepted Answer

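This happens because you are reusing tempDict across all the calls when you test prepDict. Since prepDict loops over every item in the dictionary it is given and essentially just increases the length of each string key by one, you eventually end up with a bunch of really long keys. As it goes on, this starts to slow your function down, because the string concatenation operations are using and re-creating bigger and bigger strings.

This is not a problem for test, because there you re-initialize the dictionary every time.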
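The difference between the two cases can be made visible without any timing by looking at key lengths. This is a small sketch under assumptions: the helper names are invented for the example, and the literal dict stands in for what readFile would return.

```python
def prep_dict(d):
    # Same transformation as prepDict; list() snapshots the keys first.
    for key in list(d.keys()):
        d[key + 'a'] = d[key].upper()
        del d[key]
    return d

def longest_key(d):
    return max(len(k) for k in d)

def reuse_one_dict(calls):
    # Mirrors the prepDict timing: one dict shared by every call.
    d = {'Title': 'Hello2', 'Desc': 'Testing1232'}
    for _ in range(calls):
        prep_dict(d)
    return longest_key(d)

def fresh_dict_each_time(calls):
    # Mirrors test(): a new dict is built before every call.
    longest = 0
    for _ in range(calls):
        d = prep_dict({'Title': 'Hello2', 'Desc': 'Testing1232'})
        longest = max(longest, longest_key(d))
    return longest

if __name__ == '__main__':
    print(reuse_one_dict(1000))        # 1005: 'Title' plus one 'a' per call
    print(fresh_dict_each_time(1000))  # 6: keys never grow past one 'a'
```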
