python - 了解 Python 配置文件输出

Question

我正在尝试使用 Python 分析器来加速我的代码。我已经能够确定几乎所有时间都花在了哪个特定功能上，但我无法弄清楚该功能在哪里花费了时间。

下面我有配置文件输出，它显示“appendBallot”是罪魁祸首，消耗了近 116 秒。在下面，我有“appendBallot”的代码。

我无法从配置文件输出中弄清楚，我需要优化“appendBallot”的哪一部分，因为下一个最高时间条目不到一秒。我相信你们中的许多人都可以从我的代码中告诉我，但我想了解如何从配置文件输出中获取该信息。任何帮助将不胜感激。

配置文件输出：

  ncalls  tottime  percall  cumtime  percall filename:lineno(function)
       1    0.000    0.000  116.168  116.168 <string>:1(<module>)
       1    0.001    0.001  116.168  116.168 {execfile}
       1    0.003    0.003  116.167  116.167 foo.py:1(<module>)
       1    0.000    0.000  116.139  116.139 ballots.py:330(loadKnown)
       1    0.000    0.000  116.109  116.109 plugins.py:148(load)
       1    0.196    0.196  116.108  116.108 BltBallotLoader.py:37(loadFile)
  100000  114.937    0.001  115.912    0.001 ballots.py:133(appendBallot)
  100000    0.480    0.000    0.790    0.000 ballots.py:117(newBallot)
  316668    0.227    0.000    0.310    0.000 ballots.py:107(getNumCandidates)
417310/417273    0.111    0.000    0.111    0.000 {len}
  200510    0.071    0.000    0.071    0.000 {method 'append' of 'list' objects}
   99996    0.045    0.000    0.045    0.000 {method 'add' of 'set' objects}
  100000    0.042    0.000    0.042    0.000 {method 'has_key' of 'dict' objects}
       1    0.000    0.000    0.030    0.030 plugins.py:202(getLoaderPluginClasses)
       1    0.000    0.000    0.030    0.030 plugins.py:179(getPluginClasses)
       1    0.000    0.000    0.030    0.030 plugins.py:205(getLoaderPluginClass)
       3    0.016    0.005    0.029    0.010 {__import__}
       1    0.022    0.022    0.025    0.025 ballots.py:1(<module>)
       1    0.010    0.010    0.013    0.013 BltBallotLoader.py:1(<module>)
       7    0.000    0.000    0.003    0.000 re.py:227(_compile)

代码：

  def appendBallot(self, ballot, ballotID=None):
    "Append a ballot to this Ballots object."

    # String representation of ballot for determining whether ballot is unique
    ballotString = str(list(ballot))

    # Ballot as the appropriate array to conserve memory
    ballot = self.newBallot(ballot)

    # Assign a ballot ID if one has not been given
    if ballotID is None:
      ballotID = len(self.ballotIDs)
    assert(ballotID not in self.ballotIDs)
    self.ballotIDs.append(ballotID)

    # Check to see if we have seen this ballot before
    if self.uniqueBallotsLookup.has_key(ballotString):
      i = self.uniqueBallotsLookup[ballotString]
      self.uniqueBallotIDs[i].add(ballotID)
    else:
      i = len(self.uniqueBallots)
      self.uniqueBallotsLookup[ballotString] = i
      self.uniqueBallots.append(ballot)
      self.uniqueBallotIDs.append(set([ballotID]))
    self.ballotOrder.append(i)

score 7 · Accepted Answer

是的，我也遇到了同样的问题。

我知道解决此问题的唯一方法是将大函数包装成几个较小的函数调用。这将允许分析器考虑每个较小的函数调用。

有趣的是，这样做的过程（对我来说，无论如何）让效率低下的地方很明显，所以我什至不必运行分析器。

score 5 · Accepted Answer

我看过你的代码，看起来你做了很多函数调用和属性查找，作为你“检查”的一部分，或者在跳跃之前向前看。您还有很多专门用于跟踪相同条件的代码，即很多代码位都在寻找创建“唯一”ID。

而不是试图为每张选票分配某种唯一的字符串，你不能只使用 ballotID （一个整数吗？）

现在你可以有一个字典 (uniqueBallotIDs) 映射 ballotID 和实际的选票对象。

这个过程可能是这样的：

def appendBallot(self, ballot, ballotID=None):
   if ballotID is None:
       ballotID = self._getuniqueid() # maybe just has a counter? up to you.
   # check to see if we have seen this ballot before.
   if not self._isunique(ballotID):
       # code for non-unique ballot ids.
   else:
       # code for unique ballot ids.

   self.ballotOrder.append(i)

您可以通过使用 defaultdict（来自集合模块）来解决您对字典缺少给定键的一些担忧。收集文档

为完整起见进行编辑，我将包含 defaultdict 的示例用法：

>>> from collections import defaultdict            

>>> ballotIDmap = defaultdict(list)
>>> ballotID, ballot = 1, object() # some nominal ballotID and object.
>>> # I will now try to save my ballotID.
>>> ballotIDmap[ballotID].append(ballot)
>>> ballotIDmap.items()
[(1, [<object object at 0x009BB950>])]

score 5 · Accepted Answer

我会支持 Fragsworth 说您希望将您的功能拆分为更小的功能。

话虽如此，您正在正确读取输出：tottime 是值得关注的。

现在你的减速可能在哪里：

由于似乎有 100000 次调用 appendBallot，并且没有任何明显的循环，我建议它在您的断言中。因为您正在执行：

assert(ballotID not in self.ballotIDs)

这实际上将作为一个循环。因此，第一次调用此函数时，它将遍历一个（可能为空的）数组，然后断言是否找到该值。第 100000 次它将遍历整个数组。

这里实际上有一个可能的错误：如果删除了一张选票，那么添加的下一张选票将与最后添加的一张具有相同的 id（除非那张选票已被删除）。我认为你最好使用一个简单的计数器。这样，您每次添加选票时都可以增加它。或者，您可以使用 UUID 来获取唯一 ID。

或者，如果您正在考虑某种程度的持久性，请使用 ORM，并让它为您生成 ID 并进行唯一检查。

score 5 · Accepted Answer

5

探查器可以是这样的。我使用的方法是这个。它很快就触及了问题的核心。

于 2009-09-24T15:26:50.727 回答

score 4 · Accepted Answer

我在我的代码中使用了这个装饰器，它帮助我完成了我的 pyparsing 调优工作。

score 2 · Accepted Answer

这段代码中有两个问题：

# Assign a ballot ID if one has not been given
if ballotID is None:
    ballotID = len(self.ballotIDs)
assert(ballotID not in self.ballotIDs)
self.ballotIDs.append(ballotID)

首先，self.ballotIDs 似乎是一个列表，因此 assert 语句将导致二次行为。由于您根本没有为您的数据结构提供任何文档，因此不可能是规定性的，但如果出现的顺序无关紧要，您可以使用集合而不是列表。

其次，逻辑（在没有关于 ballotID 的全部内容以及 not-None ballotID arg 的含义的文档的情况下）似乎存在严重错误：

obj.appendBallot(ballota, 2) # self.ballotIDs -> [2]
obj.appendBallot(ballotb)    # self.ballotIDs -> [2, 1]
obj.appendBallot(ballotc)    # wants to add 2 but triggers assertion

其他的建议：

而不是adict.has_key(key), 使用key in adict-- 它更快并且看起来更好。

您可能想考虑查看您的数据结构......它们似乎有点巴洛克式；构建它们可能需要相当多的 CPU 时间。

python - 了解 Python 配置文件输出

6 回答 6

Related

Reference