python - Where is my Python script spending its time? Is there "missing time" in my cProfile / pstats trace?

Question

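I am trying to profile a long-running Python script. The script uses the gdal module to do some spatial analysis on raster GIS data sets. It currently consists of three files: the main script that loops over the raster pixels, find_pixel_pairs.py; a simple cache, lrucache.py; and some miscellaneous classes in utils.py. I have profiled the code on a moderately sized dataset. pstats returns: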

   p.sort_stats('cumulative').print_stats(20)
   Thu May  6 19:16:50 2010    phes.profile

   355483738 function calls in 11644.421 CPU seconds

   Ordered by: cumulative time
   List reduced from 86 to 20 due to restriction <20>

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.008    0.008 11644.421 11644.421 <string>:1(<module>)
        1 11064.926 11064.926 11644.413 11644.413 find_pixel_pairs.py:49(phes)
340135349  544.143    0.000  572.481    0.000 utils.py:173(extent_iterator)
  8831020   18.492    0.000   18.492    0.000 {range}
   231922    3.414    0.000    8.128    0.000 utils.py:152(get_block_in_bands)
   142739    1.303    0.000    4.173    0.000 utils.py:97(search_extent_rect)
   745181    1.936    0.000    2.500    0.000 find_pixel_pairs.py:40(is_no_data)
   285478    1.801    0.000    2.271    0.000 utils.py:98(intify)
   231922    1.198    0.000    2.013    0.000 utils.py:116(block_to_pixel_extent)
   695766    1.990    0.000    1.990    0.000 lrucache.py:42(get)
  1213166    1.265    0.000    1.265    0.000 {min}
  1031737    1.034    0.000    1.034    0.000 {isinstance}
   142740    0.563    0.000    0.909    0.000 utils.py:122(find_block_extent)
   463844    0.611    0.000    0.611    0.000 utils.py:112(block_to_pixel_coord)
   745274    0.565    0.000    0.565    0.000 {method 'append' of 'list' objects}
   285478    0.346    0.000    0.346    0.000 {max}
   285480    0.346    0.000    0.346    0.000 utils.py:109(pixel_coord_to_block_coord)
      324    0.002    0.000    0.188    0.001 utils.py:27(__init__)
      324    0.016    0.000    0.186    0.001 gdal.py:848(ReadAsArray)
        1    0.000    0.000    0.160    0.160 utils.py:50(__init__)

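The top two calls contain the main loop, that is, the entire analysis. The remaining calls sum to less than 625 of the 11644 seconds. Where are the remaining 11,000 seconds spent? Is it all within the main loop of find_pixel_pairs.py? If so, can I find out which lines of code are taking most of the time?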

score 1 · Accepted Answer

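You are right that most of the time is spent in the phes function, on line 49 of find_pixel_pairs.py. To find out more, you need to break phes up into more subfunctions and then profile again.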
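As a rough illustration of that advice, here is a minimal sketch of a loop whose steps have been moved into named helpers so that the profiler reports a separate row for each one. The functions scan, load_value and compare_pair and the toy grid are hypothetical stand-ins, not the code from the question.

```python
import cProfile
import io
import pstats


def load_value(grid, row, col):
    # Hypothetical stand-in for reading one pixel.
    return grid[row][col]


def compare_pair(a, b):
    # Hypothetical stand-in for the per-pair computation.
    return abs(a - b)


def scan(grid):
    # The main loop delegates each step to a named helper, so the
    # profiler reports a separate row for every helper.
    total = 0
    for row in range(len(grid)):
        for col in range(len(grid[row]) - 1):
            a = load_value(grid, row, col)
            b = load_value(grid, row, col + 1)
            total += compare_pair(a, b)
    return total


# Small synthetic grid, used only for this demonstration.
grid = [[(r * 7 + c * 3) % 11 for c in range(100)] for r in range(100)]

profiler = cProfile.Profile()
total = profiler.runcall(scan, grid)

# Collect the report in a string so it can be printed or inspected.
stream = io.StringIO()
pstats.Stats(profiler, stream=stream).sort_stats("tottime").print_stats(10)
report = stream.getvalue()
print(report)
```

Keep in mind that each extra function call in a hot loop adds overhead of its own, which the profiler magnifies, so the split version is a diagnostic aid rather than necessarily the form you want to keep.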

score 1 · Accepted Answer

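Forget functions and measuring. Use this technique: just run the script in debug mode and press Ctrl-C a few times. The call stack will show you exactly which lines of code are responsible for the time.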

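Added: for example, pause it 10 times. If, as EOL says, 10400 of the 11000 seconds are spent directly in phes, then on about 9 of those pauses it will stop right there. If, on the other hand, it spends a large fraction of the time in some subroutine called from phes, then you will see not only where it is in that subroutine, but also the lines that call it, which are responsible for the time as well, and so on up the call stack.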

Don't measure. Capture.
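The manual pauses can also be approximated in code. The sketch below is an automated stand-in for pressing Ctrl-C in a debugger, not the exact procedure described above: a background thread takes periodic snapshots of the working thread's call stack and counts how often each line appears. It relies on sys._current_frames(), which is a CPython implementation detail; sample_stack, slow_step and workload are hypothetical names made up for the demonstration.

```python
import collections
import sys
import threading
import time


def sample_stack(thread_id, counts, stop_event, interval=0.01):
    # Take a snapshot of the target thread's stack at regular intervals.
    while not stop_event.is_set():
        frame = sys._current_frames().get(thread_id)
        # Every frame on the stack is charged with one sample, so callers
        # are counted along with the function that is currently running.
        while frame is not None:
            counts[(frame.f_code.co_name, frame.f_lineno)] += 1
            frame = frame.f_back
        time.sleep(interval)


def slow_step(n):
    # Hypothetical hot spot.
    total = 0
    for i in range(n):
        total += i % 7
    return total


def workload():
    # Hypothetical main loop that calls the hot spot repeatedly.
    result = 0
    for _ in range(30):
        result += slow_step(200_000)
    return result


counts = collections.Counter()
stop_event = threading.Event()
sampler = threading.Thread(
    target=sample_stack,
    args=(threading.get_ident(), counts, stop_event),
    daemon=True,
)
sampler.start()
result = workload()
stop_event.set()
sampler.join()

# Lines that show up in the most snapshots are where the time goes.
for (name, lineno), hits in counts.most_common(5):
    print(f"{hits:4d} samples  {name}:{lineno}")
```

A line that appears in most of the snapshots is responsible for most of the run time, whether it does the work itself or calls something that does.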

score 0 · Accepted Answer

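The time spent executing the code of each function or method itself is in the tottime column. The cumtime column is tottime plus the time spent in the functions it calls.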

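In your listing, you can see that the 11,000 seconds you are looking for are spent directly in the phes function itself (tottime 11064.926). The functions it calls account for only about 580 seconds (cumtime 11644.413 minus tottime 11064.926).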

So you want to find out what takes the time in phes by breaking it up into subfunctions and profiling again, as ~unutbu suggested.
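To make the difference between the two columns concrete, here is a small sketch with made-up functions (outer and inner are hypothetical names). Work done inline in outer shows up in its tottime, while the time spent inside inner is added only to outer's cumtime.

```python
import cProfile
import io
import pstats


def inner(n):
    # All of this loop is counted in inner's own tottime.
    total = 0
    for i in range(n):
        total += i * i
    return total


def outer(n):
    # Work done inline here counts toward outer's tottime ...
    total = 0
    for i in range(n):
        total += i
    # ... while time spent inside inner() only adds to outer's cumtime.
    return total + inner(n)


profiler = cProfile.Profile()
profiler.enable()
result = outer(200_000)
profiler.disable()

# Collect the report in a string so it can be printed or inspected.
stream = io.StringIO()
stats = pstats.Stats(profiler, stream=stream)
stats.sort_stats("cumulative").print_stats(5)
report = stream.getvalue()
print(report)
```

In the printed table, the row for outer should show a cumtime that is roughly its own tottime plus the cumtime of inner, which is the same relationship as between phes and the functions it calls in the listing above.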

score 0 · Accepted Answer

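If you have identified potential bottlenecks in the phes function/method in find_pixel_pairs.py, you can use line_profiler to get line-by-line execution profile numbers like these (copied from another question here):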

Timer unit: 1e-06 s

Total time: 9e-06 s
File: <ipython-input-4-dae73707787c>
Function: do_other_stuff at line 4

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
     4                                           def do_other_stuff(numbers):
     5         1            9      9.0    100.0      s = sum(numbers)

Total time: 0.000694 s
File: <ipython-input-4-dae73707787c>
Function: do_stuff at line 7

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
     7                                           def do_stuff(numbers):
     8         1           12     12.0      1.7      do_other_stuff(numbers)
     9         1          208    208.0     30.0      l = [numbers[i]/43 for i in range(len(numbers))]
    10         1          474    474.0     68.3      m = ['hello'+str(numbers[i]) for i in range(len(numbers))]

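With this information, you do not need to break phes up into multiple subfunctions, because you can see precisely which lines have the highest execution time.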

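Since you mention that your script has a long run time, I would recommend using line_profiler on as few methods as possible, because while profiling adds overhead, line profiling can add even more.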
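One common way to restrict line profiling to a single function is to decorate only that function with @profile and run the script through kernprof, the command-line runner that ships with line_profiler. The sketch below assumes line_profiler is installed; the file name line_demo.py is hypothetical, and do_stuff is a simplified variant of the example in the output above.

```python
# Save as, e.g., line_demo.py (hypothetical file name), then run:
#   kernprof -l -v line_demo.py
# kernprof injects a `profile` decorator into builtins; the fallback below
# keeps the script runnable under plain `python` as well.
try:
    profile  # defined only when the script is run through kernprof -l
except NameError:
    def profile(func):
        # No-op replacement so the decorator is harmless without kernprof.
        return func


@profile
def do_stuff(numbers):
    # Only this decorated function is timed line by line.
    halves = [n / 43 for n in numbers]
    labels = ["hello" + str(n) for n in numbers]
    return halves, labels


if __name__ == "__main__":
    halves, labels = do_stuff(list(range(1000)))
    print(len(halves), len(labels))
```

Leaving every other function undecorated keeps the extra overhead confined to the code you actually suspect.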
