4

我有一个 Python 程序在 4GB RAM 32 位 12.04 Ubuntu 上运行一些输入数据。该程序的时间和空间复杂度均为 O(n)。当输入数据约为 100 kb 时,它在大约 4 秒内完成执行,峰值 RAM 消耗为 0.5%(在 LINUX 中使用“top”命令)。但是,当我尝试输入大小为 500kB、2.5MB 和 16 MB 的数据时,该过程并未在 1 小时内完成(在每种情况下,我都必须使用 Cntrl C 取消)并且内存消耗停留在 1.6%(即每种情况下大约 64MB)。我可以以某种方式为这个 Python 进程分配更多的 RAM 内存吗?

注意:我正在使用 Python 制作的“mrjob”库在 Python 中实现 Map Reduce 作业。

以下是输入 csv 文件为 100 kB 时成功执行的日志。

   ankit@ubuntu:~/mrj/mrjo/mrjob/examples$ python mt1.py as.txt > asop.txtusing configs in /home/ankit/.mrjob.conf
creating tmp directory /home/ankit/mrj/mrjo/examples/mt1.ankit.20121224.094809.251269
> /usr/bin/python mt1.py --step-num=0 --mapper /home/ankit/mrj/mrjo/examples/mt1.ankit.20121224.094809.251269/input_part-00000
writing to /home/ankit/mrj/mrjo/examples/mt1.ankit.20121224.094809.251269/step-0-mapper_part-00000
> /usr/bin/python mt1.py --step-num=0 --mapper /home/ankit/mrj/mrjo/examples/mt1.ankit.20121224.094809.251269/input_part-00001
writing to /home/ankit/mrj/mrjo/examples/mt1.ankit.20121224.094809.251269/step-0-mapper_part-00001
Counters from step 1:
  (no counters found)
writing to /home/ankit/mrj/mrjo/examples/mt1.ankit.20121224.094809.251269/step-0-mapper-sorted
> sort /home/ankit/mrj/mrjo/examples/mt1.ankit.20121224.094809.251269/step-0-mapper_part-00000 /home/ankit/mrj/mrjo/examples/mt1.ankit.20121224.094809.251269/step-0-mapper_part-00001
> /usr/bin/python mt1.py --step-num=0 --reducer /home/ankit/mrj/mrjo/examples/mt1.ankit.20121224.094809.251269/input_part-00000
writing to /home/ankit/mrj/mrjo/examples/mt1.ankit.20121224.094809.251269/step-0-reducer_part-00000
Counters from step 1:
  (no counters found)
> /usr/bin/python mt1.py --step-num=1 --mapper /home/ankit/mrj/mrjo/examples/mt1.ankit.20121224.094809.251269/input_part-00000
writing to /home/ankit/mrj/mrjo/examples/mt1.ankit.20121224.094809.251269/step-1-mapper_part-00000
Counters from step 2:
  (no counters found)
Moving /home/ankit/mrj/mrjo/examples/mt1.ankit.20121224.094809.251269/step-1-mapper_part-00000 -> /home/ankit/mrj/mrjo/examples/mt1.ankit.20121224.094809.251269/output/part-00000
Streaming final output from /home/ankit/mrj/mrjo/examples/mt1.ankit.20121224.094809.251269/output
removing tmp directory /home/ankit/mrj/mrjo/examples/mt1.ankit.20121224.094809.251269

这是输入 csv 文件为 2.5 MB 时的执行日志和回溯。

ankit@ubuntu:~/mrj/mrjo/mrjob/examples$ python mt1.py matlabsample.csv > matsamop.txt
using configs in /home/ankit/.mrjob.conf
creating tmp directory /home/ankit/mrj/mrjo/examples/mt1.ankit.20121224.065246.700221
> /usr/bin/python mt1.py --step-num=0 --mapper /home/ankit/mrj/mrjo/examples/mt1.ankit.20121224.065246.700221/input_part-00000
writing to /home/ankit/mrj/mrjo/examples/mt1.ankit.20121224.065246.700221/step-0-mapper_part-00000
> /usr/bin/python mt1.py --step-num=0 --mapper /home/ankit/mrj/mrjo/examples/mt1.ankit.20121224.065246.700221/input_part-00001
writing to /home/ankit/mrj/mrjo/examples/mt1.ankit.20121224.065246.700221/step-0-mapper_part-00001
Counters from step 1:
  (no counters found)
writing to /home/ankit/mrj/mrjo/examples/mt1.ankit.20121224.065246.700221/step-0-mapper-sorted
> sort /home/ankit/mrj/mrjo/examples/mt1.ankit.20121224.065246.700221/step-0-mapper_part-00000 /home/ankit/mrj/mrjo/examples/mt1.ankit.20121224.065246.700221/step-0-mapper_part-00001
> /usr/bin/python mt1.py --step-num=0 --reducer /home/ankit/mrj/mrjo/examples/mt1.ankit.20121224.065246.700221/input_part-00000
writing to /home/ankit/mrj/mrjo/examples/mt1.ankit.20121224.065246.700221/step-0-reducer_part-00000
Counters from step 1:
  (no counters found)
> /usr/bin/python mt1.py --step-num=1 --mapper /home/ankit/mrj/mrjo/examples/mt1.ankit.20121224.065246.700221/input_part-00000
writing to /home/ankit/mrj/mrjo/examples/mt1.ankit.20121224.065246.700221/step-1-mapper_part-00000
^CTraceback (most recent call last):


  File "mt1.py", line 311, in <module>
    Motion_Tagging.run()
  File "/usr/local/lib/python2.7/dist-packages/mrjob-0.3.5-py2.7.egg/mrjob/job.py", line 545, in run
    mr_job.execute()
  File "/usr/local/lib/python2.7/dist-packages/mrjob-0.3.5-py2.7.egg/mrjob/job.py", line 561, in execute
    self.run_job()
  File "/usr/local/lib/python2.7/dist-packages/mrjob-0.3.5-py2.7.egg/mrjob/job.py", line 631, in run_job
    runner.run()
  File "/usr/local/lib/python2.7/dist-packages/mrjob-0.3.5-py2.7.egg/mrjob/runner.py", line 490, in run
    self._run()
  File "/usr/local/lib/python2.7/dist-packages/mrjob-0.3.5-py2.7.egg/mrjob/local.py", line 193, in _run
    combiner_args=combiner_args)
  File "/usr/local/lib/python2.7/dist-packages/mrjob-0.3.5-py2.7.egg/mrjob/local.py", line 488, in _invoke_step
    self._wait_for_process(proc_dict, step_num)
  File "/usr/local/lib/python2.7/dist-packages/mrjob-0.3.5-py2.7.egg/mrjob/local.py", line 657, in _wait_for_process
    tb_lines = find_python_traceback(stderr_lines)
  File "/usr/local/lib/python2.7/dist-packages/mrjob-0.3.5-py2.7.egg/mrjob/parse.py", line 171, in find_python_traceback
    for line in lines:
  File "/usr/local/lib/python2.7/dist-packages/mrjob-0.3.5-py2.7.egg/mrjob/local.py", line 680, in _process_stderr_from_script
    for line in stderr:
KeyboardInterrupt
4

2 回答 2

1

您不会“为 Python 进程分配内存”,而是在 Python 程序中使用更大的结构。从根本上讲,您的算法可能存在缺陷,以至于它没有利用可用的内存。

于 2012-12-24T09:57:33.217 回答
0

仅供参考,这不是代码级解决方案。但是,您可以通过下面的链接深入了解 python 内存实现是如何工作的,以及问题是如何解决的。它还讨论了可以改进 Python 内存管理的其他领域。希望它会有用。

http://www.evanjones.ca/memoryallocator/

于 2012-12-24T10:11:27.193 回答