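I have a Python program that processes some input data on a 32-bit Ubuntu 12.04 machine with 4 GB of RAM. The program's time and space complexity are both O(n). With about 100 kB of input it finishes in roughly 4 seconds, and peak RAM usage is 0.5% (as reported by the "top" command on Linux). However, with inputs of 500 kB, 2.5 MB and 16 MB, the process did not finish within 1 hour (each time I had to cancel it with Ctrl+C), and memory usage stayed at 1.6% (about 64 MB in each case). Is there a way to allocate more RAM to this Python process?
Note: I am using the "mrjob" library, which is written in Python, to implement a MapReduce job in Python.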
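As a sanity check on the numbers above, here is a minimal sketch of the arithmetic, plus a way to see whether the process is running under a memory cap at all. The function names are hypothetical and are not part of mrjob. It assumes that top's %MEM is relative to 4096 MB (on a 32-bit kernel the usable total may be lower), that the run time really scales linearly from the 100 kB / 4 s measurement, and that `ru_maxrss` is in kilobytes, as it is on Linux.

```python
import resource


def percent_to_mb(percent, total_mb=4096.0):
    """Convert a %MEM figure from top into megabytes (assumes total_mb of RAM)."""
    return percent / 100.0 * total_mb


def linear_estimate_seconds(size_kb, ref_kb=100.0, ref_seconds=4.0):
    """Run time expected if the job were truly O(n), scaled from one measurement."""
    return ref_seconds * size_kb / ref_kb


def address_space_limit():
    """Return the (soft, hard) RLIMIT_AS values in bytes; None means unlimited."""
    soft, hard = resource.getrlimit(resource.RLIMIT_AS)
    unlimited = resource.RLIM_INFINITY
    return (None if soft == unlimited else soft,
            None if hard == unlimited else hard)


def peak_rss_mb():
    """Peak resident set size of the current process in MB (Linux reports kB)."""
    return resource.getrusage(resource.RUSAGE_SELF).ru_maxrss / 1024.0


if __name__ == "__main__":
    print("1.6%% of 4096 MB is about %.1f MB" % percent_to_mb(1.6))
    for size_kb in (500, 2500, 16000):
        print("%6d kB: about %.0f s if linear" % (size_kb, linear_estimate_seconds(size_kb)))
    print("RLIMIT_AS (soft, hard): %r" % (address_space_limit(),))
    print("peak RSS of this process: %.1f MB" % peak_rss_mb())
```

Under these assumptions even the 16 MB input would be expected to take only about 640 seconds. If the limit is reported as unlimited, a cap on the process's memory is probably not what is holding it back.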
Below is the log of a successful run with a 100 kB input csv file.
ankit@ubuntu:~/mrj/mrjo/mrjob/examples$ python mt1.py as.txt > asop.txt
using configs in /home/ankit/.mrjob.conf
creating tmp directory /home/ankit/mrj/mrjo/examples/mt1.ankit.20121224.094809.251269
> /usr/bin/python mt1.py --step-num=0 --mapper /home/ankit/mrj/mrjo/examples/mt1.ankit.20121224.094809.251269/input_part-00000
writing to /home/ankit/mrj/mrjo/examples/mt1.ankit.20121224.094809.251269/step-0-mapper_part-00000
> /usr/bin/python mt1.py --step-num=0 --mapper /home/ankit/mrj/mrjo/examples/mt1.ankit.20121224.094809.251269/input_part-00001
writing to /home/ankit/mrj/mrjo/examples/mt1.ankit.20121224.094809.251269/step-0-mapper_part-00001
Counters from step 1:
(no counters found)
writing to /home/ankit/mrj/mrjo/examples/mt1.ankit.20121224.094809.251269/step-0-mapper-sorted
> sort /home/ankit/mrj/mrjo/examples/mt1.ankit.20121224.094809.251269/step-0-mapper_part-00000 /home/ankit/mrj/mrjo/examples/mt1.ankit.20121224.094809.251269/step-0-mapper_part-00001
> /usr/bin/python mt1.py --step-num=0 --reducer /home/ankit/mrj/mrjo/examples/mt1.ankit.20121224.094809.251269/input_part-00000
writing to /home/ankit/mrj/mrjo/examples/mt1.ankit.20121224.094809.251269/step-0-reducer_part-00000
Counters from step 1:
(no counters found)
> /usr/bin/python mt1.py --step-num=1 --mapper /home/ankit/mrj/mrjo/examples/mt1.ankit.20121224.094809.251269/input_part-00000
writing to /home/ankit/mrj/mrjo/examples/mt1.ankit.20121224.094809.251269/step-1-mapper_part-00000
Counters from step 2:
(no counters found)
Moving /home/ankit/mrj/mrjo/examples/mt1.ankit.20121224.094809.251269/step-1-mapper_part-00000 -> /home/ankit/mrj/mrjo/examples/mt1.ankit.20121224.094809.251269/output/part-00000
Streaming final output from /home/ankit/mrj/mrjo/examples/mt1.ankit.20121224.094809.251269/output
removing tmp directory /home/ankit/mrj/mrjo/examples/mt1.ankit.20121224.094809.251269
Here are the execution log and traceback for a 2.5 MB input csv file.
ankit@ubuntu:~/mrj/mrjo/mrjob/examples$ python mt1.py matlabsample.csv > matsamop.txt
using configs in /home/ankit/.mrjob.conf
creating tmp directory /home/ankit/mrj/mrjo/examples/mt1.ankit.20121224.065246.700221
> /usr/bin/python mt1.py --step-num=0 --mapper /home/ankit/mrj/mrjo/examples/mt1.ankit.20121224.065246.700221/input_part-00000
writing to /home/ankit/mrj/mrjo/examples/mt1.ankit.20121224.065246.700221/step-0-mapper_part-00000
> /usr/bin/python mt1.py --step-num=0 --mapper /home/ankit/mrj/mrjo/examples/mt1.ankit.20121224.065246.700221/input_part-00001
writing to /home/ankit/mrj/mrjo/examples/mt1.ankit.20121224.065246.700221/step-0-mapper_part-00001
Counters from step 1:
(no counters found)
writing to /home/ankit/mrj/mrjo/examples/mt1.ankit.20121224.065246.700221/step-0-mapper-sorted
> sort /home/ankit/mrj/mrjo/examples/mt1.ankit.20121224.065246.700221/step-0-mapper_part-00000 /home/ankit/mrj/mrjo/examples/mt1.ankit.20121224.065246.700221/step-0-mapper_part-00001
> /usr/bin/python mt1.py --step-num=0 --reducer /home/ankit/mrj/mrjo/examples/mt1.ankit.20121224.065246.700221/input_part-00000
writing to /home/ankit/mrj/mrjo/examples/mt1.ankit.20121224.065246.700221/step-0-reducer_part-00000
Counters from step 1:
(no counters found)
> /usr/bin/python mt1.py --step-num=1 --mapper /home/ankit/mrj/mrjo/examples/mt1.ankit.20121224.065246.700221/input_part-00000
writing to /home/ankit/mrj/mrjo/examples/mt1.ankit.20121224.065246.700221/step-1-mapper_part-00000
^CTraceback (most recent call last):
  File "mt1.py", line 311, in <module>
    Motion_Tagging.run()
  File "/usr/local/lib/python2.7/dist-packages/mrjob-0.3.5-py2.7.egg/mrjob/job.py", line 545, in run
    mr_job.execute()
  File "/usr/local/lib/python2.7/dist-packages/mrjob-0.3.5-py2.7.egg/mrjob/job.py", line 561, in execute
    self.run_job()
  File "/usr/local/lib/python2.7/dist-packages/mrjob-0.3.5-py2.7.egg/mrjob/job.py", line 631, in run_job
    runner.run()
  File "/usr/local/lib/python2.7/dist-packages/mrjob-0.3.5-py2.7.egg/mrjob/runner.py", line 490, in run
    self._run()
  File "/usr/local/lib/python2.7/dist-packages/mrjob-0.3.5-py2.7.egg/mrjob/local.py", line 193, in _run
    combiner_args=combiner_args)
  File "/usr/local/lib/python2.7/dist-packages/mrjob-0.3.5-py2.7.egg/mrjob/local.py", line 488, in _invoke_step
    self._wait_for_process(proc_dict, step_num)
  File "/usr/local/lib/python2.7/dist-packages/mrjob-0.3.5-py2.7.egg/mrjob/local.py", line 657, in _wait_for_process
    tb_lines = find_python_traceback(stderr_lines)
  File "/usr/local/lib/python2.7/dist-packages/mrjob-0.3.5-py2.7.egg/mrjob/parse.py", line 171, in find_python_traceback
    for line in lines:
  File "/usr/local/lib/python2.7/dist-packages/mrjob-0.3.5-py2.7.egg/mrjob/local.py", line 680, in _process_stderr_from_script
    for line in stderr:
KeyboardInterrupt