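I am implementing some deep learning algorithms with Theano. After I stop a program that was running Theano, importing Theano again occasionally fails with the following error.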

    >>> import theano
    ERROR (theano.sandbox.cuda): ERROR: Not using GPU. Initialisation of device gpu failed:
    initCnmem: cnmemInit call failed! Reason=CNMEM_STATUS_OUT_OF_MEMORY. numdev=1

    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "/home/jjhu/.local/lib/python2.7/site-packages/theano/__init__.py", line 118, in <module>
        theano.sandbox.cuda.tests.test_driver.test_nvidia_driver1()
      File "/home/jjhu/.local/lib/python2.7/site-packages/theano/sandbox/cuda/tests/test_driver.py", line 40, in test_nvidia_driver1
        if not numpy.allclose(f(), a.sum()):
      File "/home/jjhu/.local/lib/python2.7/site-packages/theano/compile/function_module.py", line 875, in __call__
        storage_map=getattr(self.fn, 'storage_map', None))
      File "/home/jjhu/.local/lib/python2.7/site-packages/theano/gof/link.py", line 317, in raise_with_op
        reraise(exc_type, exc_value, exc_trace)
      File "/home/jjhu/.local/lib/python2.7/site-packages/theano/compile/function_module.py", line 862, in __call__
        self.fn() if output_subset is None else\
    RuntimeError: Cuda error: kernel_reduce_ccontig_node_4894639462a290346189bb38dab7bb7e_0: out of memory. (grid: 1 x 1; block: 256 x 1 x 1)

    Apply node that caused the error: GpuCAReduce{add}{1}(<CudaNdarrayType(float32, vector)>)
    Toposort index: 0
    Inputs types: [CudaNdarrayType(float32, vector)]
    Inputs shapes: [(10000,)]
    Inputs strides: [(1,)]
    Inputs values: ['not shown']
    Outputs clients: [[HostFromGpu(GpuCAReduce{add}{1}.0)]]

    HINT: Re-running with most Theano optimization disabled could give you a back-trace of when this node was created. This can be done with by setting the Theano flag 'optimizer=fast_compile'. If that does not work, Theano optimizations can be disabled with 'optimizer=None'.
    HINT: Use the Theano flag 'exception_verbosity=high' for a debugprint and storage map footprint of this apply node.

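I have tried several suggested fixes. Some people recommend deleting the compile directory with rm -rf ~/.theano. I also checked that ~/.theano is not owned by root, and I tried setting my ~/.theanorc as follows. None of these worked for me.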

    [global]
    floatX = float32
    device = cpu
    optimizer=fast_run

    [lib]
    cnmem = 0.1

    [cuda]
    root = /usr/local/cuda
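
The error message mentions device gpu even though this file sets device = cpu, which may mean some other setting takes precedence (Theano lets the THEANO_FLAGS environment variable override the config file). As a rough way to see what is in play, here is a minimal sketch in Python 3 that parses the file and prints THEANO_FLAGS. The helper name read_theanorc is only illustrative and is not part of Theano, and the standard configparser module lowercases option names, so floatX appears as floatx.

```python
import configparser
import os


def read_theanorc(text):
    """Parse theanorc-style INI text into {section: {option: value}}.

    Hypothetical helper for inspection only; not a Theano API.
    """
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(text)
    return {section: dict(parser.items(section)) for section in parser.sections()}


if __name__ == "__main__":
    # Assumes the config file lives at the default location.
    path = os.path.expanduser("~/.theanorc")
    if os.path.exists(path):
        with open(path) as fh:
            print(read_theanorc(fh.read()))
    else:
        print("no file at", path)
    # Flags set here take precedence over the file.
    print("THEANO_FLAGS =", os.environ.get("THEANO_FLAGS"))
```

If THEANO_FLAGS contains something like device=gpu, the device line in the file would not be the one in effect.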

The only thing that works is rebooting the machine or logging out, which is awkward. I don't know what causes this problem. Can anyone suggest a solution?
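
Since a reboot or logout clears the problem, one thing that could be checked first is whether a process from the stopped run is still holding GPU memory. The following is a minimal sketch in Python 3, assuming nvidia-smi is on the PATH and supports the --query-compute-apps option. The function name parse_compute_apps is illustrative, and this is only a diagnostic, not a confirmed cause.

```python
import shutil
import subprocess

# Ask nvidia-smi for one CSV row per process that holds GPU memory.
QUERY = [
    "nvidia-smi",
    "--query-compute-apps=pid,process_name,used_memory",
    "--format=csv,noheader,nounits",
]


def parse_compute_apps(text):
    """Turn CSV rows into (pid, process_name, used_memory) tuples.

    used_memory is kept as a string because nvidia-smi may report a
    placeholder instead of a number on some setups.
    """
    rows = []
    for line in text.splitlines():
        parts = [p.strip() for p in line.split(",")]
        if len(parts) < 3 or not parts[0].isdigit():
            continue  # skip blank or unexpected lines
        rows.append((int(parts[0]), ", ".join(parts[1:-1]), parts[-1]))
    return rows


if __name__ == "__main__":
    if shutil.which("nvidia-smi"):
        result = subprocess.run(QUERY, capture_output=True, text=True)
        for pid, name, used in parse_compute_apps(result.stdout):
            print(pid, name, used, "MiB")
    else:
        print("nvidia-smi not found")
```

If a Python process from the earlier run shows up in this list, ending that process might free the memory without a reboot.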
