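I'm making progress with CuPy and have been able to speed up an iterative image reconstruction, previously written with NumPy and C++, by roughly 3×. Now I've run into an intermittent problem that suggests I'm missing something important.
I'm running this on a late-2013 Mac Pro (OS X 10.13.6) with an NVIDIA GeForce GTX 1080 Ti.
I'm memory-constrained, so I repeatedly load data onto the GPU with cupy.asarray() and then free it by setting the variable to None. This works, but intermittently I get: cupy.cuda.memory.OutOfMemoryError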
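To illustrate the bookkeeping behind the two pool numbers logged below, here is a toy model of a caching allocator. ToyCachingPool is a hypothetical, heavily simplified class, not part of CuPy (CuPy's real pool rounds allocation sizes and splits and merges chunks). The only point it makes is that setting cupy_array = None hands the block back to CuPy's memory pool rather than to the device, so used_bytes() drops while total_bytes() stays the same until the cached blocks are released with free_all_blocks().

```python
class ToyCachingPool:
    """Toy caching allocator: a simplified model, NOT CuPy's implementation.

    It only illustrates the bookkeeping behind used_bytes()/total_bytes():
    freed blocks are kept for reuse instead of being returned to the device.
    """

    def __init__(self):
        self._in_use = {}   # block id -> size of blocks currently handed out
        self._cached = []   # sizes of freed blocks the pool keeps for reuse
        self._next_id = 0

    def used_bytes(self):
        return sum(self._in_use.values())

    def total_bytes(self):
        # everything the pool holds from the device: in use plus cached
        return self.used_bytes() + sum(self._cached)

    def malloc(self, size):
        # reuse the smallest cached block that is big enough (whole block,
        # no splitting), otherwise "allocate" a new block from the device
        fits = [s for s in self._cached if s >= size]
        if fits:
            block_size = min(fits)
            self._cached.remove(block_size)
        else:
            block_size = size
        block_id = self._next_id
        self._next_id += 1
        self._in_use[block_id] = block_size
        return block_id

    def free(self, block_id):
        # roughly what happens when the last reference goes away (x = None):
        # the block goes back to the pool's cache, not to the device
        self._cached.append(self._in_use.pop(block_id))

    def free_all_blocks(self):
        # hand every cached, unused block back to the device
        self._cached.clear()


pool = ToyCachingPool()
a = pool.malloc(1024)
pool.free(a)
print(pool.used_bytes(), pool.total_bytes())  # 0 1024
```

In real code the corresponding release call is cp.get_default_memory_pool().free_all_blocks().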
Inside the iterations I run the following loop:
import cupy as cp

mempool = cp.get_default_memory_pool()
pinned_mempool = cp.get_default_pinned_memory_pool()

# iterations
for i in range(nr_iterations):
    # [...]
    # loop within iterations
    for idx in range(dim2):
        print(f"GPU memory info - iteration: {i} - used: {mempool.used_bytes()}, "
              f"total: {mempool.total_bytes()} pinned :{pinned_mempool.n_free_blocks()} "
              f"in loop with idx = {idx}")
        # flow_idx is not defined in this excerpt (presumably set in the elided code)
        cupy_array = cp.asarray(cpp_function(numpy_array[:, flow_idx:flow_idx + 1, ...]))
        # do all the work
        # [...]
        # last line in for loop: drop this reference to the GPU array
        cupy_array = None
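A small helper could make the logging line show how much memory the pool holds but is not using, optionally next to the device's own free/total figures. This is a sketch: memory_report is a hypothetical name, and it only calls methods the loop already uses (used_bytes(), total_bytes(), n_free_blocks()), so it takes duck-typed pool objects and does not import CuPy itself. Note that n_free_blocks() counts free pinned blocks, not bytes.

```python
def memory_report(pool, pinned_pool, device_mem_info=None):
    """Summarise pool state (hypothetical helper).

    pool: object with used_bytes()/total_bytes(), e.g. cp.get_default_memory_pool()
    pinned_pool: object with n_free_blocks(), e.g. cp.get_default_pinned_memory_pool()
    device_mem_info: optional (free, total) tuple in bytes
    """
    used = pool.used_bytes()
    held = pool.total_bytes()
    parts = [
        f"used: {used}",
        f"held by pool: {held}",
        f"cached but unused: {held - used}",  # kept by the pool, not returned to the device
        f"pinned free blocks: {pinned_pool.n_free_blocks()}",  # a count, not bytes
    ]
    if device_mem_info is not None:
        free, total = device_mem_info
        parts.append(f"device free: {free} of {total}")
    return ", ".join(parts)
```

Inside the inner loop it could be called as print(memory_report(mempool, pinned_mempool, cp.cuda.Device().mem_info)), where Device.mem_info gives the device's (free, total) memory in bytes.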
The printout shows stable memory usage until the error is suddenly raised at the cupy_array = cp.asarray(...) line. Here is an excerpt of the logging output:
GPU memory info - iteration: 0 - used: 3477175296, total: 8529552896 pinned :2 in loop with idx = 0
GPU memory info - iteration: 0 - used: 3477175296, total: 8529552896 pinned :2 in loop with idx = 1
GPU memory info - iteration: 1 - used: 3477175296, total: 8529552896 pinned :2 in loop with idx = 0
GPU memory info - iteration: 1 - used: 3477175296, total: 8529552896 pinned :2 in loop with idx = 1
GPU memory info - iteration: 2 - used: 3477175296, total: 8529552896 pinned :2 in loop with idx = 0
GPU memory info - iteration: 2 - used: 3477175296, total: 8529552896 pinned :2 in loop with idx = 1
GPU memory info - iteration: 3 - used: 3477175296, total: 8529552896 pinned :2 in loop with idx = 0
GPU memory info - iteration: 3 - used: 3477175296, total: 8529552896 pinned :2 in loop with idx = 1
GPU memory info - iteration: 4 - used: 3477175296, total: 8529552896 pinned :2 in loop with idx = 0
File "cupy/core/core.pyx", line 1712, in cupy.core.core.array
File "cupy/core/core.pyx", line 1751, in cupy.core.core.array
File "cupy/core/core.pyx", line 134, in cupy.core.core.ndarray.__init__
File "cupy/cuda/memory.pyx", line 518, in cupy.cuda.memory.alloc
File "cupy/cuda/memory.pyx", line 1085, in cupy.cuda.memory.MemoryPool.malloc
File "cupy/cuda/memory.pyx", line 1106, in cupy.cuda.memory.MemoryPool.malloc
File "cupy/cuda/memory.pyx", line 934, in cupy.cuda.memory.SingleDeviceMemoryPool.malloc
File "cupy/cuda/memory.pyx", line 949, in cupy.cuda.memory.SingleDeviceMemoryPool._malloc
File "cupy/cuda/memory.pyx", line 697, in cupy.cuda.memory._try_malloc
cupy.cuda.memory.OutOfMemoryError: out of memory to allocate 2158119936 bytes (total 10687672832 bytes)
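What surprises me in particular is that, when the error occurs, the system appears to need 10687672832 bytes, whereas the logging shows 3477175296 bytes in use and the failed request is for 2158119936 bytes. I realize I need twice the free memory to load an array, but in this case it seems to need more than 3 times the size of the array being loaded.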
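A worked check of the numbers, using only the last log line and the error message. The total in the error equals the pool's total_bytes() plus the requested size exactly. That suggests (an assumption, not something the log states) that the figure in the message is the memory the pool already holds plus the new request, rather than a demand for several times the array size. The second line shows how much the pool was holding without using it:

```latex
\begin{aligned}
\underbrace{8\,529\,552\,896}_{\texttt{total\_bytes()}} + \underbrace{2\,158\,119\,936}_{\text{requested}} &= 10\,687\,672\,832 \quad (\text{total in the error}) \\
\underbrace{8\,529\,552\,896}_{\texttt{total\_bytes()}} - \underbrace{3\,477\,175\,296}_{\texttt{used\_bytes()}} &= 5\,052\,377\,600 \quad (\text{held by the pool, not in use})
\end{aligned}
```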
Is there something obvious I'm missing? Thanks for looking at this.