
When I use cupy to process some large arrays, I get an out-of-memory error, but when I check memory usage with nvidia-smi, it has not reached my GPU's memory limit. I am using an NVIDIA GeForce RTX 2060 with 6 GB of GPU memory. Here is my code:

import cupy as cp

mempool = cp.get_default_memory_pool()
print(mempool.used_bytes())              # 0
print(mempool.total_bytes())             # 0

a = cp.random.randint(0, 256, (10980, 10980)).astype(cp.uint8)
a = a.ravel()
print(a.nbytes)                          # 120560400
print(mempool.used_bytes())              # 120560640
print(mempool.total_bytes())             # 602803712
# after creating this array, nvidia-smi shows:
# +-----------------------------------------------------------------------------+
# | NVIDIA-SMI 430.86       Driver Version: 430.86       CUDA Version: 10.2     |
# |-------------------------------+----------------------+----------------------+
# | GPU  Name            TCC/WDDM | Bus-Id        Disp.A | Volatile Uncorr. ECC |
# | Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
# |===============================+======================+======================|
# |   0  GeForce RTX 2060   WDDM  | 00000000:01:00.0  On |                  N/A |
# | N/A   46C    P8     9W /  N/A |   1280MiB /  6144MiB |      1%      Default |
# +-------------------------------+----------------------+----------------------+

# but when I run this command, an error comes out
s_values, s_idx, s_counts = cp.unique(
    a, return_inverse=True, return_counts=True)
# and the error shows
# cupy.cuda.memory.OutOfMemoryError: out of memory to allocate 964483584 bytes (total 5545867264 bytes)
# the nvidia-smi shows
# +-----------------------------------------------------------------------------+
# | NVIDIA-SMI 430.86       Driver Version: 430.86       CUDA Version: 10.2     |
# |-------------------------------+----------------------+----------------------+
# | GPU  Name            TCC/WDDM | Bus-Id        Disp.A | Volatile Uncorr. ECC |
# | Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
# |===============================+======================+======================|
# |   0  GeForce RTX 2060   WDDM  | 00000000:01:00.0  On |                  N/A |
# | N/A   45C    P8     9W /  N/A |   5075MiB /  6144MiB |      3%      Default |
# +-------------------------------+----------------------+----------------------+

It looks like there is still enough space available, so why does this error occur? Is it because my GPU does not have enough memory, or is my code wrong or allocating memory incorrectly?
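
(For reference, a minimal sketch of how the free/total device memory can be queried directly from cupy, alongside the memory pool statistics; it should roughly mirror the nvidia-smi numbers above.)

import cupy as cp

# free and total device memory in bytes, as reported by the CUDA runtime
free_bytes, total_bytes = cp.cuda.Device(0).mem_info
print(free_bytes, total_bytes)

# how much memory the default pool has cached, and how much of it is in use
mempool = cp.get_default_memory_pool()
print(mempool.total_bytes(), mempool.used_bytes())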


2 Answers


Isn't 964,483,584 larger than the 602,803,712 reported by your mempool.total_bytes()?

As said in the comments, you can do the computation in batches instead of doing the whole thing at once.
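
For illustration only, here is a minimal sketch of one way to batch it (the chunk size is arbitrary, and this only covers the values and counts): since the array is uint8 there are at most 256 distinct values, so per-chunk results of cp.unique can be accumulated into a 256-slot counter and the global values and counts recovered from it.

import cupy as cp

a = cp.random.randint(0, 256, (10980, 10980)).astype(cp.uint8).ravel()

chunk_size = 20_000_000                       # arbitrary; pick it to fit the free memory
total_counts = cp.zeros(256, dtype=cp.int64)  # uint8 has at most 256 distinct values

for start in range(0, a.size, chunk_size):
    chunk = a[start:start + chunk_size]
    values, counts = cp.unique(chunk, return_counts=True)
    # values are unique within a chunk, so fancy-indexed += accumulates correctly
    total_counts[values.astype(cp.int64)] += counts

s_values = cp.nonzero(total_counts)[0].astype(cp.uint8)  # values that actually occur
s_counts = total_counts[s_values.astype(cp.int64)]       # their global counts

return_inverse is harder to batch; for uint8 data one option is a 256-entry lookup table mapping each value to its position in s_values, indexed chunk by chunk.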

answered 2019-10-28T04:18:09.500

You can use dask to perform the same kind of operation; it handles the parallelization for you, and even when the data does not fit in RAM you can avoid ever running out of memory. I have attached a link in which the author himself explains how to do this.

from dask.distributed import Client, LocalCluster
import dask.array as da
import numpy as np

cluster = LocalCluster()  # use the multiple CPU cores of the local machine
client = Client(cluster)
client                    # in a notebook this displays the cluster information

rs = da.random.RandomState(RandomState=np.random.RandomState)  # thin wrapper around the NumPy RNG
x = rs.random((100000, 40000), chunks=(10000, 400))  # ~29.80 GB array split into ~30.52 MB chunks
x  # just ensure that the chunk size is small

da.exp(x).mean().compute()  # always prefer a reduced result over a full element-wise ndarray
da.exp(x).compute()         # do not run this line: it materializes the full ~29.80 GB result

In that last line, dask tries to hold the entire output in memory. Since the output is on the order of 29+ GB, you will run out of memory. YouTube link in which the author of dask explains the code above.

answered 2021-06-21T20:24:58.893