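I want to load a 5.9 GB CSV file without using the pandas library. I have 4 GPUs, and I am using RAPIDS (rapids.ai) to load this large dataset faster, but every time I try, I get the error below, even though my other GPUs still have free memory. At the start, GPU memory usage (in bytes) was: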

GPU 0
total    : 11554717696
free     : 11126046720
used     : 428670976
GPU 1
total    : 11554717696
free     : 11542331392
used     : 12386304
GPU 2
total    : 11554717696
free     : 11542331392
used     : 12386304
GPU 3
total    : 11551440896
free     : 11113070592
used     : 438370304
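Converting those byte counts to GiB (a quick sketch, assuming the 5.9 GB file size is in decimal gigabytes) shows why the free memory on the other GPUs does not help. Each GPU has roughly 10.35 to 10.75 GiB free and about 42 GiB is free across all four. However, a single cudf.read_csv call allocates on one device (the current CUDA device, GPU 0 by default), so the per-GPU figure is the real ceiling. The raw CSV is about 5.5 GiB on its own, and parsing needs room for the text plus the parsed columns and temporary buffers, so peak usage can exceed what one GPU has free.

```python
# Free memory reported in the question, in bytes.
free_bytes = {
    0: 11126046720,
    1: 11542331392,
    2: 11542331392,
    3: 11113070592,
}

GIB = 1024 ** 3


def to_gib(n_bytes: int) -> float:
    """Convert a byte count to GiB, rounded to two decimals."""
    return round(n_bytes / GIB, 2)


per_gpu = {gpu: to_gib(b) for gpu, b in free_bytes.items()}
total_free = to_gib(sum(free_bytes.values()))  # summed over all four GPUs
largest_single = max(per_gpu.values())          # what one allocation can use
csv_size = to_gib(int(5.9e9))                   # assumes 5.9 GB = 5.9e9 bytes

print(per_gpu)
print(total_free, largest_single, csv_size)
```

The combined total is only reachable if the data is split into pieces that live on different devices; one read_csv call never sees more than a single GPU's free memory.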

The code is:

import cudf
import pandas as pd
import time
import subprocess as sp
import os
import dask_cudf

name = 'T100'
path = '/media/mo/2438a3d1-29fe-4c6f-aafb-f906acd5140d/AIMD/c1/trajs/'+name+'.CSV'
start = time.time()


data = dask_cudf.from_cudf(cudf.read_csv(path),
                         npartitions=4).compute()
done = time.time()
elapsed = done - start
print(elapsed)
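The code above calls cudf.read_csv(path) first, which parses the whole file on a single GPU before dask_cudf.from_cudf ever gets the chance to split it into 4 partitions, so from_cudf cannot prevent the out-of-memory error. The trailing .compute() would also concatenate every partition back into one cudf DataFrame on one GPU. A partitioned reader instead divides the file by byte offsets up front, so no single device has to hold all of it. The sketch below shows that division step only; split_byte_ranges is a hypothetical helper, and it produces (offset, size) pairs in the form taken by the byte_range parameter that appears in the cudf.read_csv signature in the traceback.

```python
from typing import List, Tuple


def split_byte_ranges(file_size: int, n_parts: int) -> List[Tuple[int, int]]:
    """Split [0, file_size) into n_parts contiguous (offset, size) ranges.

    Range sizes differ by at most one byte, and together they cover every
    byte of the file exactly once.
    """
    if n_parts <= 0:
        raise ValueError("n_parts must be positive")
    base, extra = divmod(file_size, n_parts)
    ranges = []
    offset = 0
    for i in range(n_parts):
        # Spread any remainder over the first `extra` ranges.
        size = base + (1 if i < extra else 0)
        ranges.append((offset, size))
        offset += size
    return ranges


# One range per GPU for the 5.9 GB file from the question.
print(split_byte_ranges(5_900_000_000, 4))
```

For the question's file this gives four ranges of 1,475,000,000 bytes each, roughly 1.37 GiB per GPU instead of the whole file on GPU 0.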

The error:

---------------------------------------------------------------------------
MemoryError                               Traceback (most recent call last)
<ipython-input-3-1fff5fb4e9b4> in <module>
      2 
      3 
----> 4 data = dask_cudf.from_cudf(cudf.read_csv(path),
      5                          npartitions=4).compute()
      6 done = time.time()

~/anaconda3/envs/machineLearning/lib/python3.7/contextlib.py in inner(*args, **kwds)
     72         def inner(*args, **kwds):
     73             with self._recreate_cm():
---> 74                 return func(*args, **kwds)
     75         return inner
     76 

~/anaconda3/envs/machineLearning/lib/python3.7/site-packages/cudf/io/csv.py in read_csv(filepath_or_buffer, lineterminator, quotechar, quoting, doublequote, header, mangle_dupe_cols, usecols, sep, delimiter, delim_whitespace, skipinitialspace, names, dtype, skipfooter, skiprows, dayfirst, compression, thousands, decimal, true_values, false_values, nrows, byte_range, skip_blank_lines, parse_dates, comment, na_values, keep_default_na, na_filter, prefix, index_col, **kwargs)
     82         na_filter=na_filter,
     83         prefix=prefix,
---> 84         index_col=index_col,
     85     )
     86 

cudf/_lib/csv.pyx in cudf._lib.csv.read_csv()

MemoryError: std::bad_alloc: CUDA error at: /conda/conda-bld/librmm_1591196551527/work/include/rmm/mr/device/cuda_memory_resource.hpp66: cudaErrorMemoryAllocation out of memory

2 Answers

The answer to the question "CUDF Error processing large number of parquet files" explains how to read large files with dask_cudf: https://stackoverflow.com/a/58123478/13887495

Following the instructions in that answer should help you resolve MemoryError: std::bad_alloc: CUDA error at: /conda/conda-bld/librmm_1591196551527/work/include/rmm/mr/device/cuda_memory_resource.hpp66: cudaErrorMemoryAllocation out of memory
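Splitting a CSV by byte offsets only works if every row ends up in exactly one partition. According to the cudf.read_csv documentation, byte_range reads the rows that start inside the range, even when a row ends after it. The sketch below models that rule on an in-memory bytes object. rows_in_range is a hypothetical helper, and it assumes Unix newlines, no header row, and no quoted fields containing newlines, so it is a simplified illustration rather than cuDF's implementation.

```python
from typing import List


def rows_in_range(data: bytes, offset: int, size: int) -> List[str]:
    """Return the rows of `data` that start inside [offset, offset + size).

    Assumed rule (modelled on cudf.read_csv's byte_range): a row that starts
    inside the range is returned whole, even if it ends past the range; a row
    that started before `offset` belongs to the previous range and is skipped.
    size == 0 means "read to the end". Simplifications: Unix newlines,
    no header row, no quoted fields containing newlines.
    """
    end = len(data) if size == 0 else offset + size
    start = offset
    if offset > 0 and data[offset - 1:offset] != b"\n":
        # The range begins mid-row: skip ahead to the start of the next row.
        nl = data.find(b"\n", offset)
        start = len(data) if nl == -1 else nl + 1

    rows = []
    pos = start
    while pos < end and pos < len(data):
        nl = data.find(b"\n", pos)
        stop = len(data) if nl == -1 else nl
        rows.append(data[pos:stop].decode())
        pos = stop + 1  # move past the newline to the next row
    return rows


data = b"a,1\nb,2\nc,3\nd,4\n"
for offset, size in [(0, 6), (6, 6), (12, 4)]:
    print(offset, size, rows_in_range(data, offset, size))
```

Each of the four rows lands in exactly one range, even though the boundary at byte 6 falls in the middle of the row b,2: that row started in the first range, so the first range reads it in full and the second range skips it.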

answered 2020-08-26T13:51:41.663

The code should be:

data = dask_cudf.read_csv(path,
                         npartitions=4)
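dask_cudf.read_csv returns a lazy, partitioned DataFrame. Calling .compute() on the whole frame, as the question's code does, would again gather every partition into one cudf DataFrame on a single GPU. Reductions avoid that, because each partition produces a small partial result and only those partials are combined (for a hypothetical column "col", data["col"].mean().compute() returns just a scalar). Also, without a distributed cluster dask runs every partition in one process on one GPU; to actually place partitions on all four GPUs, the usual approach is to start a dask_cuda LocalCUDACluster and connect a dask.distributed Client before reading. The plain-Python sketch below illustrates only the partial-then-combine pattern. partition_stats and combine are hypothetical helpers, and the small in-memory lists stand in for partitions held on different GPUs.

```python
from typing import Iterable, List, Tuple


def partition_stats(rows: Iterable[str], column: int) -> Tuple[int, float]:
    """Partial result for one partition: (row count, sum of one numeric column)."""
    count, total = 0, 0.0
    for row in rows:
        count += 1
        total += float(row.split(",")[column])
    return count, total


def combine(partials: List[Tuple[int, float]]) -> Tuple[int, float]:
    """Merge the small per-partition results into a global (count, sum)."""
    return sum(c for c, _ in partials), sum(t for _, t in partials)


# Three small in-memory lists stand in for partitions living on different GPUs.
partitions = [["a,1", "b,2"], ["c,3"], ["d,4"]]
n, s = combine([partition_stats(p, column=1) for p in partitions])
mean = s / n
print(n, s, mean)
```

Only the (count, sum) pairs travel to the combine step, so the memory needed there stays tiny no matter how large each partition is.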
answered 2020-08-26T13:06:30.570