I'm running cudf.read_csv() in Colab on a huge CSV file (3 GB, 17,540,000 records), but it fails with an error.
import cudf
import numpy as np
import pandas as pd
import csv
g_df = cudf.read_csv('drive/MyDrive/m1.csv',escapechar="\\")
The error message is:
RuntimeError                              Traceback (most recent call last)
<ipython-input-9-efc4c69ac697> in <module>()
----> 1 g_df = cudf.read_csv('drive/MyDrive/m1.csv',escapechar="\\")
      2 g_df.shape

1 frames
/usr/local/lib/python3.7/site-packages/cudf/io/csv.py in read_csv(filepath_or_buffer, lineterminator, quotechar, quoting, doublequote, header, mangle_dupe_cols, usecols, sep, delimiter, delim_whitespace, skipinitialspace, names, dtype, skipfooter, skiprows, dayfirst, compression, thousands, decimal, true_values, false_values, nrows, byte_range, skip_blank_lines, parse_dates, comment, na_values, keep_default_na, na_filter, prefix, index_col, **kwargs)
    100                 na_filter=na_filter,
    101                 prefix=prefix,
--> 102                 index_col=index_col,
    103             )
    104

cudf/_lib/csv.pyx in cudf._lib.csv.read_csv()

RuntimeError: cuDF failure at: ../include/cudf/strings/detail/strings_column_factories.cuh:75: total size of strings is too large for cudf column
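I noticed the read_csv signature in the traceback takes a byte_range parameter. Would reading the file in byte windows and reducing each chunk on the fly avoid the per-column string-size limit? A minimal sketch of what I mean (the 256 MB window size and the per-chunk reduction are my own assumptions, not something I've verified):

import os
import cudf

path = 'drive/MyDrive/m1.csv'
file_size = os.path.getsize(path)
chunk_size = 256 * 1024 * 1024  # assumed: 256 MB per byte_range window

col_names = None
total_rows = 0
offset = 0
while offset < file_size:
    if col_names is None:
        # The first window contains the header line, so cuDF reads
        # the column names directly from the file.
        chunk = cudf.read_csv(path, escapechar="\\",
                              byte_range=(offset, chunk_size))
        col_names = list(chunk.columns)
    else:
        # Later windows do not contain the header line, so pass the
        # names explicitly and disable header parsing.
        chunk = cudf.read_csv(path, escapechar="\\",
                              byte_range=(offset, chunk_size),
                              header=None, names=col_names)
    # Reduce each chunk here instead of concatenating them all back
    # together, which would presumably rebuild the oversized string
    # column and hit the same error.
    total_rows += len(chunk)
    offset += chunk_size

print(total_rows)

Is this a sensible workaround, or is there a proper way to load a file this large into cuDF?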