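I am attempting this on a Windows 10 laptop with 16 GB of RAM. It is also worth mentioning that I have set R's temporary folder outside the C: drive, so that the OS drive does not run out of space, by saving a .Renviron file in my Documents folder with the following lines: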
TMPDIR=D:/rTemp
TMP=D:/rTemp
TEMP=D:/rTemp
While working in RStudio, I have verified that the D:/rTemp folder is actually being used as the temporary folder.
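For reference, a check along these lines can confirm the temp folder from the R console. This is only a minimal sketch: the names td and uses_expected_root are illustrative, and the expected root D:/rTemp is taken from the .Renviron lines above.

```r
# Show the temp-related environment variables as R sees them
print(Sys.getenv(c("TMPDIR", "TMP", "TEMP")))

# Per-session temp directory; with the .Renviron above it should be
# a subfolder of D:/rTemp (assumption: R was restarted after editing .Renviron)
td <- normalizePath(tempdir(), winslash = "/")
print(td)

# TRUE if the session temp directory lives under the expected root
# (case-insensitive comparison, since Windows paths are case-insensitive)
uses_expected_root <- startsWith(tolower(td), tolower("D:/rTemp"))
print(uses_expected_root)
```

If uses_expected_root is FALSE, R most likely did not pick up the .Renviron file at startup.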
I have a gzip-compressed csv file of approx. 20 GB, which takes up approx. 83 GB uncompressed. I tried to create a disk.frame from it using the following code:
library(disk.frame) # set temporary directory of R outside C: drive via .Renviron
setup_disk.frame()
options(future.globals.maxSize = Inf)
fyl <- "G:/v_all_country/src/v_all_country_owner.csv.gz"
out <- "G:/v_all_country/src/v_all_country_owner.df"
col_classes_vector <- c(state_cd="factor", off_cd="factor", ... and so on for total 63 columns)
# increase the no. of recommended chunks for reduced RAM usage
no_of_chunks <- recommend_nchunks(file.size(fyl))*5
v_all_country_owner <- csv_to_disk.frame(
  fyl,
  outdir = out,
  overwrite = TRUE,
  compress = 100,
  nchunks = no_of_chunks,
  chunk_reader = "readLines", # documentation warns against data.table
  colClasses = col_classes_vector
)
Unfortunately, I get the following error:
Warning in if (is.character(con)) { :
closing unused connection 3 (localhost)
Error in data.table::fread(infile, header = header, ...) :
Opened 83.4GB (89553459056 bytes) file ok but could not memory map it. This is a 64bit process. There is probably not enough contiguous virtual memory available.
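It was when I first encountered this error that I moved R's temporary directory off the OS drive. But the error persists, and it still comes from data.table even though I am specifically trying to use readLines. The same error occurs if I use bigreadr as the chunk reader.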
The same code works perfectly well and creates a disk.frame when used with a smaller gzip-compressed file of approx. 200 MB.
I then tried the readr backend with the following code:
library(disk.frame) # set temporary directory of R outside C: drive via .Renviron
setup_disk.frame()
options(future.globals.maxSize = Inf)
fyl <- "G:/v_all_country/src/v_all_country_owner.csv.gz"
out <- "G:/v_all_country/src/v_all_country_owner.df"
# increase the no. of recommended chunks for reduced RAM usage
no_of_chunks <- recommend_nchunks(file.size(fyl))*5
csv_to_disk.frame(
  fyl,
  outdir = out,
  overwrite = TRUE,
  compress = 100,
  nchunks = no_of_chunks,
  backend = "readr",
  chunk_reader = "readLines", # documentation warns against data.table
  col_types = cols(state_cd = col_factor(), off_cd = col_factor(), ... and so on for a total of 63 columns)
)
This code also failed to create a disk.frame and gave the following errors:
Warning in match(x, table, nomatch = 0L) :
closing unused connection 4 (localhost)
Warning in match(x, table, nomatch = 0L) :
closing unused connection 3 (localhost)
Error: cannot allocate vector of size 64 Kb
Error: cannot allocate vector of size 139 Kb
Error: cannot allocate vector of size 139 Kb
Error: cannot allocate vector of size 139 Kb
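I cannot share the large csv file because of size and confidentiality constraints. Can anyone identify the problem from the given code and error messages? Any help would be greatly appreciated.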