
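I am trying this on a Windows 10 laptop with 16 GB of RAM. It is also worth mentioning that I have moved R's temporary folder off the C: drive, so that the OS drive does not run out of space, by saving a .Renviron file in my Documents folder with the following lines: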

TMPDIR=D:/rTemp 
TMP=D:/rTemp 
TEMP=D:/rTemp

While working in RStudio, I have verified that the D:/rTemp folder is actually being used as the temporary folder.
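For reference, a check like this can be done from the R console with base R alone. This is a minimal sketch: the variable names `tmp_vars` and `session_tmp` are illustrative, and the expected D:/rTemp value only applies to a machine configured as described above.

```r
# Environment variables that R consults at startup to choose its temp location
tmp_vars <- Sys.getenv(c("TMPDIR", "TMP", "TEMP"))
print(tmp_vars)

# Per-session temporary directory that R actually created;
# with the .Renviron above it should sit under D:/rTemp
session_tmp <- normalizePath(tempdir(), winslash = "/")
print(session_tmp)
```

Note that `tempdir()` is fixed when the session starts, so R (or RStudio) has to be restarted after editing .Renviron for the change to take effect.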

I have a gzip-compressed csv file of approx. 20 GB, which occupies approx. 83 GB when uncompressed. I tried to create a disk.frame from it with the following code:

library(disk.frame) # set temporary directory of R outside C: drive via .Renviron
setup_disk.frame()
options(future.globals.maxSize = Inf)

fyl <- "G:/v_all_country/src/v_all_country_owner.csv.gz"
out <- "G:/v_all_country/src/v_all_country_owner.df"

col_classes_vector <- c(state_cd="factor", off_cd="factor", ... and so on for total 63 columns)

# increase the no. of recommended chunks for reduced RAM usage
no_of_chunks <- recommend_nchunks(file.size(fyl))*5

v_all_country_owner <- csv_to_disk.frame(
  fyl,
  outdir = out,
  overwrite = TRUE,
  compress = 100,
  nchunks = no_of_chunks,
  chunk_reader = "readLines", # documentation warns against data.table
  colClasses = col_classes_vector
)

Unfortunately, I get the following error:

Warning in if (is.character(con)) { :
  closing unused connection 3 (localhost)
Error in data.table::fread(infile, header = header, ...) : 
  Opened 83.4GB (89553459056 bytes) file ok but could not memory map it. This is a 64bit process. There is probably not enough contiguous virtual memory available.
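As a quick sanity check, the byte count in the fread message can be converted to GiB. It matches the uncompressed size of the file rather than the 20 GB archive, which suggests (an assumption, not something the message states outright) that fread was working on a fully decompressed copy of the data. The variable names below are illustrative.

```r
# Byte count copied verbatim from the fread error message
bytes_reported <- 89553459056

# Convert to GiB (1 GiB = 1024^3 bytes) and round to one decimal place
size_gib <- bytes_reported / 1024^3
print(round(size_gib, 1))
```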

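It was after first encountering this error that I moved R's temporary directory off the OS drive. However, the error persists, and it still comes from data.table even though I explicitly request readLines as the chunk reader. The same error occurs if I use bigreadr as the chunk reader.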

The same code works perfectly well and creates a disk.frame when used with a smaller gzip-compressed file of approx. 200 MB.

I then tried the readr backend with the following code:

library(disk.frame) # set temporary directory of R outside C: drive via .Renviron
library(readr)      # provides cols() and col_factor() used in col_types below
setup_disk.frame()
options(future.globals.maxSize = Inf)

fyl <- "G:/v_all_country/src/v_all_country_owner.csv.gz"
out <- "G:/v_all_country/src/v_all_country_owner.df"

# increase the no. of recommended chunks for reduced RAM usage
no_of_chunks <- recommend_nchunks(file.size(fyl))*5

csv_to_disk.frame(
  fyl,
  outdir = out,
  overwrite = TRUE,
  compress = 100,
  nchunks = no_of_chunks,
  backend = "readr",
  chunk_reader = "readLines", # documentation warns against data.table
  col_types = cols(state_cd = col_factor(), off_cd = col_factor(), ... and so on for a total of 63 columns)
)

This code also failed to create a disk.frame, with the following errors:

Warning in match(x, table, nomatch = 0L) :
  closing unused connection 4 (localhost)
Warning in match(x, table, nomatch = 0L) :
  closing unused connection 3 (localhost)
Error: cannot allocate vector of size 64 Kb
Error: cannot allocate vector of size 139 Kb
Error: cannot allocate vector of size 139 Kb
Error: cannot allocate vector of size 139 Kb

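Due to size and confidentiality constraints, I cannot share the large csv file. Can anyone identify the problem from the given code and error messages? Any help would be appreciated.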
