r - ff in R: no applicable method for 'recodeLevels'

Question

I'm trying to load a huge (~5GB) .csv file into R using read.csv.ffdf. The command is as follows:

npi <- read.csv.ffdf(file="C:/Users/DSA/Dropbox/Team Shared Files/People/Ross/NPI_Parse/Zips/npi_full.csv", VERBOSE=TRUE, first.rows=10000,next.rows=100000,colClasses=NA)

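The command runs for a while and then throws the following error: no applicable method for 'recodeLevels' applied to an object of class "c('double', 'numeric')". Some searching tells me that I need to use the transFUN option, but I have no idea how to apply it. The data is a mix of text and numbers, which I think may be causing the problem. I can upload a screenshot of the csv if that would help, but it takes a long time to open in LibreOffice.

Does anyone know what the trick is?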

score 1 · Accepted Answer

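From the help page of read.csv.ffdf:

transFUN: NULL, or a function that is called on each data.frame chunk after reading with FUN and before further processing (for filtering, transformations, etc.)

If one of your columns changes from a factor to a numeric (or vice versa) between chunks, use transFUN to make sure it is always a factor. Replace yourcolumnwiththeerror below with the name of the affected column: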

npi <- read.csv.ffdf(
  file="C:/Users/DSA/Dropbox/Team Shared Files/People/Ross/NPI_Parse/Zips/npi_full.csv",
  VERBOSE=TRUE, first.rows=10000,next.rows=100000, 
  transFUN=function(x){
    x$yourcolumnwiththeerror <- factor(as.character(x$yourcolumnwiththeerror))
    x
  })
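
If you don't know which column is affected, the following base R sketch may help you find it. It does not use ff: it imitates chunked reading on a toy CSV with plain read.csv, and stringsAsFactors = TRUE is set explicitly so that text columns come back as factors. The data, the column name code, and the helpers read_chunk and fix_chunk are all made up for illustration.

```r
# Toy CSV (hypothetical data): 'code' looks like text in the first rows
# and like plain numbers in the later rows.
csv_lines <- c(
  "id,code",
  "1,A12",
  "2,B07",
  "3,1001",
  "4,1002"
)

# Hypothetical helper: read the given data rows as one chunk, reusing the header.
read_chunk <- function(lines, rows) {
  read.csv(text = c(lines[1], lines[rows + 1]), stringsAsFactors = TRUE)
}

chunk1 <- read_chunk(csv_lines, 1:2)  # 'code' is inferred as a factor
chunk2 <- read_chunk(csv_lines, 3:4)  # 'code' is inferred as an integer

# First class of every column, per chunk.
classes1 <- vapply(chunk1, function(col) class(col)[1], character(1))
classes2 <- vapply(chunk2, function(col) class(col)[1], character(1))

# Columns whose inferred class differs between the two chunks.
changed <- names(classes1)[classes1 != classes2]
print(changed)

# Same idea as the transFUN above: force the column to a factor in every chunk.
fix_chunk <- function(x) {
  x$code <- factor(as.character(x$code))
  x
}
print(class(fix_chunk(chunk2)$code))
```

On the real file you could do the same comparison by reading the first rows and a later slice with read.csv (using nrows and skip), then pass a function like fix_chunk as transFUN for every column that shows up in changed.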
