r - Current state of the colClasses argument in function ff:read.csv.ffdf (ff - R package)

Question

The following code fails with the error vmode 'character' not implemented because of the argument colClasses=c("id"="character"):

df <- read.csv.ffdf('TenGBsample.csv',
      colClasses=c("id"="character"), VERBOSE=TRUE)

read.table.ffdf 1..1000 (1000) csv-read=0.02sec
Error in ff(initdata = initdata, length = length, levels = levels, ordered = ordered,  :
  vmode 'character' not implemented

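The first column of TenGBsample.csv is "id" and consists of 30-digit numbers, which exceed the largest number my 64-bit system (Windows) can handle, so I want to treat them as character. The second column holds small numbers, so it needs no adjustment.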
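To make the limits concrete, they can be checked in base R. This is a small illustrative sketch, not part of the original question: R integers are 32-bit even on 64-bit R, and a double carries only a 53-bit significand (roughly 15-16 significant decimal digits). A 30-digit id therefore fits within the range of a double but cannot be stored exactly.

```r
# Largest value an R integer (32-bit, also on 64-bit R) can hold
.Machine$integer.max
# [1] 2147483647

# A double has a 53-bit significand, i.e. roughly 15-16 significant decimal digits
.Machine$double.digits
# [1] 53

# Past 2^53, neighbouring whole numbers collapse onto the same double,
# so a 30-digit id read as numeric keeps only its leading digits exactly
2^53 + 1 == 2^53
# [1] TRUE
```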

I have checked, and vmode does list a "character" mode: http://127.0.0.1:16624/library/ff/html/vmode.html

score 1 · Accepted Answer

Note the following from help(read.csv.ffdf):

... read.table.ffdf has been designed to behave as much like read.table as possible. However, note the following differences:

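Character vectors are not supported; character data must be read as one of the following colClasses: 'Date', 'POSIXct', 'factor', 'ordered'. By default, character columns are read as factors. Accordingly, the arguments 'as.is' and 'stringsAsFactors' are not allowed.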

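So you cannot read the values in as character. However, if the id column in the file already holds numeric values, you can read them in as double and reformat them afterwards. format(x, scientific = FALSE) will print x in standard (fixed) notation.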
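A minimal base-R sketch of what format(x, scientific = FALSE) does. The value 1e20 is an arbitrary choice for illustration (it happens to be exactly representable as a double): default printing switches to scientific notation, while format() with scientific = FALSE returns a character string in fixed notation.

```r
x <- 1e20

# Default printing chooses scientific notation for such a large value
print(x)
# [1] 1e+20

# Fixed notation, returned as a character string rather than a number
format(x, scientific = FALSE)
# [1] "100000000000000000000"
```

Keep in mind that format() only changes how the double is displayed; for a 30-digit id read as double, the digits beyond the first 15-16 reflect the nearest double, not the original value in the file.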

Here is an example dataset x whose id numbers have 30 digits.

library(ff)

x <- data.frame(
    id = (267^12 + (102:106)^12),  
    other = paste0(LETTERS[1:5],letters[1:5])
)
## create a csv file with 'x'
csvfile <- tempPathFile(path = getOption("fftempdir"), extension = "csv")
write.csv(
    format(x, scientific = FALSE), 
    file = csvfile, row.names = FALSE, quote = 2
)    
## read in the data without colClasses
ffx <- read.csv.ffdf(file = csvfile)
vmode(ffx)
#       id     other 
# "double" "integer"

Now we can coerce ffx to a data.frame with ffx[,] and reformat the id column.

df <- within(ffx[,], id <- format(id, scientific = FALSE))
class(df$id)
# [1] "character"
df
#                               id other
# 1 131262095302921040298042720256    Aa
# 2 131262252822013319483345600512    Bb
# 3 131262428093345052649582493696    Cc
# 4 131262622917452503293152460800    Dd
# 5 131262839257598318815163187200    Ee
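Alternatively, following the help text quoted above, the id column can be read with colClass 'factor'. The ids are then stored as factor levels, so all 30 digits survive exactly, and as.character() turns them back into strings. This is a sketch, not tested against the 10 GB file from the question: the small CSV file and its column names ("id", "n") are made up for illustration. Also note that ff keeps factor levels in RAM, so a file with many distinct ids may make this memory hungry.

```r
library(ff)

# Hypothetical sample file: 30-digit ids plus a small integer column
csvfile <- tempfile(fileext = ".csv")
writeLines(c("id,n",
             "131262095302921040298042720256,1",
             "131262252822013319483345600512,2"),
           csvfile)

# Per help(read.csv.ffdf), character data has to be read as e.g. 'factor';
# the levels hold the ids digit for digit
ffx <- read.csv.ffdf(file = csvfile, colClasses = c("factor", "integer"))
vmode(ffx)

# Pull the id column into RAM and convert the factor labels to character
ids <- as.character(ffx$id[])
ids
```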
