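I need to share datasets that I have imported into R as ffdf objects. My goal is to export an ffdf dataset to CSV easily, without NA values that only inflate the output file size.
With a plain data frame, I would use the following syntax: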
write.csv(df, "C:/path/data.csv", row.names=FALSE, na="")
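But the write.csv.ffdf function does not seem to accept "na" as an argument. Can anyone tell me the correct syntax, so that I don't have to post-process the output file to remove the NA values?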
I think you are mistaken about write.csv.ffdf.
require(ff)
# What follows is a minor modification of the first example in the `write.*` help page.
> x <- data.frame(log=rep(c(FALSE, TRUE), length.out=26), int=c(NA, 2:26),
dbl=c(1:25,NA) + 0.1, fac=factor(c(letters[2:26], NA)),
ord=c(NA, ordered(LETTERS[2:26])), dct=Sys.time()+1:26,
dat=seq(as.Date("1910/1/1"), length.out=26, by=1))
> ffx <- as.ffdf(x)
> write.csv(ffx, na="")
"","log","int","dbl","fac","ord","dct","dat"
"1",FALSE,,1.1,"b",,2012-12-18 12:18:23,1910-01-01
"2",TRUE,2,2.1,"c",1,2012-12-18 12:18:24,1910-01-02
"3",FALSE,3,3.1,"d",2,2012-12-18 12:18:25,1910-01-03
"4",TRUE,4,4.1,"e",3,2012-12-18 12:18:26,1910-01-04
"5",FALSE,5,5.1,"f",4,2012-12-18 12:18:27,1910-01-05
"6",TRUE,6,6.1,"g",5,2012-12-18 12:18:28,1910-01-06
"7",FALSE,7,7.1,"h",6,2012-12-18 12:18:29,1910-01-07
"8",TRUE,8,8.1,"i",7,2012-12-18 12:18:30,1910-01-08
"9",FALSE,9,9.1,"j",8,2012-12-18 12:18:31,1910-01-09
"10",TRUE,10,10.1,"k",9,2012-12-18 12:18:32,1910-01-10
"11",FALSE,11,11.1,"l",10,2012-12-18 12:18:33,1910-01-11
"12",TRUE,12,12.1,"m",11,2012-12-18 12:18:34,1910-01-12
"13",FALSE,13,13.1,"n",12,2012-12-18 12:18:35,1910-01-13
"14",TRUE,14,14.1,"o",13,2012-12-18 12:18:36,1910-01-14
"15",FALSE,15,15.1,"p",14,2012-12-18 12:18:37,1910-01-15
"16",TRUE,16,16.1,"q",15,2012-12-18 12:18:38,1910-01-16
"17",FALSE,17,17.1,"r",16,2012-12-18 12:18:39,1910-01-17
"18",TRUE,18,18.1,"s",17,2012-12-18 12:18:40,1910-01-18
"19",FALSE,19,19.1,"t",18,2012-12-18 12:18:41,1910-01-19
"20",TRUE,20,20.1,"u",19,2012-12-18 12:18:42,1910-01-20
"21",FALSE,21,21.1,"v",20,2012-12-18 12:18:43,1910-01-21
"22",TRUE,22,22.1,"w",21,2012-12-18 12:18:44,1910-01-22
"23",FALSE,23,23.1,"x",22,2012-12-18 12:18:45,1910-01-23
"24",TRUE,24,24.1,"y",23,2012-12-18 12:18:46,1910-01-24
"25",FALSE,25,25.1,"z",24,2012-12-18 12:18:47,1910-01-25
"26",TRUE,26,,,25,2012-12-18 12:18:48,1910-01-26
If your goal is to minimize the RAM footprint during the write operation, first look at:
getOption("ffbatchbytes")
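write.csv.ffdf does not have an na argument, but write.table.ffdf passes the na argument on to the write.table function that it wraps. Also use sep=",", and you are good to go.
This works even for large ff variables.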
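As a minimal sketch of the write.table.ffdf approach described above: the data frame, the temporary file path, and the row.names=FALSE setting are illustrative assumptions, not part of the original answer. It assumes the ff package is installed and that write.table.ffdf forwards extra arguments to write.table, as the answer states.

```r
library(ff)

# Small illustrative data set with missing values (hypothetical data).
df <- data.frame(int = c(NA, 2:5), dbl = c(1:4, NA) + 0.1)
ffx <- as.ffdf(df)

# Hypothetical output path; replace with your own file name.
csvfile <- tempfile(fileext = ".csv")

# sep, na and row.names are not arguments of write.table.ffdf itself;
# they are forwarded to the wrapped write.table call.
write.table.ffdf(ffx, file = csvfile, sep = ",", na = "", row.names = FALSE)

# Read the file back to inspect how the missing values were written.
out <- readLines(csvfile)
print(out)
```

For a large ffdf, the data is written in chunks; the chunk size should follow the ffbatchbytes option mentioned above, so lowering that option is one way to reduce memory use during the write.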