r - Reading big data in R with read.big.matrix

Question

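I am using read.big.matrix in R to read data with dimensions 3131875 x 5. My data has both character and numeric columns, including a date variable. The command I am using is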

as1 <- read.big.matrix("C:/Documents and Settings/Arundhati.Mukherjee/My Documents/Arundhati/big data/MB07_Arundhati/sample2.txt",
                       header=TRUE, 
                       backingfile="session.bin",
                       descriptorfile="session.desc",
                       type = NA)

But R does not accept type = NA in this case, and I get an error:

Error in filebacked.big.matrix(nrow = nrow, ncol = ncol, type = type,  : 
  Problem creating filebacked matrix.
In addition: Warning messages:
1: In na.omit(as.integer(firstLineVals)) : NAs introduced by coercion
2: In na.omit(as.double(firstLineVals)) : NAs introduced by coercion
3: In read.big.matrix("C:/Documents and Settings/Arundhati.Mukherjee/My Documents/Arundhati/big data/MB07_Arundhati/sample2.txt",  :
  Because type was not specified, we chose double based on the first line of data.

I need to know what type should be here. I also tried other options such as double, but that gave me the same error.

Any help would be appreciated.

score 3 · Accepted Answer

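From ?read.big.matrix:

Files must contain only one atomic type (all integer, for example).

So you will not be able to read in data that mixes character, numeric, integer, date, and other types. You can do some work on the file first, for example using a different program to convert the character variables to integer representations (like converting to a factor in R).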

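Edit:

On the bigmemory website there is an example of preprocessing data with a Python script that changes character information to integers. The script was written for a specific data set, but perhaps you can use it as a guide for your own data.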
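As a rough illustration of that kind of preprocessing (this is not the script from the bigmemory website), here is a minimal Python sketch. It assumes a tab-separated file with a header line and dates written as YYYY-MM-DD; the separator, the date format, and the column positions are all guesses about your file, and encode_row and encode_file are hypothetical helper names. Text values become integer codes in order of first appearance (unlike R factors, which sort their levels), and dates become days since 1970-01-01.

```python
import csv
from datetime import date, datetime

EPOCH = date(1970, 1, 1)


def encode_row(row, text_cols, date_cols, lookups, date_format="%Y-%m-%d"):
    """Return a copy of row with text and date fields replaced by integers."""
    out = list(row)
    for col in text_cols:
        table = lookups.setdefault(col, {})
        # Codes start at 1 and are assigned in order of first appearance.
        out[col] = str(table.setdefault(row[col], len(table) + 1))
    for col in date_cols:
        day = datetime.strptime(row[col], date_format).date()
        # Dates become whole days since 1970-01-01.
        out[col] = str((day - EPOCH).days)
    return out


def encode_file(src, dst, text_cols, date_cols, sep="\t", date_format="%Y-%m-%d"):
    """Stream src to dst one row at a time; return the label-to-code tables."""
    lookups = {}
    with open(src, newline="") as fin, open(dst, "w", newline="") as fout:
        reader = csv.reader(fin, delimiter=sep)
        writer = csv.writer(fout, delimiter=sep, lineterminator="\n")
        writer.writerow(next(reader))  # copy the header line unchanged
        for row in reader:
            writer.writerow(
                encode_row(row, text_cols, date_cols, lookups, date_format)
            )
    return lookups
```

You would call it along the lines of encode_file("sample2.txt", "sample2_numeric.txt", text_cols=[0], date_cols=[4]), where the output file name and the column indices are placeholders to adjust. The resulting file holds only numbers, so read.big.matrix should be able to read it with type = "double" and a sep argument that matches the file. Keep the returned lookup tables if you need to map the codes back to the original labels later.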
