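I have a large data set with character variables that take many different values. I am trying to read the data into a big.matrix and then use biglm.big.matrix to build a linear model. However, big.matrix converts all character vectors to factors, and the character labels are lost. I decided to create a lookup table outside R for my character columns and use numbers to represent the different levels in R. But I do not know how to tell big.matrix that these columns should be treated as factors rather than as numeric values. Please help.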
1 Answer
I'm not very familiar with read.table.ffdf, but could you use its x argument? From ?read.table.ffdf:
x
NULL or an optional ffdf object to which the read records are appended. If this is provided,
it defines crucial features that are otherwise determined during the 'first' chunk of
reading: vmodes, colnames, colClasses, sequence of predefined levels. In order to also read
the first chunk into such predefined ffdf, an x with 1 row is treated special: instead of
appending the first row will be overwritten. This is necessary because we cannot provide x
with zero rows (we cannot create ff vectors with zero elements).
You could define the corresponding columns of x as factors with the given levels, and then use x as a template.
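As a rough, untested sketch of that idea (assuming the ff package is installed): build a one-row ffdf whose coded column is already a factor with the levels from the lookup table, then pass it as x. The lookup table, the column names colour and y, and the temporary CSV file are all hypothetical and only there to make the example self-contained.

```r
library(ff)  # assumes the ff package is installed

# Hypothetical lookup table built outside R: numeric code -> original label.
lookup <- c("1" = "red", "2" = "green", "3" = "blue")

# Hypothetical example file in which the character column is already coded.
csv_path <- tempfile(fileext = ".csv")
write.csv(
  data.frame(colour = c(2, 1, 3, 2), y = c(0.5, 1.2, 3.4, 0.9)),
  csv_path, row.names = FALSE
)

# One-row template: `colour` is a factor whose levels are the codes and
# `y` is numeric. The values in this row are placeholders; according to
# the help text quoted above, a one-row x is overwritten by the first
# record read rather than appended to.
template <- as.ffdf(data.frame(
  colour = factor("1", levels = names(lookup)),
  y = 0
))

# Read the file using the template for column names, classes and levels.
dat <- read.table.ffdf(x = template, file = csv_path,
                       header = TRUE, sep = ",")

# Inspect the levels of the coded column.
levels(dat$colour)
```

The original labels can then be recovered from the lookup table when needed, for example with lookup[levels(dat$colour)].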
answered 2013-05-07T23:12:05.277