
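I have a large data set with character variables that take many distinct values. I am trying to read the data into a big.matrix and then use biglm.big.matrix to build a linear model. However, big.matrix converts all character vectors to factors, and the character labels are lost. So I decided to create a lookup table for my character columns outside R and use numbers to represent the different levels inside R. The problem is that I don't know how to tell big.matrix that these columns should be treated as factors rather than as numeric. Please help.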


1 Answer


I'm not very familiar with read.table.ffdf, but could you use its x argument? From ?read.table.ffdf:

x
   NULL or an optional ffdf object to which the read records are appended. If this is provided,
   it defines crucial features that are otherwise determined during the 'first' chunk of
   reading: vmodes, colnames, colClasses, sequence of predefined levels. In order to also read
   the first chunk into such predefined ffdf, an x with 1 row is treated special: instead of
   appending the first row will be overwritten. This is necessary because we cannot provide x 
   with zero rows (we cannot create ff vectors with zero elements).

You could define the corresponding columns of x as factors with the given levels and then use it as a template.
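A rough, untested sketch of that idea, assuming the ff package is installed. The file, the column names (y, grp) and the codes 1 to 3 are made up for illustration. The integer codes from the external lookup table are declared as factor levels in a one-row template, which is then passed as x:

```r
library(ff)

# Hypothetical data: 'grp' holds integer codes taken from an external lookup table.
csv_file <- tempfile(fileext = ".csv")
write.csv(data.frame(y = c(1.5, 2.0, 3.5, 4.0), grp = c(2, 1, 3, 2)),
          file = csv_file, row.names = FALSE)

# Every code in the lookup table, as strings, so the level set is fixed up front.
grp_levels <- as.character(1:3)

# One-row template: its column classes and factor levels describe how to read the file.
template <- as.ffdf(data.frame(
  y   = 0,
  grp = factor(grp_levels[1], levels = grp_levels)
))

# According to the help text quoted above, a one-row x is overwritten by the
# first chunk rather than appended to.
dat <- read.csv.ffdf(x = template, file = csv_file, header = TRUE)

levels(dat$grp)
```

This gives an ffdf rather than a big.matrix, so whether it fits the biglm.big.matrix workflow in the question is a separate matter.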

answered 2013-05-07T23:12:05.277