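I am trying to read data from a CSV file into a data frame. The data contains names that I do not want read as factors. I cannot use the stringsAsFactors=FALSE
argument, because I want the other columns to be factors.
How can I achieve the desired behavior?
Note: the data has thousands of columns... I only need to change the data type of one column... the default types assigned to the rest are fine.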
Use the colClasses argument to specify the type of each column. For example:
x <- read.csv("myfile.csv", colClasses=c("numeric","factor","character"))
You can specify the column classes. From ?read.table:
colClasses: character. A vector of classes to be assumed for the
columns. Recycled as necessary, or if the character vector
is named, unspecified values are taken to be 'NA'.
Possible values are 'NA' (the default, when 'type.convert' is
used), '"NULL"' (when the column is skipped), one of the
atomic vector classes (logical, integer, numeric, complex,
character, raw), or '"factor"', '"Date"' or '"POSIXct"'.
Otherwise there needs to be an 'as' method (from package
'methods') for conversion from '"character"' to the specified
formal class.
Note that 'colClasses' is specified per column (not per
variable) and so includes the column of row names (if any).
So something like:
types = c("numeric", "character", "factor")
read.table("file.txt", colClasses = types)
should do the trick.
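Two details from the quoted documentation may be worth seeing in action: a single class is recycled across all columns, and "NULL" skips a column entirely. This is only a minimal sketch on made-up inline data; the name csv_text is an assumption for illustration, not something from the question.

```r
# Hypothetical inline data standing in for a real file.
csv_text <- "a,b,c
1,asdf,morning
5,fiewhn,evening"

# A length-one colClasses is recycled: every column is read as character.
all_chr <- read.csv(text = csv_text, colClasses = "character")

# "NULL" drops the third column while the first two get explicit classes.
skip_c <- read.csv(text = csv_text,
                   colClasses = c("integer", "character", "NULL"))

sapply(all_chr, class)
names(skip_c)
```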
Personally, I would just read the columns in as either strings or factors, and then change the one column you want.
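A minimal sketch of that read-then-convert approach, assuming the name column is called b (the inline data and the name csv_text are made up for illustration). stringsAsFactors is set explicitly because its default changed to FALSE in R 4.0.0.

```r
# Hypothetical inline data standing in for the CSV file.
csv_text <- "a,b,c
1,asdf,morning
5,fiewhn,evening
9,ddddd,afternoon"

# Read every string column as a factor.
df <- read.csv(text = csv_text, stringsAsFactors = TRUE)

# Convert only the name column back to character.
df$b <- as.character(df$b)

sapply(df, class)
```

This reads the file only once, at the cost of briefly building a factor for the column that gets converted back.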
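As the documentation quoted in the previous answer says, if you know the names of the columns before reading the data, you can use a named character vector to specify only that column.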
types <- c(b="character") #Set the column named "b" to character
df <- read.table(header=TRUE,sep=",",colClasses=types,text="
a,b,c,d,e
1,asdf,morning,4,greeting
5,fiewhn,evening,12,greeting
9,ddddd,afternoon,292,farewell
33,eianzpod,evening,1111,farewell
191,dnmxzcv,afternoon,394,greeting
")
sapply(df,class)
# Output from R < 4.0.0; since 4.0.0, add stringsAsFactors=TRUE to get factors
#         a           b           c           d           e
# "integer" "character"    "factor"   "integer"    "factor"
If there is no header, you can do the same by position, using the default column names (V1, V2, ...):
types <- c(V2="character") #Set the second column to character
df <- read.table(header=FALSE,sep=",",colClasses=types,text="
1,asdf,morning,4,greeting
5,fiewhn,evening,12,greeting
9,ddddd,afternoon,292,farewell
33,eianzpod,evening,1111,farewell
191,dnmxzcv,afternoon,394,greeting
")
sapply(df,class)
# V1 V2 V3 V4 V5
#"integer" "character" "factor" "integer" "factor"
Finally, if you know the position but the file has a header, you can build a vector of the appropriate length. For colClasses, NA means the default.
types <- rep.int(NA_character_,5) #make this length the number of columns
types[2] <- "character" #force the second column as character
df <- read.table(header=TRUE,sep=",",colClasses=types,text="
a,b,c,d,e
1,asdf,morning,4,greeting
5,fiewhn,evening,12,greeting
9,ddddd,afternoon,292,farewell
33,eianzpod,evening,1111,farewell
191,dnmxzcv,afternoon,394,greeting
")
sapply(df,class)
#         a           b           c           d           e
# "integer" "character"    "factor"   "integer"    "factor"
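With thousands of columns, as in the question, hard-coding the vector length is awkward. One possible way around it is to read just the header plus one row to count the columns, then build the colClasses vector from that. This is a sketch under assumptions: the temporary file, its contents, and the variable names are invented so the example is self-contained, and stringsAsFactors=TRUE is passed explicitly because it is no longer the default from R 4.0.0 on.

```r
# Hypothetical file written to a temp path so the sketch runs on its own.
path <- tempfile(fileext = ".csv")
writeLines(c("a,b,c,d,e",
             "1,asdf,morning,4,greeting",
             "5,fiewhn,evening,12,farewell"), path)

# Read the header and a single data row just to learn the column names.
col_names <- names(read.csv(path, nrows = 1))

# NA means "use the default conversion"; override only the second column.
types <- rep(NA_character_, length(col_names))
types[2] <- "character"

df <- read.csv(path, colClasses = types, stringsAsFactors = TRUE)
sapply(df, class)
```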