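Late edit: After rereading the question following the edits and the extended comments, I wonder whether what is needed (or at least requested) is exactly the opposite of what I suggest below. The request:
Unfortunately, read.csv is converting all blanks and NA to "NA". I want to read NA and NaN as characters.
... might already be met (somewhat paradoxically) by the arguments:
colClasses="character", stringsAsFactors=FALSE, na.strings="."
Then any character value, including the empty string, comes in as itself (any na.strings value that never occurs in the data would do). Against this reading is the accepted answer, which converts empty character values ("") to R's NA_character_ values.
Here is a test example with various results: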
sapply(read.csv(text='A\tB\tC\tD\na\t""\tNA\tNaN', sep='\t', na.strings=""), class )
#         A         B         C         D
#  "factor" "logical"  "factor" "numeric"
# (Output from R < 4.0.0; since R 4.0.0 stringsAsFactors defaults to FALSE, so A and C are "character".)
sapply(read.csv(text='A\tB\tC\tD\na\t""\tNA\tNaN', sep='\t', na.strings="x"), class )
#         A         B         C         D
#  "factor" "logical"  "factor" "numeric"
sapply(read.csv(text='A\tB\tC\tD\na\t""\tNA\tNaN', sep='\t', na.strings="x", stringsAsFactors=FALSE), class )
#           A           B           C           D
# "character"   "logical" "character"   "numeric"
# Almost the expressed desired result
sapply(read.csv(text='A\tB\tC\tD\na\t""\tNA\tNaN', sep='\t', colClasses="character", stringsAsFactors=FALSE), class )
#           A           B           C           D
# "character" "character" "character" "character"
# But ... still get a real R <NA>
read.csv(text='A\tB\tC\tD\na\t""\tNA\tNaN', sep='\t', colClasses="character", stringsAsFactors=FALSE)
#   A B    C   D
# 1 a   <NA> NaN
# So add all three
read.csv(text='A\tB\tC\tD\na\t""\tNA\tNaN', sep='\t', colClasses="character", stringsAsFactors=FALSE, na.strings=".")
#   A B  C   D
# 1 a   NA NaN
# Finally all columns are character and no "real" R NA's
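The printed output makes it hard to tell a real NA from the string "NA", so it can help to test with is.na(). A minimal sketch using the same one-row input (the names `txt`, `with_default`, and `with_dot` are invented here for illustration):

```r
# Same tab-separated input as above: a quoted empty string, NA, and NaN.
txt <- 'A\tB\tC\tD\na\t""\tNA\tNaN'

# Default na.strings ("NA"): column C becomes a real missing value.
with_default <- read.csv(text = txt, sep = "\t", colClasses = "character",
                         stringsAsFactors = FALSE)

# na.strings = ".": nothing in this input matches, so "NA" stays a string.
with_dot <- read.csv(text = txt, sep = "\t", colClasses = "character",
                     stringsAsFactors = FALSE, na.strings = ".")

is.na(with_default$C)    # TRUE: a real NA_character_
is.na(with_dot$C)        # FALSE: the two-character string "NA"
with_dot$C == "NA"       # TRUE
any(is.na(with_dot))     # FALSE: no real NA anywhere in the data frame
```

The choice of "." is arbitrary; the approach only relies on the na.strings value not appearing in the data.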
The default for na.strings is just "NA", so you may need to add "NaN". Truly empty fields ("") are set to missing, but a field holding a space (" ") is not:
b<- read.csv("a.txt", skip =0,
comment.char = "",check.names = FALSE, quote="",
na.strings=c("NA","NaN", " ") )
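It is not clear that this is the problem, though, because your data example is malformed and has no commas. That may be the root issue, since read.csv expects comma-separated fields unless told otherwise. Use read.delim or read.table if your data is tab-separated.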
b<- read.table("a.txt", sep="\t", skip =0, header = TRUE,
comment.char = "",check.names = FALSE, quote="",
na.strings=c("NA","NaN", " ") )
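For tab-separated input, read.delim() already defaults to sep="\t" and header=TRUE, so the call can be shorter. A small sketch that reads from an inline string instead of a file (the data and the names `tsv` and `d` are invented for illustration):

```r
# Two data rows, tab-separated; row 2 has an empty field in column B.
tsv <- "A\tB\tC\n10\t20\tNaN\n30\t\t40\n"

# Treat both "NA" and "NaN" as missing.
d <- read.delim(text = tsv, na.strings = c("NA", "NaN"))

d
is.na(d$B)   # FALSE TRUE: the empty field is missing
is.na(d$C)   # TRUE FALSE: "NaN" was read as NA
```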
# worked example for csv text file connection
bt <- "A,B,C
10,20,NaN
30,,40
40,30,20
,NA,20"
b<- read.csv(text=bt, sep=",",
comment.char = "",check.names = FALSE, quote="\"",
na.strings=c("NA","NaN", " ") )
b
#--------------
   A  B  C
1 10 20 NA
2 30 NA 40
3 40 30 20
4 NA NA 20
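Whether "NaN" belongs in na.strings depends on whether the NA/NaN distinction should survive the import. A sketch of the difference, using the same four rows under made-up names (`csv_txt`, `keep_nan`, `drop_nan`):

```r
csv_txt <- "A,B,C
10,20,NaN
30,,40
40,30,20
,NA,20"

# Default na.strings = "NA": the NaN in column C is kept as a numeric NaN.
keep_nan <- read.csv(text = csv_txt)

# With "NaN" added to na.strings it is read as an ordinary NA instead.
drop_nan <- read.csv(text = csv_txt, na.strings = c("NA", "NaN"))

is.nan(keep_nan$C)   # TRUE FALSE FALSE FALSE
is.nan(drop_nan$C)   # FALSE FALSE FALSE FALSE
is.na(drop_nan$C)    # TRUE FALSE FALSE FALSE
```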
Example 2 (the header names a fourth column, D, that has no data; read.csv pads the short rows with NA):
bt <- "A,B,C,D
10,20,NaN
30,,40
40,30,20
,NA,20"
b<- read.csv(text=bt, sep=",",
comment.char = "",check.names = FALSE, quote="\"",
na.strings=c("NA","NaN", " ") , colClasses=c(rep("numeric", 3), "logical"))
b
#----------------
   A  B  C  D
1 10 20 NA NA
2 30 NA 40 NA
3 40 30 20 NA
4 NA NA 20 NA
> str(b)
'data.frame': 4 obs. of 4 variables:
$ A: num 10 30 40 NA
$ B: num 20 NA 30 NA
$ C: num NA 40 20 20
$ D: logi NA NA NA NA
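Somewhat interestingly, NA and NaN are not the same thing for numeric vectors. NaN is returned by operations that have no mathematical meaning (but, as noted on the help page ?NaN, the result of such an operation may depend on the particular operating system). Equality tests do not work for either NaN or NA. There are specific is.* functions for them: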
> Inf*0
[1] NaN
> is.nan(c(1,2.2,3,NaN, NA) )
[1] FALSE FALSE FALSE TRUE FALSE
> is.na(c(1,2.2,3,NaN, NA) )
[1] FALSE FALSE FALSE TRUE TRUE # note the difference
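To illustrate why equality tests are no help here, a short sketch (the vector `x` is made up): comparing anything with NaN or NA yields NA, so the is.* functions are the only reliable way to pick the two apart.

```r
x <- c(1, NaN, NA)

x == NaN                 # NA NA NA: never TRUE, not even for the NaN element
x == NA                  # NA NA NA

is.nan(x)                # FALSE TRUE FALSE: NaN only
is.na(x) & !is.nan(x)    # FALSE FALSE TRUE: NA but not NaN
```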