r - R from character to numeric

Question

I have this CSV file (fm.file):

Date,FM1,FM2
28/02/2011,14.571611,11.469457
01/03/2011,14.572203,11.457512
02/03/2011,14.574798,11.487183
03/03/2011,14.575558,11.487802
04/03/2011,14.576863,11.490246

and so on.

I run this command:

fm.data <- as.xts(read.zoo(file=fm.file,format='%d/%m/%Y',tz='',header=TRUE,sep=','))
is.character(fm.data)

and I get the following:

[1] TRUE

How can I make fm.data numeric without losing the date index? I want to run some statistical operations that require the data to be numeric.

score 2 · Accepted Answer

I was puzzled by two things: it didn't seem that 'read.zoo' should give you a character matrix, and it didn't seem that changing its class would affect the index values, since the data type should be separate from the indices. So I tried to replicate the problem and got a different result:

txt <- "Date,FM1,FM2
28/02/2011,14.571611,11.469457
01/03/2011,14.572203,11.457512
02/03/2011,14.574798,11.487183
03/03/2011,14.575558,11.487802
04/03/2011,14.576863,11.490246"
require(xts)
fm.data <- as.xts(read.zoo(file=textConnection(txt),format='%d/%m/%Y',tz='',header=TRUE,sep=','))
is.character(fm.data)
#[1] FALSE

str(fm.data)
#-------------
An ‘xts’ object from 2011-02-28 to 2011-03-04 containing:
  Data: num [1:5, 1:2] 14.6 14.6 14.6 14.6 14.6 ...
 - attr(*, "dimnames")=List of 2
  ..$ : NULL
  ..$ : chr [1:2] "FM1" "FM2"
  Indexed by objects of class: [POSIXct,POSIXt] TZ: 
  xts Attributes:  
List of 2
 $ tclass: chr [1:2] "POSIXct" "POSIXt"
 $ tzone : chr ""

zoo and xts objects keep their data in a matrix, which you access with coredata(), while the index is stored separately as an attribute. Changing the type of the data therefore does not touch the index.
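If the object really does come back as character, one way to convert it is to change only the storage mode of the data matrix and leave the index alone. This is a minimal sketch, not something the asker's file is known to need: chr.data is a hypothetical hand-built object standing in for a badly parsed fm.data.

```r
library(xts)

# Hypothetical character-valued xts object (stand-in for a badly parsed fm.data)
chr.data <- xts(
  matrix(c("14.571611", "14.572203", "11.469457", "11.457512"),
         ncol = 2, dimnames = list(NULL, c("FM1", "FM2"))),
  order.by = as.Date(c("2011-02-28", "2011-03-01"))
)

num.data <- chr.data
# Change only the storage type of the underlying matrix; attributes
# (including the index) are kept
storage.mode(num.data) <- "double"

is.numeric(coredata(num.data))            # the data should now be numeric
all(index(num.data) == index(chr.data))   # the dates should be unchanged
```

Any entry that is not a valid number becomes NA (with a warning), so it is still worth finding out why the column was read as character in the first place.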

score 1 · Accepted Answer

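I think the problem is that there is some dirty data in your CSV file. In other words, the FM1 or FM2 column contains a character somewhere that prevents it from being interpreted as a numeric column. When that happens, xts (which is a matrix underneath) coerces the whole thing to character.

Here is one way to find the suspect data using R: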

s <- readLines(fm.file)
# s is now a vector of character strings, one entry per line
s <- s[-1]  # Chop off the header row
# '/' is included in the character class because the dates contain it
all(grepl('^[-0-9,./]*$', s))  # TRUE means all your data is clean
s[ !grepl('^[-0-9,./]*$', s) ]
which( !grepl('^[-0-9,./]*$', s) ) + 1

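The second-to-last line prints every CSV line that contains a character you did not expect. The last line tells you which lines of the file they are (+1 because we removed the header row).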
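A column-wise variant of the same idea is to read everything as character and see which values fail to convert. This is only a sketch under assumptions: the sample text and the corrupted value "11.48x183" are made up for illustration, and raw, bad and bad.rows are hypothetical names.

```r
# Hypothetical sample in the same layout as fm.file, with one corrupted FM2 value
txt <- "Date,FM1,FM2
28/02/2011,14.571611,11.469457
01/03/2011,14.572203,11.457512
02/03/2011,14.574798,11.48x183
03/03/2011,14.575558,11.487802"

# Read every column as character so nothing is coerced silently
raw <- read.csv(text = txt, colClasses = "character")

# TRUE wherever a value in FM1/FM2 cannot be parsed as a number
bad <- sapply(raw[-1], function(col) is.na(suppressWarnings(as.numeric(col))))

bad.rows <- which(rowSums(bad) > 0)  # data rows with at least one bad value
raw[bad.rows, ]                      # show the offending rows
bad.rows + 1                         # line numbers in the file (header is line 1)
```

Unlike the regular expression, this also catches values made only of allowed characters that still are not numbers, such as "1.2.3".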

score 0 · Accepted Answer

Why not simply use read.csv and then convert the first column to Date objects with as.Date?

> x <- read.csv(fm.file, header=T)
> x$Date <- as.Date(x$Date, format="%d/%m/%Y")
> x
        Date      FM1      FM2
1 2011-02-28 14.57161 11.46946
2 2011-03-01 14.57220 11.45751
3 2011-03-02 14.57480 11.48718
4 2011-03-03 14.57556 11.48780
5 2011-03-04 14.57686 11.49025
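If a time-indexed object is still wanted afterwards, the data frame can be turned into one with the Date column as the index. This is a minimal sketch, assuming the xts package and an in-memory copy of the sample data (txt and fm.xts are hypothetical names):

```r
library(xts)

# In-memory copy of the sample rows from the question
txt <- "Date,FM1,FM2
28/02/2011,14.571611,11.469457
01/03/2011,14.572203,11.457512
02/03/2011,14.574798,11.487183"

x <- read.csv(text = txt, header = TRUE)
x$Date <- as.Date(x$Date, format = "%d/%m/%Y")

# Use the numeric columns as data and the Date column as the index
fm.xts <- xts(x[, c("FM1", "FM2")], order.by = x$Date)

is.numeric(coredata(fm.xts))   # check that the data is numeric
index(fm.xts)                  # the dates are kept as the index
```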
