2

I have this csv file (fm.file):

Date,FM1,FM2
28/02/2011,14.571611,11.469457
01/03/2011,14.572203,11.457512
02/03/2011,14.574798,11.487183
03/03/2011,14.575558,11.487802
04/03/2011,14.576863,11.490246

and so on.

I run this command:

fm.data <- as.xts(read.zoo(file=fm.file,format='%d/%m/%Y',tz='',header=TRUE,sep=','))
is.character(fm.data)

and I get the following:

[1] TRUE

How can I make fm.data numeric without losing the date index? I want to run some statistical operations that require the data to be numeric.

3 Answers

2

I was puzzled by two things: it didn't seem that 'read.zoo' should give you a character matrix, and it didn't seem that changing its class would affect the index values, since the data type should be separate from the indices. So I tried to replicate the problem and got a different result:

txt <- "Date,FM1,FM2
28/02/2011,14.571611,11.469457
01/03/2011,14.572203,11.457512
02/03/2011,14.574798,11.487183
03/03/2011,14.575558,11.487802
04/03/2011,14.576863,11.490246"
require(xts)
fm.data <- as.xts(read.zoo(file=textConnection(txt),format='%d/%m/%Y',tz='',header=TRUE,sep=','))
is.character(fm.data)
#[1] FALSE

 str(fm.data)
#-------------
An ‘xts’ object from 2011-02-28 to 2011-03-04 containing:
  Data: num [1:5, 1:2] 14.6 14.6 14.6 14.6 14.6 ...
 - attr(*, "dimnames")=List of 2
  ..$ : NULL
  ..$ : chr [1:2] "FM1" "FM2"
  Indexed by objects of class: [POSIXct,POSIXt] TZ: 
  xts Attributes:  
List of 2
 $ tclass: chr [1:2] "POSIXct" "POSIXt"
 $ tzone : chr ""

zoo and xts objects keep their data in a matrix, which you access with coredata(), and their indices are stored separately as attributes, which you access with index().
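
Because the data and the index are stored separately, one way to convert a character-typed xts object is to convert only the data matrix and reattach the original index. The following is a minimal sketch, not a tested fix for the asker's file: the object `chr` and its values are hypothetical stand-ins, and the xts package is assumed to be installed.

```r
library(xts)

# Hypothetical character-typed xts, similar to what the question describes
chr <- xts(
  matrix(c("14.571611", "14.572203", "11.469457", "11.457512"),
         ncol = 2, dimnames = list(NULL, c("FM1", "FM2"))),
  order.by = as.Date(c("2011-02-28", "2011-03-01"))
)

# Convert only the data matrix, keeping its shape and column names
vals <- coredata(chr)
num <- xts(
  matrix(as.numeric(vals), nrow = nrow(vals), dimnames = dimnames(vals)),
  order.by = index(chr)   # reuse the original date index unchanged
)

is.numeric(coredata(num))        # the data is now numeric
all(index(num) == index(chr))    # the dates are the same as before
```

Note that as.numeric() turns any value it cannot parse into NA (with a warning), so a dirty entry would show up as NA rather than stop the conversion.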

answered 2012-09-27T07:23:12.660
1

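I think the problem is that your csv file contains some dirty data. In other words, the FM1 or FM2 column contains a character somewhere that prevents it from being interpreted as a numeric column. When that happens, xts (which is a matrix underneath) coerces the whole thing to character.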
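
The coercion can be reproduced with base R alone. This is a small sketch with made-up data: the `n/a` entry is a hypothetical example of a dirty value, not something known to be in the asker's file.

```r
# Hypothetical csv with one non-numeric entry ("n/a") in FM2
txt <- "Date,FM1,FM2
28/02/2011,14.571611,11.469457
01/03/2011,14.572203,n/a"

df <- read.csv(text = txt, stringsAsFactors = FALSE)
sapply(df[-1], class)   # FM1 is numeric, FM2 is character

# A matrix can hold only one type, so every column becomes character
m <- as.matrix(df[-1])
is.character(m)
```

A single bad cell in one column is enough to turn the whole matrix, including the clean FM1 column, into character data.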

Here is one way to find the suspect data using R:

s <- readLines(fm.file)
# s is now a vector of character strings, one entry per line
s <- s[-1]  # Chop off the header row
# '/' is allowed because the dates are written as dd/mm/yyyy
all(grepl('^[-0-9,./]*$',s)) # TRUE means all your data is clean
s[ !grepl('^[-0-9,./]*$',s) ]
which( !grepl('^[-0-9,./]*$',s) ) + 1

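The second-to-last line prints every csv line that contains a character you did not expect. The last line tells you which lines of the file they are (+1 because we removed the header row).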
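
Once the offending token is known, one possible fix is to declare it as a missing value when reading the file, so the column parses as numeric. The sketch below uses read.csv and a hypothetical `n/a` token. read.zoo is documented to pass extra arguments on to read.table, so the same `na.strings` argument should work there too, though that is not tested here.

```r
# Hypothetical csv in which missing values were written as "n/a"
txt <- "Date,FM1,FM2
28/02/2011,14.571611,11.469457
01/03/2011,14.572203,n/a
02/03/2011,14.574798,11.487183"

raw <- read.csv(text = txt, stringsAsFactors = FALSE)
class(raw$FM2)     # "character": the bad token blocks numeric parsing

# Tell the reader that "n/a" means missing
clean <- read.csv(text = txt, na.strings = c("NA", "n/a"))
class(clean$FM2)   # "numeric"
is.na(clean$FM2)   # the dirty cell is now NA
```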

answered 2012-09-28T01:11:40.907
0

Why not simply use read.csv, and then convert the first column to a Date object with as.Date?

> x <- read.csv(fm.file, header=TRUE)
> x$Date <- as.Date(x$Date, format="%d/%m/%Y")
> x
        Date      FM1      FM2
1 2011-02-28 14.57161 11.46946
2 2011-03-01 14.57220 11.45751
3 2011-03-02 14.57480 11.48718
4 2011-03-03 14.57556 11.48780
5 2011-03-04 14.57686 11.49025
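
If a date-indexed object is still wanted afterwards, the data frame can be turned into an xts object by passing the numeric columns as the data and the Date column as the index. This is a sketch under the assumption that the xts package is installed. It reads from an inline string instead of fm.file so that it is self-contained.

```r
library(xts)

# Inline copy of the first rows of the question's data
txt <- "Date,FM1,FM2
28/02/2011,14.571611,11.469457
01/03/2011,14.572203,11.457512
02/03/2011,14.574798,11.487183"

x <- read.csv(text = txt, header = TRUE)
x$Date <- as.Date(x$Date, format = "%d/%m/%Y")

# Numeric columns become the data matrix, the Date column becomes the index
fm.data <- xts(as.matrix(x[, c("FM1", "FM2")]), order.by = x$Date)

is.numeric(coredata(fm.data))
index(fm.data)
```

Calling as.matrix() explicitly makes it visible that the data must be all numeric columns. If a character column slipped in, the matrix would be character again.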
answered 2012-09-27T12:45:43.370