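First off, apologies that there is no reproducible data here; I don't know how to reproduce this problem. I will do my best to give a step-by-step list of what I did, along with any relevant information. Any troubleshooting ideas would be greatly appreciated.
My problem is this:
I have a large time series dataset that I read into R with read.csv. I will eventually convert it to zoo, but for now I am keeping it as a data frame.
Looking at the data with str, I get: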
> str(Met)
'data.frame': 568354 obs. of 18 variables:
$ time_local : Factor w/ 568354 levels "2006-08-06 03:15:00",..: 1 2 3 4 5 6 7 8 9 10 ...
Note: Met$time_local is the column I care about; I have removed all the other columns from the str output.
If I search for duplicates using
Dup<-Met$time_local[duplicated(Met$time_local)]
I get nothing:
str(Dup)
Factor w/ 568354 levels "2006-08-06 03:15:00",..:
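Since I cannot share the real data, here is a minimal stand-in sketch for this step. The name MetDemo and the four-hour window are made up for illustration (the real Met has 568354 rows); the strings are generated in UTC purely so the example behaves the same on any machine.

```r
# Hypothetical stand-in for Met$time_local: 5-minute stamps stored as a factor,
# the way read.csv stored them for me. Generated in UTC only so that every
# wall-clock string in the window exists on any machine.
stamps <- format(
  seq(as.POSIXct("2007-03-11 00:00:00", tz = "UTC"),
      as.POSIXct("2007-03-11 03:55:00", tz = "UTC"),
      by = "5 min"),
  "%Y-%m-%d %H:%M:%S"
)
MetDemo <- data.frame(time_local = factor(stamps))

# Same check as above: no repeated values among the factor entries
anyDuplicated(MetDemo$time_local)      # 0 means no duplicates
sum(duplicated(MetDemo$time_local))    # count of duplicated entries
```

At the string/factor level each timestamp is distinct, which matches what I see with the real data.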
If I use strptime to convert the date/time data to a date-time object (strptime returns POSIXlt)
MetStrp<-strptime(Met$time_local, "%Y-%m-%d %H:%M:%S")
str(MetStrp)
POSIXlt[1:568354], format: "2006-08-06 03:15:00" "2006-08-06 03:20:00" "2006-08-06 03:25:00" ...
and then search for duplicates
Dup<-MetStrp[duplicated(MetStrp)]
> head(Dup)
[1] "2007-03-11 02:00:00" "2007-03-11 02:05:00" "2007-03-11 02:10:00"
[4] "2007-03-11 02:15:00" "2007-03-11 02:20:00" "2007-03-11 02:25:00"
> str(Dup)
POSIXlt[1:60], format: "2007-03-11 02:00:00" "2007-03-11 02:05:00" "2007-03-11 02:10:00" ...
I now have 60 duplicates (which later causes an error when I create the zoo object).
Interestingly, if I convert the POSIXlt object to POSIXct
ct<-as.POSIXct(MetStrp)
str(ct)
POSIXct[1:568354], format: "2006-08-06 03:15:00" "2006-08-06 03:20:00" "2006-08-06 03:25:00" ...
I get the same duplicates, but offset by one hour
Dup<-ct[duplicated(ct)]
> head(Dup)
[1] "2007-03-11 01:00:00 PST" "2007-03-11 01:05:00 PST" "2007-03-11 01:10:00 PST"
[4] "2007-03-11 01:15:00 PST" "2007-03-11 01:20:00 PST" "2007-03-11 01:25:00 PST"
> str(Dup)
POSIXct[1:60], format: "2007-03-11 01:00:00" "2007-03-11 01:05:00" "2007-03-11 01:10:00" ...
If I instead look for the positions of the duplicates using
Dup_loc<-which(duplicated(MetStrp) | duplicated(MetStrp,fromLast=TRUE))
I get 120 duplicate positions, which turn out to be the combination of the POSIXlt and POSIXct duplicates.
str(Dup_loc)
int [1:120] 62470 62471 62472 62473 62474 62475 62476 62477 62478 62479 ...
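As a sanity check on why 120 is exactly twice 60, here is a small sketch of what the combined expression does. The vector x is a made-up toy example, not my data: duplicated(x) flags only the later copy of each repeated value, while adding fromLast=TRUE also flags the earlier copy, so every member of a duplicate pair is returned.

```r
# Toy vector (hypothetical): "a" and "b" each appear twice
x <- c("a", "b", "a", "c", "b")

later   <- duplicated(x)                   # flags the 2nd occurrence of each value
earlier <- duplicated(x, fromLast = TRUE)  # flags the 1st occurrence of each value

which(later)             # 3 5
which(earlier)           # 1 2
which(later | earlier)   # 1 2 3 5  (both members of each pair)
```

So 60 flagged duplicates, each paired with one earlier value, would give the 120 positions shown above.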
The POSIXct duplicates always fall in the 01:00-02:00 hour, and the POSIXlt duplicates always fall in the 02:00-03:00 hour.
To view the duplicates:
Test<-MetStrp[Dup_loc]
>Test
[1] "2007-03-11 01:00:00" "2007-03-11 01:05:00" "2007-03-11 01:10:00"
[4] "2007-03-11 01:15:00" "2007-03-11 01:20:00" "2007-03-11 01:25:00"
[7] "2007-03-11 01:30:00" "2007-03-11 01:35:00" "2007-03-11 01:40:00"
[10] "2007-03-11 01:45:00" "2007-03-11 01:50:00" "2007-03-11 01:55:00"
[13] "2007-03-11 02:00:00" "2007-03-11 02:05:00" "2007-03-11 02:10:00"
[16] "2007-03-11 02:15:00" "2007-03-11 02:20:00" "2007-03-11 02:25:00"
[19] "2007-03-11 02:30:00" "2007-03-11 02:35:00" "2007-03-11 02:40:00"
[22] "2007-03-11 02:45:00" "2007-03-11 02:50:00" "2007-03-11 02:55:00"
[25] "2008-03-09 01:00:00" "2008-03-09 01:05:00" "2008-03-09 01:10:00"
[28] "2008-03-09 01:15:00" "2008-03-09 01:20:00" "2008-03-09 01:25:00"
[31] "2008-03-09 01:30:00" "2008-03-09 01:35:00" "2008-03-09 01:40:00"
[34] "2008-03-09 01:45:00" "2008-03-09 01:50:00" "2008-03-09 01:55:00"
[37] "2008-03-09 02:00:00" "2008-03-09 02:05:00" "2008-03-09 02:10:00"
[40] "2008-03-09 02:15:00" "2008-03-09 02:20:00" "2008-03-09 02:25:00"
[43] "2008-03-09 02:30:00" "2008-03-09 02:35:00" "2008-03-09 02:40:00"
[46] "2008-03-09 02:45:00" "2008-03-09 02:50:00" "2008-03-09 02:55:00"
[49] "2009-03-08 01:00:00" "2009-03-08 01:05:00" "2009-03-08 01:10:00"
[52] "2009-03-08 01:15:00" "2009-03-08 01:20:00" "2009-03-08 01:25:00"
[55] "2009-03-08 01:30:00" "2009-03-08 01:35:00" "2009-03-08 01:40:00"
[58] "2009-03-08 01:45:00" "2009-03-08 01:50:00" "2009-03-08 01:55:00"
[61] "2009-03-08 02:00:00" "2009-03-08 02:05:00" "2009-03-08 02:10:00"
[64] "2009-03-08 02:15:00" "2009-03-08 02:20:00" "2009-03-08 02:25:00"
[67] "2009-03-08 02:30:00" "2009-03-08 02:35:00" "2009-03-08 02:40:00"
[70] "2009-03-08 02:45:00" "2009-03-08 02:50:00" "2009-03-08 02:55:00"
[73] "2010-03-14 01:00:00" "2010-03-14 01:05:00" "2010-03-14 01:10:00"
[76] "2010-03-14 01:15:00" "2010-03-14 01:20:00" "2010-03-14 01:25:00"
[79] "2010-03-14 01:30:00" "2010-03-14 01:35:00" "2010-03-14 01:40:00"
[82] "2010-03-14 01:45:00" "2010-03-14 01:50:00" "2010-03-14 01:55:00"
[85] "2010-03-14 02:00:00" "2010-03-14 02:05:00" "2010-03-14 02:10:00"
[88] "2010-03-14 02:15:00" "2010-03-14 02:20:00" "2010-03-14 02:25:00"
[91] "2010-03-14 02:30:00" "2010-03-14 02:35:00" "2010-03-14 02:40:00"
[94] "2010-03-14 02:45:00" "2010-03-14 02:50:00" "2010-03-14 02:55:00"
[97] "2011-03-13 01:00:00" "2011-03-13 01:05:00" "2011-03-13 01:10:00"
[100] "2011-03-13 01:15:00" "2011-03-13 01:20:00" "2011-03-13 01:25:00"
[103] "2011-03-13 01:30:00" "2011-03-13 01:35:00" "2011-03-13 01:40:00"
[106] "2011-03-13 01:45:00" "2011-03-13 01:50:00" "2011-03-13 01:55:00"
[109] "2011-03-13 02:00:00" "2011-03-13 02:05:00" "2011-03-13 02:10:00"
[112] "2011-03-13 02:15:00" "2011-03-13 02:20:00" "2011-03-13 02:25:00"
[115] "2011-03-13 02:30:00" "2011-03-13 02:35:00" "2011-03-13 02:40:00"
[118] "2011-03-13 02:45:00" "2011-03-13 02:50:00" "2011-03-13 02:55:00"
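As far as I can tell, there are no repeated timestamps in the output above. So I am not sure what is going on, but something is not right.
As far as I can tell, all I did was convert a factor column to a date-time column. So I don't understand why I get a duplicate error in zoo, or why duplicated finds duplicates when none appear to exist.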
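One check I have not yet run on the full data, sketched here under assumptions: every affected date is the second Sunday in March, which is when US daylight saving time has started since 2007, so the 02:00-02:55 wall-clock times may simply not exist in my session's time zone. The stamps vector below is a hypothetical sample built by hand, and using tz = "UTC" is only a diagnostic choice, not a claim about the zone the data was recorded in.

```r
# Hypothetical sample: the 01:00-02:55 window on one of the affected dates
mins   <- seq(0L, 55L, by = 5L)
stamps <- c(sprintf("2007-03-11 01:%02d:00", mins),
            sprintf("2007-03-11 02:%02d:00", mins))

# Parse with an explicit zone that has no daylight saving transitions
utc <- as.POSIXct(strptime(stamps, "%Y-%m-%d %H:%M:%S", tz = "UTC"))
anyDuplicated(utc)    # 0: every stamp stays distinct in UTC

# Same parse in the session's own zone, as in my code above.
# The result depends on Sys.timezone() and the platform, so no output is shown.
loc <- as.POSIXct(strptime(stamps, "%Y-%m-%d %H:%M:%S"))
Sys.timezone()
sum(duplicated(loc))
```

If the second count is non-zero only in a zone that observes daylight saving time, that would suggest the duplicates come from the time zone conversion rather than from the data itself.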
Again, any thoughts on this would be greatly appreciated.