
I have a sample data frame:

       Date      p
4   2001-01-04  6.9
5   2001-01-05  4.5
6   2001-01-06  5.9
8   2001-01-08 15.8
24  2001-01-24  1.3
25  2001-01-25  4.6
26  2001-01-26 13.0
27  2001-01-27 45.1
32  2001-02-01  5.0
36  2001-02-05 21.9
37  2001-02-06 25.4
40  2001-02-09  1.4
41  2001-02-10  1.9
44  2001-02-13  9.1
45  2001-02-14 23.0
46  2001-02-15  8.8
53  2001-02-22  1.1
59  2001-02-28 24.8

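I want to subset the data frame to runs of 3 consecutive days together with their associated p values, e.g. the dates (2001-01-04, 2001-01-05, 2001-01-06) with their p values (6.9, 4.5, 5.9). My real data frame is huge and I have only shown part of it here; I just need the runs of 3 consecutive days to be selected.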

Any help with this would be greatly appreciated.


2 Answers


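This constructs a diff() vector and picks out the runs of 1-day differences with length >= 2. It then shifts that vector back by one and does a logical OR, because the first item of each run (the first day of the streak) will have a FALSE value for rle()$values == 1.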

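dat$Date <- as.Date(dat$Date)
# gap in days from the previous row (0 for the first row)
dat$diff <- c(0, diff(dat$Date))
datrl <- rle(dat$diff)  # Inadvertently omitted this line in initial posting
# nonzero (the run's index) only inside runs of >= 2 one-day gaps
grp <- rep(seq_along(datrl$lengths), datrl$lengths) *
       rep(datrl$values == 1, datrl$lengths) *
       rep(datrl$lengths >= 2, datrl$lengths)
# also keep the row just before each run, i.e. the first day of the streak
dat[ grp | c(grp[-1], 0) , ]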

#----
> dat[ grp | c(grp[-1], 0) , ][1:3]
         Date    p diff
1  2001-01-04  6.9    0
2  2001-01-05  4.5    1
3  2001-01-06  5.9    1
5  2001-01-24  1.3   16
6  2001-01-25  4.6    1
7  2001-01-26 13.0    1
8  2001-01-27 45.1    1
14 2001-02-13  9.1    3
15 2001-02-14 23.0    1
16 2001-02-15  8.8    1
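For a self-contained check of the rle() idea above, here is a minimal sketch that rebuilds the question's sample data (without its original row names). As a simplification of `grp`, it uses a logical mask directly instead of multiplying run indices; the names `gap`, `in_run`, `keep` and `result` are illustrative only and do not come from the answer.

```r
# Sample data from the question (original row names dropped)
dat <- data.frame(
  Date = as.Date(c("2001-01-04", "2001-01-05", "2001-01-06", "2001-01-08",
                   "2001-01-24", "2001-01-25", "2001-01-26", "2001-01-27",
                   "2001-02-01", "2001-02-05", "2001-02-06", "2001-02-09",
                   "2001-02-10", "2001-02-13", "2001-02-14", "2001-02-15",
                   "2001-02-22", "2001-02-28")),
  p = c(6.9, 4.5, 5.9, 15.8, 1.3, 4.6, 13.0, 45.1, 5.0,
        21.9, 25.4, 1.4, 1.9, 9.1, 23.0, 8.8, 1.1, 24.8)
)

# Gap in days between each date and the previous one; 0 for the first row
gap <- c(0, as.numeric(diff(dat$Date)))

# Run-length encode the gaps
r <- rle(gap)

# TRUE for rows inside a run of at least two 1-day gaps (>= 3 consecutive days)
in_run <- rep(r$values == 1 & r$lengths >= 2, r$lengths)

# Shift back by one and OR, so the first day of each streak is kept too
keep <- in_run | c(in_run[-1], FALSE)

result <- dat[keep, ]
print(result)
```

This should select the same ten rows as the output shown above (2001-01-04 to 01-06, 2001-01-24 to 01-27 and 2001-02-13 to 02-15).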
answered 2013-03-01T03:39:58.417

Assuming you want a list of the subsets of 3 consecutive dates:

data <- read.table(textConnection("Date      p\n2001-01-04  6.9\n2001-01-05  4.5\n2001-01-06  5.9\n2001-01-08 15.8\n2001-01-24  1.3\n2001-01-25  4.6\n2001-01-26 13.0\n2001-01-27 45.1\n2001-02-01  5.0\n2001-02-05 21.9\n2001-02-06 25.4\n2001-02-09  1.4\n2001-02-10  1.9\n2001-02-13  9.1\n2001-02-14 23.0\n2001-02-15  8.8\n2001-02-22  1.1\n2001-02-28 24.8"), 
    header = TRUE, colClasses = c("Date", "numeric"))

# find the dates that are the 3rd day of a run of consecutive dates; sel is a logical vector flagging them
sel <- c(0, diff(data$Date)) == 1 & c(0, 0, diff(data$Date, 2) == 2)

# get the start and end row indices of each 3-day window
start <- which(sel) - 2
end <- which(sel)

# extract every subset of 3 consecutive dates
mapply(function(start, end) data[start:end, ], start, end, SIMPLIFY = FALSE)
## [[1]]
##         Date   p
## 1 2001-01-04 6.9
## 2 2001-01-05 4.5
## 3 2001-01-06 5.9
## 
## [[2]]
##         Date    p
## 5 2001-01-24  1.3
## 6 2001-01-25  4.6
## 7 2001-01-26 13.0
## 
## [[3]]
##         Date    p
## 6 2001-01-25  4.6
## 7 2001-01-26 13.0
## 8 2001-01-27 45.1
## 
## [[4]]
##          Date    p
## 14 2001-02-13  9.1
## 15 2001-02-14 23.0
## 16 2001-02-15  8.8
## 
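The lag-2 check above generalizes to any streak length. The sketch below is a hypothetical helper, `consecutive_windows()` (not part of the answer), and it assumes the dates are sorted ascending with no duplicates. Under that assumption, n rows cover n consecutive days exactly when the last date minus the first is n - 1, so a single lagged diff() replaces the two conditions in `sel`. Like the mapply() version, it returns overlapping windows.

```r
# Sample data from the question (original row names dropped)
dat <- data.frame(
  Date = as.Date(c("2001-01-04", "2001-01-05", "2001-01-06", "2001-01-08",
                   "2001-01-24", "2001-01-25", "2001-01-26", "2001-01-27",
                   "2001-02-01", "2001-02-05", "2001-02-06", "2001-02-09",
                   "2001-02-10", "2001-02-13", "2001-02-14", "2001-02-15",
                   "2001-02-22", "2001-02-28")),
  p = c(6.9, 4.5, 5.9, 15.8, 1.3, 4.6, 13.0, 45.1, 5.0,
        21.9, 25.4, 1.4, 1.9, 9.1, 23.0, 8.8, 1.1, 24.8)
)

# Hypothetical helper: list every window of n consecutive calendar days.
# ASSUMPTION: data$Date is of class Date, sorted ascending, with no duplicates.
consecutive_windows <- function(data, n = 3) {
  stopifnot(inherits(data$Date, "Date"), n >= 2)
  if (nrow(data) < n) return(list())
  # span[i] = Date[i + n - 1] - Date[i], in days
  span <- as.numeric(diff(data$Date, lag = n - 1))
  # a window starting at row s is consecutive iff it spans exactly n - 1 days
  start <- which(span == n - 1)
  lapply(start, function(s) data[s:(s + n - 1), ])
}

w3 <- consecutive_windows(dat, 3)
print(w3)
```

With `n = 3` this should give the same four windows as the mapply() output above; `consecutive_windows(dat, 4)` should return only the 2001-01-24 to 2001-01-27 window.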
answered 2013-03-01T03:23:47.847