
I need to get from this

 id  |    date
-----------------
  A  | 2000-01-13
  A  | 2000-01-18
  A  | 2000-01-25
  B  | 2012-10-10
  B  | 2012-10-11
  C  | 2005-07-25
  C  | 2005-07-31

to this

 id  |    date     | days from start
---------------------------
  A  | 2000-01-13  |  0
  A  | 2000-01-18  |  5
  A  | 2000-01-25  |  12
  A  | 2000-02-08  |  26
  B  | 2012-10-10  |  0
  B  | 2012-10-11  |  1
  C  | 2005-07-25  |  0
  C  | 2005-07-31  |  6

That is, create a variable that holds the number of days elapsed since the first date, grouped by id.

Any ideas?


3 Answers


Using data.table (I'm assuming the date column here is character; if it is already of class Date, you can drop the as.Date(.) call):

df <- structure(list(id = c("A", "A", "A", "B", "B", "C", "C"), 
             date = c("2000-01-13", "2000-01-18", "2000-01-25", "2012-10-10", 
                    "2012-10-11", "2005-07-25", "2005-07-31")), 
             .Names = c("id", "date"), row.names = c(NA, -7L), 
             class = "data.frame")
library(data.table)
dt <- data.table(df, key = "id")
# Within each id: running sum of the day gaps between consecutive rows (first row = 0)
dt[, days_from_start := cumsum(c(0, diff(as.Date(date)))), by = id]

#    id       date days_from_start
# 1:  A 2000-01-13               0
# 2:  A 2000-01-18               5
# 3:  A 2000-01-25              12
# 4:  B 2012-10-10               0
# 5:  B 2012-10-11               1
# 6:  C 2005-07-25               0
# 7:  C 2005-07-31               6
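One caveat, shown as a base-R sketch on hypothetical dates: `cumsum(c(0, diff(d)))` telescopes to `d - d[1]`, so it counts days from the *first row* of each group, which is the first date only when the rows are sorted by date. Subtracting the group minimum does not depend on row order.

```r
# Hypothetical dates for a single id, deliberately not in date order
d <- as.Date(c("2000-01-18", "2000-01-13", "2000-01-25"))

# Running sum of gaps: equivalent to d - d[1], i.e. relative to the first row
running <- cumsum(c(0, diff(d)))

# Distance from the earliest date in the group, independent of row order
from_min <- as.numeric(d - min(d))

print(running)   # 0 -5 7
print(from_min)  # 5 0 12
```

An order-independent variant in the answer's data.table syntax would be `dt[, days_from_start := as.numeric(as.Date(date) - min(as.Date(date))), by = id]`.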
answered 2013-01-21T10:30:34.487

You can also use a combination of the functions difftime and split:

dat
  id       date
1  A 2000-01-13
2  A 2000-01-18
3  A 2000-01-25
4  B 2012-10-10
5  B 2012-10-11
6  C 2005-07-25
7  C 2005-07-31

dat$date <- as.POSIXct(dat$date)
dat$"Days spent" <- unlist(lapply(split(dat,f=dat$id),
                         function(x){as.numeric(difftime(x$date,x$date[1], units="days"))}))
dat
  id       date Days spent
1  A 2000-01-13          0
2  A 2000-01-18          5
3  A 2000-01-25         12
4  B 2012-10-10          0
5  B 2012-10-11          1
6  C 2005-07-25          0
7  C 2005-07-31          6

As suggested by @agstudy and @Arun, this can be simplified as follows:

dat$"Days spent" <- unlist(by(dat, dat$id, 
                           function(x)difftime(x$date,x$date[1], units= "days")))
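Both `split()` and `by()` return their groups in the order of the `id` levels, so `unlist()` lines up with the rows of `dat` only when the data are already sorted by `id`, as they happen to be here. A base-R sketch on hypothetical unsorted data: `unsplit()` writes each group's results back to the original row positions. It converts with `as.Date()` rather than `as.POSIXct()`, which keeps every difference a whole number of days; `POSIXct` values in a local time zone that observes daylight saving can differ by a fraction of a day across a clock change.

```r
# Hypothetical data, deliberately not sorted by id
dat <- data.frame(id = c("B", "A", "B", "A"),
                  date = as.Date(c("2012-10-10", "2000-01-13",
                                   "2012-10-11", "2000-01-18")))

# Days since the first row of each id, computed group by group
per_group <- lapply(split(dat$date, dat$id),
                    function(d) as.numeric(difftime(d, d[1], units = "days")))

# unlist() yields the values in group order (A, A, B, B): wrong rows for this data
naive <- unname(unlist(per_group))

# unsplit() puts each value back at its original row position
dat$days_spent <- unsplit(per_group, dat$id)
print(dat)
```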
answered 2013-01-21T10:33:14.370

Two other approaches: ave, and the plyr library.

df <-
structure(list(id = c("A", "A", "A", "B", "B", "C", "C"), date = structure(c(10969, 
10974, 10981, 15623, 15624, 12989, 12995), class = "Date")), .Names = c("id", 
"date"), row.names = c(NA, -7L), class = "data.frame")

With ave, the date has to be converted to numeric first:

df$days_from_start <- ave(as.numeric(df$date), df$id, FUN = function(x) x-min(x))

This gives

> df
  id       date days_from_start
1  A 2000-01-13               0
2  A 2000-01-18               5
3  A 2000-01-25              12
4  B 2012-10-10               0
5  B 2012-10-11               1
6  C 2005-07-25               0
7  C 2005-07-31               6
> str(df)
'data.frame':   7 obs. of  3 variables:
 $ id             : chr  "A" "A" "A" "B" ...
 $ date           : Date, format: "2000-01-13" ...
 $ days_from_start: num  0 5 12 0 1 0 6
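`ave()` returns its result in the original row order and subtracts each group's minimum, so it needs the rows sorted neither by `id` nor by date. A minimal check on hypothetical shuffled rows:

```r
# Hypothetical rows in no particular order
df <- data.frame(id = c("B", "A", "B", "A"),
                 date = as.Date(c("2012-10-11", "2000-01-18",
                                  "2012-10-10", "2000-01-13")))

# Same call as above: per id, subtract the earliest date (as a number of days)
df$days_from_start <- ave(as.numeric(df$date), df$id,
                          FUN = function(x) x - min(x))
print(df)
```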

Using the plyr library:

library("plyr")
df <- ddply(df, .(id), mutate, days_from_start = date - min(date))

This gives

> df
  id       date days_from_start
1  A 2000-01-13          0 days
2  A 2000-01-18          5 days
3  A 2000-01-25         12 days
4  B 2012-10-10          0 days
5  B 2012-10-11          1 days
6  C 2005-07-25          0 days
7  C 2005-07-31          6 days
> str(df)
'data.frame':   7 obs. of  3 variables:
 $ id             : chr  "A" "A" "A" "B" ...
 $ date           : Date, format: "2000-01-13" ...
 $ days_from_start:Class 'difftime'  atomic [1:7] 0 5 12 0 1 0 6
  .. ..- attr(*, "units")= chr "days"
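Note that the plyr result is a `difftime` column rather than a plain number. If a numeric column is wanted, as in the other answers, `as.numeric()` with an explicit `units` argument converts it. A base-R sketch on hypothetical dates:

```r
d <- as.Date(c("2000-01-13", "2000-01-18", "2000-01-25"))

# Subtracting Dates yields a difftime measured in days
gap <- d - min(d)
print(class(gap))  # "difftime"

# Drop the class, stating the unit explicitly so the values are days
days <- as.numeric(gap, units = "days")
print(days)        # 0 5 12
```

Inside the plyr call, the same idea would read `ddply(df, .(id), mutate, days_from_start = as.numeric(date - min(date)))`.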
answered 2013-01-22T23:03:17.987