
I have a vector of dates:

mydates <- seq(as.Date("2013-01-01"), length=6, by="1 month")

and a data frame with some more data such as (but with a lot more entries):

startdate <- as.Date(c("2013-01-01", "2013-02-01", "2013-05-15", "2013-05-22"))
enddate <- as.Date(c("2013-02-21", "2013-03-15", "2013-06-15", "2013-07-22"))
state <- c("NY", "NY", "CA", "CA")
df <- data.frame(startdate=startdate, enddate=enddate, state=state)

Now I'd like to use each date in the mydates vector to check how many entries existed in each state. That is, I'd like to be able to do these statements

result <- subset(df, startdate <= mydates[1] & enddate > mydates[1])
table(result$state)

for each element of the mydates vector. I tried various apply functions and the foreach package but nothing's working. Thanks for any suggestions.

Update: Per the advice below, here are some of the many things I tried that didn't work:

 results <- for(i in 1:length(mydates)) {subset(df, startdate <= mydates[i] & enddate > mydates[i])} 

foreach(i=mydates) %do% { subset(df, startdate<= i & enddate > i) } 

and creating a separate function

myf <- function (mydate,mydf=df) {
x <- subset(mydf, startdate <= mydate & enddate > mydate)

}

with the subsetting and trying sapply(mydates, myf)

This myresults <- sapply(mydates, myf)

gives me the same results as

all_results <- sapply(1:length(mydates), function(x) subset(df, startdate <= mydates[x] & enddate > mydates[x]))

from the answer below, namely:

          [,1]     [,2]      [,3]     [,4]      [,5]      [,6]     
startdate 15706    Numeric,2 15737    Numeric,0 Numeric,0 Numeric,2
enddate   15757    Numeric,2 15779    Numeric,0 Numeric,0 Numeric,2
state     factor,1 factor,2  factor,1 factor,0  factor,0  factor,2

I could be misunderstanding those, but it doesn't appear to show me the number of matching results by state.


2 Answers


...and another possible solution:

sapply(mydates, function(x, df){
        ind<-df[ ,"startdate"] <= x & df[,"enddate"] > x
        table(df[ind, "state"])}, df=df)

hth
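
A small follow-up, as a sketch rather than a definitive version: the call above only simplifies to a matrix when `state` is a factor, because `table()` then reports every level (zero counts included) for each date. That was the `data.frame()` default before R 4.0. The explicit `factor()` call and the names `active` and `counts` below are my own additions. Labelling the columns by date also makes the result easier to read:

```r
# Data from the question, with state made an explicit factor (assumption)
mydates <- seq(as.Date("2013-01-01"), length.out = 6, by = "1 month")
df <- data.frame(
  startdate = as.Date(c("2013-01-01", "2013-02-01", "2013-05-15", "2013-05-22")),
  enddate   = as.Date(c("2013-02-21", "2013-03-15", "2013-06-15", "2013-07-22")),
  state     = factor(c("NY", "NY", "CA", "CA"))
)

# One table of counts per date. Because state is a factor, every table
# has one entry per level, so sapply() can bind them into a matrix.
counts <- sapply(seq_along(mydates), function(i) {
  active <- df$startdate <= mydates[i] & df$enddate > mydates[i]
  table(df$state[active])
})

# Label each column with the date it was computed for
colnames(counts) <- format(mydates)
counts
```

With the question's data this should give a matrix with one row per state and one column per date.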

Answered 2013-08-06T13:40:28.947

I think you're looking for

all_results <- sapply(1:length(mydates), function(x) subset(df, startdate <= mydates[x] & enddate > mydates[x]))

But it would help if you posted the code you tried (per standard SO rules), so that we can point out possible mistakes.
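
For what it's worth, `sapply()` tries to simplify a list of data frames into a matrix of list cells, which is where output like `Numeric,2` and `factor,1` comes from. Here is a minimal sketch that keeps each subset intact with `lapply()` and then tabulates `state`. It assumes `state` is a factor, and the names `subsets` and `counts_by_date` are only illustrative:

```r
# Data from the question, with state made an explicit factor (assumption)
mydates <- seq(as.Date("2013-01-01"), length.out = 6, by = "1 month")
df <- data.frame(
  startdate = as.Date(c("2013-01-01", "2013-02-01", "2013-05-15", "2013-05-22")),
  enddate   = as.Date(c("2013-02-21", "2013-03-15", "2013-06-15", "2013-07-22")),
  state     = factor(c("NY", "NY", "CA", "CA"))
)

# lapply() returns a plain list, so each element stays a data frame
subsets <- lapply(seq_along(mydates), function(i) {
  subset(df, startdate <= mydates[i] & enddate > mydates[i])
})
names(subsets) <- format(mydates)

# Count the matching rows per state for every date
counts_by_date <- lapply(subsets, function(s) table(s$state))
counts_by_date[["2013-02-01"]]
```

Each element of `counts_by_date` is an ordinary `table`, so it can be inspected on its own or combined later, for example with `do.call(rbind, counts_by_date)`.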

Answered 2013-08-06T13:18:37.103