
I have updated the question because a) I did not articulate it clearly on the first attempt, and b) my exact need has also shifted somewhat.

I especially want to thank Hemmo for his great help so far, and I apologise for not articulating my question clearly enough to him. His code (which addressed an earlier version of the problem) is shown in the answer section.

At a high level, I am looking for code that identifies and differentiates the blocks of consecutive free time of different individuals. More specifically, the code would ideally:

  • Check whether an activity is labelled "Free".
  • Check whether the adjacent weeks (the week before and the week after) of the same person were also labelled "Free".
  • Give the entire block of consecutive weeks of that person that are labelled "Free" an indicator in the desired outcome column. Note that the length of these periods (e.g. 1 consecutive week, 4 consecutive weeks, 8 consecutive weeks) will vary.
  • Finally, because the characteristics of these clusters need further analysis, different blocks should receive different indicators (e.g. the March block of Paul would have value 1, the May block value 2, and Kim's block in March would have value 3).

Hopefully this becomes clearer when looking at the example dataframe (see the desired final column).

Any help is much appreciated; the code for the test dataframe is below.

Many thanks in advance,

W

Example (note that the last column should be generated by the code, purely included as illustration):

         Week Name  Activity Hours Desired_Outcome
1  01/01/2013 Paul      Free    40               1
2  08/01/2013 Paul      Free    10               1
3  08/01/2013 Paul Project A    30               0
4  15/01/2013 Paul Project B    30               0
5  15/01/2013 Paul Project A    10               0
6  22/01/2013 Paul      Free    40               2
7  29/01/2013 Paul Project B    40               0
8  05/02/2013 Paul      Free    40               3
9  12/02/2013 Paul      Free    10               3
10 19/02/2013 Paul      Free    30               3
11 01/01/2013  Kim Project E    40               0
12 08/01/2013  Kim      Free    40               4
13 15/01/2013  Kim      Free    40               4
14 22/01/2013  Kim Project E    40               0
15 29/01/2013  Kim      Free    40               5

Code for dataframe:

Name=c(rep("Paul",10),rep("Kim",5))
Week=c("01/01/2013","08/01/2013","08/01/2013","15/01/2013","15/01/2013","22/01/2013","29/01/2013","05/02/2013","12/02/2013","19/02/2013","01/01/2013","08/01/2013","15/01/2013","22/01/2013","29/01/2013")
Activity=c("Free","Free","Project A","Project B","Project A","Free","Project B","Free","Free","Free","Project E","Free","Free","Project E","Free")
Hours=c(40,10,30,30,10,40,40,40,10,30,40,40,40,40,40)
Desired_Outcome=c(1,1,0,0,0,2,0,3,3,3,0,4,4,0,5)
# data.frame() keeps Hours and Desired_Outcome numeric; cbind() would coerce every column to character
df=data.frame(Week,Name,Activity,Hours,Desired_Outcome)
df

2 Answers


EDIT: This was messy already as the question was edited several times, so I removed old answers.

checkFree<-function(df){
  df$Week<-as.Date(df$Week,format="%d/%m/%Y")
  df$outcome<-numeric(nrow(df))

  if(df$Activity[1]=="Free"){ #check first
    counter<-1
    df$outcome[1]<-counter    
  } else counter<-0
  for(i in seq_len(nrow(df))[-1]){ #rows 2..n; empty if there is only one row
    if(df$Activity[i]=="Free"){
      #rows of this person that fall in the 7 days before row i
      LastWeek <- (df$Week >= (df$Week[i]-7) & 
                     df$Week < (df$Week[i]))  
      #no "Free" row in the previous week: a new block starts here
      if(all(df$Activity[LastWeek]!="Free"))
        counter<-counter+1 
      df$outcome[i]<-counter
    }
  }
  df
}

splitdf<-split(df, df$Name) #one data frame per person

df<-unsplit(lapply(splitdf,checkFree),df$Name)

uniqs<-unique(df$Name) #for renumbering
offset<-0 #highest block number assigned so far
for(nm in uniqs){
  rows<-df$Name==nm & df$outcome>0
  df$outcome[rows]<-df$outcome[rows]+offset #shift this person's blocks past earlier ones
  offset<-max(offset,df$outcome[rows])
}
df

That should do it, although the above code is probably far from optimal.
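
For comparison, the same numbering can probably be obtained without the row-by-row loop. The sketch below is only an illustration under a few assumptions: the helper name `label_free_blocks` and its `gap_days` argument are made up here, weeks are assumed to be in `%d/%m/%Y` format (or already of class Date), and several "Free" rows in the same week are treated as belonging to one block.

```r
# Hypothetical helper (not part of the code above): numbers blocks of
# consecutive "Free" weeks per person, with ids that are unique across people.
label_free_blocks <- function(df, gap_days = 7) {
  # Days since 1970-01-01, so gaps between weeks are plain numbers
  week <- as.numeric(as.Date(df$Week, format = "%d/%m/%Y"))
  out <- numeric(nrow(df))
  next_id <- 0  # highest block id handed out so far
  for (nm in unique(as.character(df$Name))) {
    free <- df$Name == nm & df$Activity == "Free"
    if (!any(free)) next
    wk <- sort(unique(week[free]))
    # A new block starts at the first free week and after every gap > gap_days
    block <- cumsum(c(TRUE, diff(wk) > gap_days))
    out[free] <- next_id + block[match(week[free], wk)]
    next_id <- next_id + max(block)
  }
  out
}

# Example data from the question
df <- data.frame(
  Week = c("01/01/2013", "08/01/2013", "08/01/2013", "15/01/2013", "15/01/2013",
           "22/01/2013", "29/01/2013", "05/02/2013", "12/02/2013", "19/02/2013",
           "01/01/2013", "08/01/2013", "15/01/2013", "22/01/2013", "29/01/2013"),
  Name = c(rep("Paul", 10), rep("Kim", 5)),
  Activity = c("Free", "Free", "Project A", "Project B", "Project A", "Free",
               "Project B", "Free", "Free", "Free", "Project E", "Free", "Free",
               "Project E", "Free"),
  stringsAsFactors = FALSE
)
df$outcome <- label_free_blocks(df)
df
```

On the example data this should reproduce the Desired_Outcome column. People are numbered in order of first appearance, so Paul's blocks come before Kim's.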

answered 2013-03-09T12:17:14.923

Using the comment by user1885116 to Hemmo's answer as a guide to what is desired, here is a somewhat simpler approach:

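N <- 1  # maximum gap, in weeks, between two "Free" rows of the same run
# Week must already be of class Date, otherwise diff() below will fail
x <- with(df, df[Activity=='Free',])
y <- with(x, diff(Week)) <= N*7

df$outcome <- 0
# flag every "Free" row that has another "Free" row within N weeks before or after it
df[rownames(x[c(y, FALSE) | c(FALSE, y),]),]$outcome <- 1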

df

##         Week  Activity Hours Desired_Outcome outcome
## 1 2013-01-01 Project A    40               0       0
## 2 2013-01-08 Project A    10               0       0
## 3 2013-01-08      Free    30               1       1
## 4 2013-01-15 Project B    30               0       0
## 5 2013-01-15      Free    10               1       1
## 6 2013-01-22 Project B    40               0       0
## 7 2013-01-29      Free    40               0       0
## 8 2013-02-05 Project C    40               0       0
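
The `c(y, FALSE) | c(FALSE, y)` step is perhaps easiest to follow on a small vector. The dates below are made up for illustration and are not taken from the output above; `weeks` and `in_run` are names chosen for this sketch.

```r
# Made-up weeks in which someone was "Free"
weeks <- as.Date(c("2013-01-01", "2013-01-08", "2013-01-22",
                   "2013-02-05", "2013-02-12"))

# y[i] is TRUE when week i+1 follows week i within 7 days
y <- as.numeric(diff(weeks)) <= 7

# A week belongs to a run if it is close to the next week (c(y, FALSE))
# or close to the previous week (c(FALSE, y))
in_run <- c(y, FALSE) | c(FALSE, y)

data.frame(weeks, in_run)
```

Here the isolated week of 2013-01-22 comes out as FALSE, while the other four weeks are TRUE. As written, this approach flags rows that are part of a run of two or more free weeks with a 1; it does not give separate runs different numbers, and it does not look at a Name column.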
answered 2013-03-09T17:25:21.377