I'm struggling to get the following done:
Example dataset:
belongID uniqID Time Rating
1 101 5 0
1 102 4 0
2 103 4 0
2 104 3 0
2 105 2 5
3 106 4 2
3 107 5 0
3 108 5 1
The problem is: I would like to extract the most recent entry (largest value of Time) per belongID, unless its rating is 0. If the rating of the most recent entry is 0, however, I want the first entry with a non-zero rating going back in time (not the highest rating, just the most recent entry whose rating is not zero). If all entries of that belongID have rating 0, the most recent one should be selected.
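A minimal sketch of that rule for a single belongID group, assuming the data sit in a data frame called `Data` with the columns shown above (`pickEntry` is a hypothetical name). Keeping only the non-zero rows first, and falling back to all rows when none are left, gives the same answer as checking the most recent entry and then looking further back: whenever the most recent entry is non-zero, it is also the most recent non-zero entry.

```r
# Example data from the question
Data <- data.frame(
  belongID = c(1, 1, 2, 2, 2, 3, 3, 3),
  uniqID   = 101:108,
  Time     = c(5, 4, 4, 3, 2, 4, 5, 5),
  Rating   = c(0, 0, 0, 0, 5, 2, 0, 1)
)

# Hypothetical helper for one belongID group: prefer the most recent row
# with a non-zero Rating; if every Rating is 0, use the most recent row.
# which.max() returns the first maximum, so ties in Time go to the
# row that appears first in the group.
pickEntry <- function(x) {
  rated <- x[x$Rating != 0, , drop = FALSE]
  candidates <- if (nrow(rated) > 0) rated else x
  candidates[which.max(candidates$Time), ]
}

result <- do.call(rbind, lapply(split(Data, Data$belongID), pickEntry))
result$uniqID  # 101 105 108
```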
The end result should then be:
belongID uniqID Time Rating
1 101 5 0
2 105 2 5
3 108 5 1
The dataset is quite large and is ordered by belongID. Within a belongID, however, it is not ordered by Time, so more recent entries can appear before or after older ones.
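Because the rows are not ordered by Time, an alternative sketch sorts once by belongID and decreasing Time and then scans each group from the top (`firstRated` is a hypothetical helper). `order()` leaves tied rows in their original order, so in example group 3 the scan sees 107 before 108, skips 107 because its Rating is 0, and returns 108.

```r
# Example data from the question
Data <- data.frame(
  belongID = c(1, 1, 2, 2, 2, 3, 3, 3),
  uniqID   = 101:108,
  Time     = c(5, 4, 4, 3, 2, 4, 5, 5),
  Rating   = c(0, 0, 0, 0, 5, 2, 0, 1)
)

# Within each belongID, put the most recent entries first.
sorted <- Data[order(Data$belongID, -Data$Time), ]

# Take the first row with a non-zero Rating; if there is none,
# take the first row, which is the most recent entry.
firstRated <- function(x) {
  hit <- which(x$Rating != 0)
  x[if (length(hit) > 0) hit[1] else 1, ]
}

result <- do.call(rbind, lapply(split(sorted, sorted$belongID), firstRated))
result$uniqID  # 101 105 108
```

This does not depend on how the input rows are arranged within a belongID, since the sort establishes the time order explicitly.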
Without the "0 Rating" constraint, I used the following function to select the most recent entry:
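uniqueMax <- function(m, belongID = 1, time = 3) {
  # belongID and time are column positions in m
  t(
    vapply(
      # one vector of row indices per belongID
      split(1:nrow(m), m[, belongID]),
      # the row with the largest time within the group
      function(i, x, time) x[i, , drop = FALSE][which.max(x[i, time]), ],
      # template: each result is one row of m
      m[1, ], x = m, time = time
    )
  )
}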
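To see where the constraint matters, here is that function run on the example data as a numeric matrix (column order belongID, uniqID, Time, Rating, which matches the defaults belongID = 1 and time = 3). Groups 2 and 3 come back with rows whose Rating is 0.

```r
uniqueMax <- function(m, belongID = 1, time = 3) {
  t(vapply(split(1:nrow(m), m[, belongID]),
           function(i, x, time) x[i, , drop = FALSE][which.max(x[i, time]), ],
           m[1, ], x = m, time = time))
}

# Example data from the question as a numeric matrix
m <- cbind(belongID = c(1, 1, 2, 2, 2, 3, 3, 3),
           uniqID   = 101:108,
           Time     = c(5, 4, 4, 3, 2, 4, 5, 5),
           Rating   = c(0, 0, 0, 0, 5, 2, 0, 1))

res <- uniqueMax(m)
# uniqIDs 101, 103, 107: in group 3, Time 5 is tied and
# which.max() takes the first row, 107, whose Rating is 0
res[, "uniqID"]
```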
I do not know how to incorporate the "0 Rating" constraint.
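One way to fold the constraint into the same vapply structure, as a sketch: the name `uniqueMaxRated` and the extra `rating` argument (the column position of Rating, assumed to be 4) are hypothetical. Inside each group, zero-rated rows are dropped unless that would leave nothing, and which.max then runs on what remains.

```r
# Example data from the question as a numeric matrix
m <- cbind(belongID = c(1, 1, 2, 2, 2, 3, 3, 3),
           uniqID   = 101:108,
           Time     = c(5, 4, 4, 3, 2, 4, 5, 5),
           Rating   = c(0, 0, 0, 0, 5, 2, 0, 1))

# Hypothetical variant of uniqueMax with a Rating column index
uniqueMaxRated <- function(m, belongID = 1, time = 3, rating = 4) {
  t(
    vapply(
      split(seq_len(nrow(m)), m[, belongID]),
      function(i, x, time, rating) {
        rows <- x[i, , drop = FALSE]
        # drop zero-rated rows, unless that would leave none
        rated <- rows[rows[, rating] != 0, , drop = FALSE]
        if (nrow(rated) > 0) rows <- rated
        # most recent remaining row (first one on ties)
        rows[which.max(rows[, time]), ]
      },
      m[1, ], x = m, time = time, rating = rating
    )
  )
}

res <- uniqueMaxRated(m)
res[, "uniqID"]  # 101 105 108, named by belongID
```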
EDIT: A follow-up question:
Does anyone know how the getRating function should be altered if not only rating 0 but several ratings need to be taken into account (for instance 0, 1, 4 and 5)? That is: take the most recent entry, unless its Rating is 0, 1, 4 or 5. If it is, take the most recent entry with a different rating. If all ratings are 0, 1, 4 or 5, take the most recent of those entries. I tried the following, but that did not work:
getRating <- function(x){
  # TRUE for rows whose Rating should be skipped
  iszero <- x$Rating %in% c(0, 1, 4, 5)
  if(all(iszero)){
    id <- which.max(x$Time)
  } else {
    # Skipped rows score 0, so which.max() picks the most recent remaining row
    id <- which.max((!iszero)*x$Time)
  }
  x[id,]
}
# Do this over the complete data frame
do.call(rbind,lapply(split(Data,Data$belongID),getRating))
# edited per Tyler's suggestion
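A minimal sketch of the follow-up, assuming the same `Data` layout; `getRatingExcluding` and `pickPerGroup` are hypothetical names, not the original getRating. The set of ratings to skip becomes a parameter, and instead of multiplying Time by `(!iszero)`, which.max runs only over the allowed rows. The multiplication trick assumes every Time value is strictly positive: a skipped row scores 0 and can beat an allowed row whose Time is 0 or negative. That may or may not be the cause of the failure on the full dataset.

```r
# Example data from the question
Data <- data.frame(
  belongID = c(1, 1, 2, 2, 2, 3, 3, 3),
  uniqID   = 101:108,
  Time     = c(5, 4, 4, 3, 2, 4, 5, 5),
  Rating   = c(0, 0, 0, 0, 5, 2, 0, 1)
)

# Hypothetical generalisation: `excluded` is the set of ratings to skip
getRatingExcluding <- function(x, excluded = 0) {
  allowed <- which(!(x$Rating %in% excluded))
  if (length(allowed) == 0) {
    # every rating is excluded: fall back to the most recent entry
    id <- which.max(x$Time)
  } else {
    # most recent entry among the allowed rows
    id <- allowed[which.max(x$Time[allowed])]
  }
  x[id, ]
}

pickPerGroup <- function(d, excluded) {
  do.call(rbind, lapply(split(d, d$belongID), getRatingExcluding,
                        excluded = excluded))
}

pickPerGroup(Data, excluded = c(0, 1, 4, 5))$uniqID  # 101 103 106
pickPerGroup(Data, excluded = 0)$uniqID              # 101 105 108

# Negative Time: (!iszero) * Time would give c(0, -5) and pick row 1
g <- data.frame(Time = c(-1, -5), Rating = c(0, 2))
getRatingExcluding(g, excluded = 0)$Rating  # 2
```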