I can't think of an obvious simple solution, but, having looked at the suggestions (particularly smci's suggestion of using rle
) I came up with a complicated function that appears to be more efficient.
This is the code, I'll explain below:
# Your function
your.func = function(dat) {
na.pos <- which(is.na(dat))
if (length(na.pos) == length(dat)) {
return(dat)
}
non.na.pos <- setdiff(seq_along(dat), na.pos)
nearest.non.na.pos <- sapply(na.pos, function(x) which.min(abs(non.na.pos - x)))
dat[na.pos] <- dat[non.na.pos[nearest.non.na.pos]]
dat
}
# My function
my.func = function(dat) {
nas=is.na(dat)
if (!any(!nas)) return (dat)
t=rle(nas)
f=sapply(t$lengths[t$values],seq)
a=unlist(f)
b=unlist(lapply(f,rev))
x=which(nas)
l=length(dat)
dat[nas]=ifelse(a>b,dat[ ifelse((x+b)>l,x-a,x+b) ],dat[ifelse((x-a)<1,x+b,x-a)])
dat
}
# Test
n = 100000
test.vec = 1:n
set.seed(1)
test.vec[sample(test.vec,n/4)]=NA
system.time(t1<-my.func(test.vec))
system.time(t2<-your.func(test.vec)) # 10 times speed improvement on my machine
# Verify
any(t1!=t2)
My function relies on rle
. I am reading the comments above but it looks to me like rle
works just fine for NA
. It is easiest to explain with a small example.
If I start with a vector:
dat=c(1,2,3,4,NA,NA,NA,8,NA,10,11,12,NA,NA,NA,NA,NA,18)
I then get the positions of all the NAs:
x=c(5,6,7,8,13,14,15,16,17)
Then, for every "run" of NAs I create a sequence from 1 to the length of the run:
a=c(1,2,3,1,1,2,3,4,5)
Then I do it again, but I reverse the sequence:
b=c(3,2,1,1,5,4,3,2,1)
Now, I can just compare vectors a and b: If a<=b then look back and grab the value at x-a. If a>b then look ahead and grab the value at x+b. The rest is just handling the corner cases when you have all NAs or NA runs at the end or the start of the vector.
There is probably a better, simpler, solution, but I hope this gets you started.