Assume I have a preallocated data structure that I want to write into, for performance reasons, rather than growing the structure over time. First I tried this using sapply:
set.seed(1)
count <- 5
pre <- numeric(count)
sapply(1:count, function(i) {
  pre[i] <- rnorm(1)
})
pre
# [1] 0 0 0 0 0
A plain for loop, on the other hand, fills pre as expected:
for(i in 1:count) {
  pre[i] <- rnorm(1)
}
pre
# [1] -0.8204684 0.4874291 0.7383247 0.5757814 -0.3053884
I assume this is because the anonymous function passed to sapply runs in a different scope (or is it an environment in R?), and as a result the pre it assigns to isn't the same object. The for loop runs in the same scope/environment, so it works as expected.
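To test that theory, swapping the assignment for the superassignment operator <<- (which searches enclosing environments instead of creating a local copy) does write into the outer pre:
set.seed(1)
pre <- numeric(count)
sapply(1:count, function(i) {
  pre[i] <<- rnorm(1)  # <<- reaches the pre in the enclosing (here global) environment
})
pre
# [1] -0.6264538  0.1836433 -0.8356286  1.5952808  0.3295078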
I've generally tried to adopt R's apply functions for iteration rather than for loops, but I don't see a way around the problem here. Is there something different I should be doing, or a better idiom for this type of operation?
As noted, my example is highly contrived; I have no interest in generating normal deviates. My actual code deals with a 4-column, 1.5-million-row data frame. Previously I was relying on growing and merging to build the final data frame, and based on benchmarking I decided to try avoiding the merges and preallocating instead.
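For context, here is a stripped-down sketch of the real pattern; the column names and the compute_row helper are placeholders, not my actual code:
n <- 1500000  # real data: 4 columns, 1.5 million rows
result <- data.frame(id = integer(n), x = numeric(n),
                     y = numeric(n), z = numeric(n))
for (i in seq_len(n)) {
  result[i, ] <- compute_row(i)  # compute_row is a hypothetical stand-in for the per-row work
}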