是否ifelse
真的计算了yes
和no
向量——就像每个向量的整体一样?或者它只是从每个向量中计算一些值?
还有,ifelse
真的那么慢吗?
是否ifelse
真的计算了yes
和no
向量——就像每个向量的整体一样?或者它只是从每个向量中计算一些值?
还有,ifelse
真的那么慢吗?
ifelse
calculates both its yes
value and its no
value. Except in the case where the test
condition is either all TRUE
or all FALSE
.
We can see this by generating random numbers and observing how many numbers are actually generated. (by reverting the seed
).
# TEST CONDITION, ALL TRUE
set.seed(1)
dump <- ifelse(rep(TRUE, 200), rnorm(200), rnorm(200))
next.random.number.after.all.true <- rnorm(1)
# TEST CONDITION, ALL FALSE
set.seed(1)
dump <- ifelse(rep(FALSE, 200), rnorm(200), rnorm(200))
next.random.number.after.all.false <- rnorm(1)
# TEST CONDITION, MIXED
set.seed(1)
dump <- ifelse(c(FALSE, rep(TRUE, 199)), rnorm(200), rnorm(200))
next.random.number.after.some.TRUE.some.FALSE <- rnorm(1)
# RESET THE SEED, GENERATE SEVERAL RANDOM NUMBERS TO SEARCH FOR A MATCH
set.seed(1)
r.1000 <- rnorm(1000)
cat("Quantity of random numbers generated during the `ifelse` statement when:",
"\n\tAll True ", which(r.1000 == next.random.number.after.all.true) - 1,
"\n\tAll False ", which(r.1000 == next.random.number.after.all.false) - 1,
"\n\tMixed T/F ", which(r.1000 == next.random.number.after.some.TRUE.some.FALSE) - 1
)
Gives the following output:
Quantity of random numbers generated during the `ifelse` statement when:
All True 200
All False 200
Mixed T/F 400 <~~ Notice TWICE AS MANY numbers were
generated when `test` had both
T & F values present
.
.
if (any(test[!nas]))
ans[test & !nas] <- rep(yes, length.out = length(ans))[test & # <~~~~ This line and the one below
!nas]
if (any(!test[!nas]))
ans[!test & !nas] <- rep(no, length.out = length(ans))[!test & # <~~~~ ... are the cluprits
!nas]
.
.
Notice that yes
and no
are computed only if there
is some non-NA
value of test
that is TRUE
or FALSE
(respectively).
At which point -- and this is the imporant part when it comes to efficiency -- the entirety of each vector is computed.
Lets see if we can test it:
library(microbenchmark)
# Create some sample data
N <- 1e4
set.seed(1)
X <- sample(c(seq(100), rep(NA, 100)), N, TRUE)
Y <- ifelse(is.na(X), rnorm(X), NA) # Y has reverse NA/not-NA setup than X
yesifelse <- quote(sort(ifelse(is.na(X), Y+17, X-17 ) ))
noiflese <- quote(sort(c(Y[is.na(X)]+17, X[is.na(Y)]-17)))
identical(eval(yesifelse), eval(noiflese))
# [1] TRUE
microbenchmark(eval(yesifelse), eval(noiflese), times=50L)
N = 1,000
Unit: milliseconds
expr min lq median uq max neval
eval(yesifelse) 2.286621 2.348590 2.411776 2.537604 10.05973 50
eval(noiflese) 1.088669 1.093864 1.122075 1.149558 61.23110 50
N = 10,000
Unit: milliseconds
expr min lq median uq max neval
eval(yesifelse) 30.32039 36.19569 38.50461 40.84996 98.77294 50
eval(noiflese) 12.70274 13.58295 14.38579 20.03587 21.68665 50