I have a vector that looks like this:
c(0.5,0,0,0,0,0.7,0,0,0,0,0.4,0,0,0,0)
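Suppose I want to copy the values at positions 1, 6 and 11 (the nonzero ones) into the four positions that follow each of them, so that the vector looks like this: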
c(0.5,0.5,0.5,0.5,0.5,0.7,0.7,0.7,0.7,0.7,0.4,0.4,0.4,0.4,0.4)
What is the best way to do this in R?
Many thanks!
Another possibility:
vec <- c(0.5,0,0,0,0,0.7,0,0,0,0,0.4,0,0,0,0)
library(zoo)
vec[vec==0] <- NA
na.locf(vec)
#[1] 0.5 0.5 0.5 0.5 0.5 0.7 0.7 0.7 0.7 0.7 0.4 0.4 0.4 0.4 0.4
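If loading zoo is not an option, the same "last observation carried forward" idea can be sketched in base R. This is only an illustration, and it assumes the first element is nonzero (a leading zero would produce index 0 and shorten the result). The names idx and filled are made up for this example.

```r
vec <- c(0.5, 0, 0, 0, 0, 0.7, 0, 0, 0, 0, 0.4, 0, 0, 0, 0)

# For every position, find the index of the most recent nonzero element:
# zero positions contribute 0, and cummax() carries the last nonzero index forward.
idx <- cummax(seq_along(vec) * (vec != 0))

# Look the values up by those indices (assumes vec[1] != 0).
filled <- vec[idx]
print(filled)
```

Because the values are looked up rather than recomputed, they come back unchanged from the original vector.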
Here is one way:
zero.locf <- function(x) {
  if (x[1] == 0) stop('x[1] should not be 0')
  with(rle(x), {
    # replace the value of each zero run with the value of the run before it
    no.0 <- replace(values, values == 0, values[which(values == 0) - 1])
    rep(no.0, lengths)
  })
}
x <- c(0.5,0,0,0,0,0.7,0,0,0,0,0.4,0,0,0,0)
zero.locf(x)
# [1] 0.5 0.5 0.5 0.5 0.5 0.7 0.7 0.7 0.7 0.7 0.4 0.4 0.4 0.4 0.4
rle(x) returns a list with the items values and lengths.
rle(x)
Run Length Encoding
lengths: int [1:6] 1 4 1 4 1 4
values : num [1:6] 0.5 0 0.7 0 0.4 0
with opens up that list and lets us refer to its entries directly.
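To make the run-length idea easier to follow, here is a step-by-step sketch of the same approach without with(). It is only an illustration; the names r, zero.runs and filled are made up for this example, and it assumes x[1] is nonzero, as the function above requires.

```r
x <- c(0.5, 0, 0, 0, 0, 0.7, 0, 0, 0, 0, 0.4, 0, 0, 0, 0)

r <- rle(x)                        # runs of equal values, with their lengths
zero.runs <- which(r$values == 0)  # positions of the zero runs within the encoding

# Adjacent runs always differ, so the run just before a zero run is nonzero.
r$values[zero.runs] <- r$values[zero.runs - 1]

filled <- inverse.rle(r)           # expand the modified runs back into a full vector
print(filled)
```

inverse.rle() does the same job as rep(values, lengths) in the function above.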
Here is another base R approach. Leading zeros are left as they are:
v = c(0,1,2,-2.1,0,3,0,0.4,0,0)
v[v!=0] = diff(c(0, v[v!=0]))
cumsum(v)
# [1] 0.0 1.0 2.0 -2.1 -2.1 3.0 3.0 0.4 0.4 0.4
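The trick may be easier to see with the intermediate vector spelled out. In this sketch (the names nz, steps and filled are made up for illustration) every nonzero entry is replaced by its jump from the previous nonzero value, so the running sum stays flat across zeros and lands on the next nonzero value exactly where it occurs.

```r
v <- c(0, 1, 2, -2.1, 0, 3, 0, 0.4, 0, 0)

nz <- v != 0                      # which entries carry a value
steps <- v
steps[nz] <- diff(c(0, v[nz]))    # jump from the previous nonzero value (0 before the first)

filled <- cumsum(steps)           # zeros add nothing, so the last value is carried forward
print(steps)
print(filled)
```

Since the values are rebuilt by floating-point addition, they can differ from the originals by rounding error, so it is probably safer to compare results with all.equal() than with identical().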
Here are some benchmarks:
library(zoo)             # provides na.locf()
library(microbenchmark)  # provides microbenchmark()
roland = function(v) {v[v == 0] <- NA; na.locf(v)}
mp = function(x) {with(rle(x), rep(replace(values, values==0, values[which(values == 0)-1]), lengths))}
quant = function(dat) {not.0 <- (dat != 0); approx(which(not.0), dat[not.0], xout = seq(along.with = dat), method = "constant", rule = 2)}
eddi = function(v) {v[v!=0] = diff(c(0, v[v!=0])); cumsum(v)}
v = sample(c(-10:10, 0), 1e6, TRUE)
microbenchmark(roland(v), mp(v), quant(v), eddi(v), times = 10)
#Unit: milliseconds
# expr min lq median uq max neval
# roland(v) 595.1630 625.7692 638.4395 650.4758 664.9224 10
# mp(v) 410.8224 433.6775 469.9346 496.6328 528.3218 10
# quant(v) 646.1775 753.0684 759.9805 838.4281 883.3383 10
# eddi(v) 265.8064 286.2922 316.7022 339.0333 354.0836 10
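Timings only mean something if the candidates return the same thing, so a quick agreement check on the vector from the question may be worth running first. The sketch below uses base R only (the zoo-based roland is left out to avoid a package dependency) and takes the $y component of approx(), because quant as benchmarked returns a list. The name agree is made up for this example.

```r
# Base R versions of the benchmarked functions; quant() here returns only $y.
mp <- function(x) {
  with(rle(x), rep(replace(values, values == 0, values[which(values == 0) - 1]), lengths))
}
quant <- function(dat) {
  not.0 <- (dat != 0)
  approx(which(not.0), dat[not.0], xout = seq(along.with = dat),
         method = "constant", rule = 2)$y
}
eddi <- function(v) {
  v[v != 0] <- diff(c(0, v[v != 0]))
  cumsum(v)
}

x <- c(0.5, 0, 0, 0, 0, 0.7, 0, 0, 0, 0, 0.4, 0, 0, 0, 0)

# all.equal() tolerates the rounding error that cumsum() can introduce.
agree <- isTRUE(all.equal(mp(x), quant(x))) && isTRUE(all.equal(mp(x), eddi(x)))
print(agree)
```

On vectors that start with zeros the functions treat the leading zeros differently, so agreement on this input does not carry over to the random benchmark vector.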
I would probably use lapply to loop over every element greater than 0, apply rep to repeat each of those values 5 times, and combine the pieces with do.call("c", ...).
tmp <- c(0.5,0,0,0,0,0.7,0,0,0,0,0.4,0,0,0,0)
do.call("c", lapply(which(tmp > 0), function(i) rep(tmp[i], 5)))
# [1] 0.5 0.5 0.5 0.5 0.5 0.7 0.7 0.7 0.7 0.7 0.4 0.4 0.4 0.4 0.4
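Since rep is itself vectorised, the loop above can probably be collapsed into a single call with the each argument. A small sketch comparing the two (the names looped and vectorised are made up for illustration):

```r
tmp <- c(0.5, 0, 0, 0, 0, 0.7, 0, 0, 0, 0, 0.4, 0, 0, 0, 0)

# Loop version: one rep() call per positive element, concatenated afterwards.
looped <- do.call("c", lapply(which(tmp > 0), function(i) rep(tmp[i], 5)))

# Vectorised version: repeat every positive element 5 times in one call.
vectorised <- rep(tmp[tmp > 0], each = 5)

stopifnot(identical(looped, vectorised))
print(vectorised)
```

Like the loop, this assumes that every value is followed by exactly four zeros and that the values to keep are positive.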
Here is an alternative using approx:
dat <- c(0.5,0,0,0,0,0.7,0,0,0,0,0.4,0,0,0,0)
not.0 <- (dat != 0)
approx(which(not.0), dat[not.0], xout = seq(along.with = dat), method = "constant", yleft = 0, rule = 1:2)
# $x
# [1] 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
#
# $y
# [1] 0.5 0.5 0.5 0.5 0.5 0.7 0.7 0.7 0.7 0.7 0.4 0.4 0.4 0.4 0.4
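Here is an alternative that relies on the prescribed structure of the initial vector (a nonzero value followed by 4 zeros, repeated). It addresses the speed issue, but at the cost of flexibility.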
dat <- c(0.5,0,0,0,0,0.7,0,0,0,0,0.4,0,0,0,0)
rep(dat[seq(1, length(dat), by = 5)], each = 5)
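Because this shortcut silently gives wrong results when the input does not have the expected layout, it may be worth guarding it. The sketch below wraps it in a function that checks the structure first. The function name fill_blocks and the block size parameter k are hypothetical and are not part of the original answer.

```r
# Hypothetical helper: fill_blocks() and k are illustrative names only.
# Assumes a non-empty vector made of blocks of length k, each starting with
# the value to carry forward and containing zeros everywhere else.
fill_blocks <- function(dat, k = 5) {
  stopifnot(length(dat) %% k == 0)       # length must be a whole number of blocks
  starts <- seq(1, length(dat), by = k)  # first position of every block
  stopifnot(all(dat[-starts] == 0))      # everything else must be zero
  rep(dat[starts], each = k)
}

dat <- c(0.5, 0, 0, 0, 0, 0.7, 0, 0, 0, 0, 0.4, 0, 0, 0, 0)
print(fill_blocks(dat))
```

The checks cost an extra pass over the data, so they trade away a little of the speed advantage for safety.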