5

I have a vector that looks like this:

c(0.5,0,0,0,0,0.7,0,0,0,0,0.4,0,0,0,0)

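Suppose I want to copy the values at positions 1, 6, and 11 (the nonzero ones) into the four positions that follow each of them, so that the vector looks like this: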

c(0.5,0.5,0.5,0.5,0.5,0.7,0.7,0.7,0.7,0.7,0.4,0.4,0.4,0.4,0.4)

What is the best way to do this in R?

Many thanks!

4

5 Answers

8

Another possibility:

vec <- c(0.5,0,0,0,0,0.7,0,0,0,0,0.4,0,0,0,0)

library(zoo)
vec[vec==0] <- NA
na.locf(vec)
#[1] 0.5 0.5 0.5 0.5 0.5 0.7 0.7 0.7 0.7 0.7 0.4 0.4 0.4 0.4 0.4
answered 2013-06-26T13:00:23.307
6

Here is one way:

zero.locf <- function(x) {
    if (x[1] == 0) stop('x[1] should not be 0')
    with(rle(x), {
        # each zero run takes the value of the run immediately before it
        no.0 <- replace(values, values == 0, values[which(values == 0) - 1])
        rep(no.0, lengths)
    })
}
x <- c(0.5,0,0,0,0,0.7,0,0,0,0,0.4,0,0,0,0)
zero.locf(x)
#  [1] 0.5 0.5 0.5 0.5 0.5 0.7 0.7 0.7 0.7 0.7 0.4 0.4 0.4 0.4 0.4

rle(x) returns a list with the items values and lengths:

rle(x)
Run Length Encoding
  lengths: int [1:6] 1 4 1 4 1 4
  values : num [1:6] 0.5 0 0.7 0 0.4 0

with opens up that list and lets us refer to its entries directly.
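The same idea can be spelled out step by step without with. This is only a sketch of the technique described above, not the answerer's code: the names r, zero.runs, and filled are illustrative, and it assumes, like zero.locf, that the first element is not 0.

```r
x <- c(0.5, 0, 0, 0, 0, 0.7, 0, 0, 0, 0, 0.4, 0, 0, 0, 0)

r <- rle(x)                          # runs: 0.5, 0, 0.7, 0, 0.4, 0
zero.runs <- which(r$values == 0)    # indices of the zero runs among the runs

# rle() merges consecutive zeros into one run, so every zero run
# (except a leading one, which we assume is absent) follows a nonzero run.
r$values[zero.runs] <- r$values[zero.runs - 1]

filled <- inverse.rle(r)             # expand the runs back to full length
print(filled)
```

inverse.rle(r) is equivalent to rep(r$values, r$lengths), which is what the last line of zero.locf does.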

answered 2013-06-26T12:49:11.643
4

Here is another base R approach. Leading zeros are left as they are:

v = c(0,1,2,-2.1,0,3,0,0.4,0,0)
v[v!=0] = diff(c(0, v[v!=0]))
cumsum(v)
# [1]  0.0  1.0  2.0 -2.1 -2.1  3.0  3.0  0.4  0.4  0.4
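To see why this works, it may help to look at the intermediate vector: each nonzero entry is replaced by its difference from the previous nonzero entry, so the running sum stays flat across the zeros and jumps to the next value at each nonzero position. A small sketch with the same v (the names nz, steps, and filled are only for illustration):

```r
v <- c(0, 1, 2, -2.1, 0, 3, 0, 0.4, 0, 0)

nz <- v != 0                      # positions that carry a value
steps <- v
steps[nz] <- diff(c(0, v[nz]))    # jump from one nonzero value to the next
print(steps)                      # zeros stay zero, nonzeros become increments

filled <- cumsum(steps)           # zeros add nothing, so the last value persists
print(filled)
```

Because the result is rebuilt from sums of differences, values may differ from the inputs by floating-point rounding; compare with all.equal rather than == if that matters.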

Here are some benchmarks:

library(zoo)             # provides na.locf
library(microbenchmark)  # provides microbenchmark

roland = function(v) {v[v == 0] <- NA; na.locf(v)}
mp = function(x) {with(rle(x), rep(replace(values, values==0, values[which(values == 0)-1]), lengths))}
quant = function(dat) {not.0 <- (dat != 0); approx(which(not.0), dat[not.0], xout = seq(along.with = dat), method = "constant", rule = 2)}
eddi = function(v) {v[v!=0] = diff(c(0, v[v!=0])); cumsum(v)}

v = sample(c(-10:10, 0), 1e6, TRUE)
microbenchmark(roland(v), mp(v), quant(v), eddi(v), times = 10)
#Unit: milliseconds
#      expr      min       lq   median       uq      max neval
# roland(v) 595.1630 625.7692 638.4395 650.4758 664.9224    10
#     mp(v) 410.8224 433.6775 469.9346 496.6328 528.3218    10
#  quant(v) 646.1775 753.0684 759.9805 838.4281 883.3383    10
#   eddi(v) 265.8064 286.2922 316.7022 339.0333 354.0836    10
answered 2013-06-26T22:49:36.820
2

I would probably loop over each element greater than 0 with lapply, apply rep to repeat each of those values 5 times, and then combine the results with do.call("c", ...).

tmp <- c(0.5,0,0,0,0,0.7,0,0,0,0,0.4,0,0,0,0)
do.call("c", lapply(which(tmp > 0), function(i) rep(tmp[i], 5)))
[1] 0.5 0.5 0.5 0.5 0.5 0.7 0.7 0.7 0.7 0.7 0.4 0.4 0.4 0.4 0.4
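Note that this relies on every value being positive and being followed by exactly four zeros. Under those same assumptions, the loop can likely be replaced by a single vectorised call to rep with each = 5. A sketch comparing the two (looped and vectorised are illustrative names):

```r
tmp <- c(0.5, 0, 0, 0, 0, 0.7, 0, 0, 0, 0, 0.4, 0, 0, 0, 0)

# The lapply version from above
looped <- do.call("c", lapply(which(tmp > 0), function(i) rep(tmp[i], 5)))

# Vectorised equivalent: pick the positive values, repeat each one 5 times
vectorised <- rep(tmp[tmp > 0], each = 5)

stopifnot(identical(looped, vectorised))
print(vectorised)
```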
answered 2013-06-26T12:46:33.133
1

Here is an alternative using approx:

dat   <- c(0.5,0,0,0,0,0.7,0,0,0,0,0.4,0,0,0,0)
not.0 <- (dat != 0)
approx(which(not.0), dat[not.0], xout = seq(along.with = dat), method = "constant", yleft = 0, rule = 1:2)
# $x
# [1]  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15
#
# $y
# [1] 0.5 0.5 0.5 0.5 0.5 0.7 0.7 0.7 0.7 0.7 0.4 0.4 0.4 0.4 0.4
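approx returns a list, so the filled vector itself is the y component. With method = "constant" (and the default f = 0) each point takes the value of the nearest known point to its left; yleft = 0 covers positions before the first nonzero value, and rule = 2 on the right carries the last value to the end. A short sketch on a hypothetical vector with leading zeros (this dat and the name filled are made up for illustration):

```r
dat <- c(0, 0, 0.5, 0, 0, 0.7, 0)   # hypothetical input with leading zeros
not.0 <- dat != 0

filled <- approx(which(not.0), dat[not.0],
                 xout = seq_along(dat),
                 method = "constant",
                 yleft = 0,          # value used before the first nonzero entry
                 rule = 1:2)$y       # keep only the interpolated values
print(filled)
```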

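Here is another alternative that relies on the stated structure of the initial vector (a nonzero value followed by 4 zeros, repeated). It addresses the speed concern, but at the cost of flexibility.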

dat <- c(0.5,0,0,0,0,0.7,0,0,0,0,0.4,0,0,0,0)
rep(dat[seq(1, length(dat), by = 5)], each = 5)
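Since this shortcut silently gives wrong results when the input does not have that structure, one option is to check the structure first. A minimal sketch, assuming a fixed block size; fill_blocks and its block argument are hypothetical names, not part of the answer above:

```r
fill_blocks <- function(dat, block = 5) {
  starts <- seq(1, length(dat), by = block)
  # Guard: the length must be a whole number of blocks, and every
  # position other than a block start must be zero.
  stopifnot(length(dat) %% block == 0, all(dat[-starts] == 0))
  rep(dat[starts], each = block)
}

dat <- c(0.5,0,0,0,0,0.7,0,0,0,0,0.4,0,0,0,0)
filled <- fill_blocks(dat)
print(filled)
```

The check costs one extra pass over the vector, so it gives up a little of the speed advantage in exchange for failing loudly on unexpected input.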
answered 2013-06-26T17:10:05.623