r - Add the value in one row to the previous value in R

Question

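I want to do something in R that is very easy to do in Excel:

I have a column of 5045 entries called K. I want to create a second column L, where the first value is L1 = 100 + K[1], the second value is L2 = L1 + K[2], the third value is L3 = L2 + K[3], and so on.

Is there a simple way to do this in R? In Excel, you just drag the column down.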

score 8 · Accepted Answer

Try something like this:

L <- 100 + cumsum(K)
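
As a minimal sketch of how that one-liner might be applied to a data frame column: the data frame `df` and its four values are made up for illustration and stand in for the asker's 5045-entry column K.

```r
# Hypothetical data; the real K has 5045 entries
df <- data.frame(K = c(5, -2, 10, 3))

# The running total starts at 100, and each row adds its own K to the previous L
df$L <- 100 + cumsum(df$K)

df$L
# [1] 105 103 113 116
```

The constant is added after `cumsum()`, so it shifts every running total by 100 exactly once, which matches the recurrence L1 = 100 + K[1], L2 = L1 + K[2].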

Answered 2012-10-24T10:49:59.097

score 4 · Accepted Answer

One way is to use cumsum() and cheat a little. For example, given K:

K <- 1:10

For simplicity I'll add 1 (not 100) to K[1], so we want to produce:

> 1 + K[1]
[1] 2
> (1 + K[1]) + K[2]
[1] 4
> ((1 + K[1]) + K[2]) + K[3]
[1] 7
....

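This is a cumulative sum. We need to cheat a little with the constant that is added to the first element, because we want it to enter the sum only once, not be added to every element before summing. Hence this is wrong: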

> L <- cumsum(1 + K) 
> L
 [1]  2  5  9 14 20 27 35 44 54 65

What we really want is:

> L <- cumsum(c(1, K))[-1]
> L
 [1]  2  4  7 11 16 22 29 37 46 56

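Here we concatenate the constant onto the vector K as its first element and apply cumsum() to that vector, then drop the first element of the output of cumsum().

This can of course be done in a slightly simpler way: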

> L <- 1 + cumsum(K)
> L
 [1]  2  4  7 11 16 22 29 37 46 56

That is, compute cumsum() and then add the constant (which, I now see, is what @gd047 suggested in their answer).
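
The three variants above can be compared side by side. The following is a small sketch; the variable names `wrong`, `prepend` and `shifted` are chosen here for illustration and are not from the answer.

```r
K <- 1:10

wrong   <- cumsum(1 + K)         # constant added to every element before summing
prepend <- cumsum(c(1, K))[-1]   # constant enters the sum once, as a leading element
shifted <- 1 + cumsum(K)         # constant added once to every running total

# The two correct forms agree element by element
agree <- isTRUE(all.equal(prepend, shifted))

# The wrong form overshoots by (i - 1) at position i, because the constant
# has been counted i times instead of once
overshoot <- wrong - shifted
```

Both correct forms compute 1 + K[1] + ... + K[i] at position i; the difference is only where the constant is introduced.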

score 0 · Accepted Answer

As Paul Hiemstra showed, the built-in function cumsum() is fast. But the for loop solution can be sped up by using the compiler package.

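library(compiler)
library(rbenchmark)  # provides benchmark(), used below
# for_loop_solution() comes from the for-loop answer on this page
fls_compiled <- cmpfun(for_loop_solution)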

Then, using the same data, let's run benchmark as follows:

benchmark(for_loop_solution(sample_data), 
          cumsum_solution(sample_data),
          fls_compiled(sample_data), 
          replications = 100)
                            test replications elapsed relative user.self
2   cumsum_solution(sample_data)          100   0.013    1.000     0.013
3      fls_compiled(sample_data)          100   0.726   55.846     0.717
1 for_loop_solution(sample_data)          100   4.417  339.769     3.723
  sys.self user.child sys.child
2    0.000          0         0
3    0.006          0         0
1    0.031          0         0

So use built-in functions whenever possible. If there is no built-in, try the compiler package. It often gives faster code.
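
For readers who want to try the compile step without the other answer at hand, here is a self-contained sketch. The function `running_total_loop` and its `start` argument are hypothetical stand-ins for the for-loop solution, not the code that was benchmarked above. Note also that recent R versions typically byte-compile functions automatically, so the gap between the plain and the compiled loop may be much smaller than in the timings shown.

```r
library(compiler)  # ships with base R

# Hypothetical stand-in for the for-loop solution
running_total_loop <- function(a, start = 1) {
  b <- numeric(length(a))      # preallocate the result
  total <- start
  for (idx in seq_along(a)) {
    total <- total + a[idx]    # carry the running total forward
    b[idx] <- total
  }
  b
}

# Byte-compile the function; the result is called exactly like the original
running_total_compiled <- cmpfun(running_total_loop)

x <- 1:1000
same_as_cumsum <- isTRUE(all.equal(running_total_compiled(x), 1 + cumsum(x)))
```

Compilation changes only how the function is executed, not what it returns, so checking the result against `1 + cumsum(x)` is a cheap sanity test before timing anything.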

score 0 · Accepted Answer

A for-loop-based solution is shown below. In terms of speed this is probably not what you want; the vectorized function cumsum is much faster.

a = 1:10
b = vector(mode = "numeric", length = length(a))
b[1] = 1 + a[1]

# seq_along(a)[-1] is empty when a has one element, unlike 2:length(a)
for(idx in seq_along(a)[-1]) {
  b[idx] = a[idx] + b[idx - 1]
}

Some timings:

require(rbenchmark)

for_loop_solution = function(a) {
    b = vector(mode = "numeric", length = length(a))
    b[1] = 1 + a[1]

    # seq_along(a)[-1] is empty when a has one element, unlike 2:length(a)
    for(idx in seq_along(a)[-1]) {
      b[idx] = a[idx] + b[idx - 1]
    }
    return(invisible(b))
}

cumsum_solution = function(a) {
   return(1 + cumsum(a))
}

sample_data = 1:10e3
benchmark(for_loop_solution(sample_data), 
          cumsum_solution(sample_data), 
          replications = 100)
                            test replications elapsed relative user.self
2   cumsum_solution(sample_data)          100   0.013    1.000     0.011
1 for_loop_solution(sample_data)          100   3.647  280.538     3.415
  sys.self user.child sys.child
2    0.002          0         0
1    0.006          0         0

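This shows that using cumsum is a few hundred times faster than using an explicit for loop. The effect becomes even more pronounced as the length of sample_data increases.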
