r - 用 R 中的值通过 for 循环填充 3D 矩阵

Question

我设置了一个大小为 365x7x4 的 3 维矩阵。

x <- array(rep(1, 365*5*4), dim=c(365, 5, 4))

现在我将使用 for 循环为每个元素填充一个值。假设每个元素的值应该是行、列和深度的总和。我想这相对容易。

谢谢！最好的，F

score 7 · Accepted Answer

使用一个更简单的例子，这样我们就可以看到正在做什么

arr <- array(seq_len(3*3*3), dim = rep(3,3,3))

以下代码给出了请求的输出：

dims <- dim(arr)
ind <- expand.grid(lapply(dims, seq_len))
arr[] <- rowSums(ind)

以上给出

> arr
, , 1

     [,1] [,2] [,3]
[1,]    3    4    5
[2,]    4    5    6
[3,]    5    6    7

, , 2

     [,1] [,2] [,3]
[1,]    4    5    6
[2,]    5    6    7
[3,]    6    7    8

, , 3

     [,1] [,2] [,3]
[1,]    5    6    7
[2,]    6    7    8
[3,]    7    8    9

> arr[1,1,1]
[1] 3
> arr[1,2,3]
[1] 6
> arr[3,3,3]
[1] 9

更新：在这里使用@TimP 的答案中的示例，我更新了答案，以展示如何以更像 R 的方式完成它。

给定

arr <- array(seq_len(3*3*3), dim = rep(3,3,3))

arr用i + j + kunless替换元素k > 2，在这种情况下j*k-i使用。

dims <- dim(arr)
ind <- expand.grid(lapply(dims, seq_len))
## which k > 2
want <- ind[,3] > 2
arr[!want] <- rowSums(ind[!want, ])
arr[want] <- ind[want, 2] * ind[want, 3] - ind[want, 1]

虽然坚持使用循环等熟悉的习语很诱人，并且与流行的看法相反，循环在 R 中并不是低效的，但当你学习语言并开始将其应用于数据分析任务时，学习以向量化的方式思考将获得很多回报.

以下是 Fabian 示例的一些时间安排：

> x <- array(rep(1, 365*5*4), dim=c(365, 5, 4))
> system.time({
+ for (i in seq_len(dim(x)[1])) {
+     for (j in seq_len(dim(x)[2])) {
+         for (k in seq_len(dim(x)[3])) {
+             val = i+j+k
+             if (k > 2) {
+                 val = j*k-i
+             }
+             x[i,j,k] = val
+         }
+     }
+ }
+ })
   user  system elapsed 
  0.043   0.000   0.044 
> arr <- array(rep(1, 365*5*4), dim=c(365, 5, 4))
> system.time({
+ dims <- dim(arr)
+ ind <- expand.grid(lapply(dims, seq_len))
+ ## which k > 2
+ want <- ind[,3] > 2
+ arr[!want] <- rowSums(ind[!want, ])
+ arr[want] <- ind[want, 2] * ind[want, 3] - ind[want, 1]
+ })
   user  system elapsed 
  0.005   0.000   0.006

对于更大的问题（至少对于我的 ickle 笔记本电脑！）

> x <- array(rep(1, 200*200*200), dim=c(200, 200, 200))
> system.time({
+ for (i in seq_len(dim(x)[1])) {
+     for (j in seq_len(dim(x)[2])) {
+         for (k in seq_len(dim(x)[3])) {
+             val = i+j+k
+             if (k > 2) {
+                 val = j*k-i
+             }
+             x[i,j,k] = val
+         }
+     }
+ }
+ })
   user  system elapsed 
 51.759   0.129  53.090
> arr <- array(rep(1, 200*200*200), dim=c(200, 200, 200))
> system.time({
+     dims <- dim(arr)
+     ind <- expand.grid(lapply(dims, seq_len))
+     ## which k > 2
+     want <- ind[,3] > 2
+     arr[!want] <- rowSums(ind[!want, ])
+     arr[want] <- ind[want, 2] * ind[want, 3] - ind[want, 1]
+ })
   user  system elapsed 
  2.282   1.036   3.397

但即使按照今天的标准，这也可能是微不足道的。您可以看到，由于该方法所需的所有函数调用，循环开始变得越来越没有竞争力。

score 3 · Accepted Answer

Fabian: from the phrasing of your question, I believe you're just looking for a simple way of setting values in the array to follow any set of rules you might devise. No problem.

Your array is small (and from the context I strongly suspect you only want to use the code for something of that size). So good practice is simply to use a set of three for loops, which will run almost instantly - no need for any unnecessary complications. My code below shows an example: here we set element x[i,j,k] to be i+j+k, unless k>2, in which case we set it to be j*k-i instead.

Obviously, you can have as many rules as you want - just add an if statement for each one, and define val to be the value you want x[i,j,k] to take if that condition is true. (There's a few different ways to set this up, but this one seems the simplest to understand.) At the end of the innermost loop, x[i,j,k] gets set to the required value (val), and we then go on and do the next element until they're all done. That's it!

x = array(rep(1, 365*5*4), dim=c(365, 5, 4))

for (i in seq_len(dim(x)[1])) {
    for (j in seq_len(dim(x)[2])) {
        for (k in seq_len(dim(x)[3])) {
            val = i+j+k
            if (k > 2) {
                val = j*k-i
            }
            x[i,j,k] = val
        }
    }
}

Hope this helps :)

Quick update (non-loopy method): For completeness, if you're in a real hurry and want your code to run in 0.07 seconds rather than 0.19 seconds... you could also set things up in a vectory way like this:

comb = expand.grid(seq_len(365), seq_len(5), seq_len(4))
i = comb$Var1; j = comb$Var2; k = comb$Var3
val = i+j+k
subs = which(k>2); val[subs] = (j*k-i)[subs]
x = array(val, dim = c(365, 5, 4))

In the above, the variables i, j and k are vectors with length 7300 (the number of cells in the array). As before, the default choice for val is the sum i+j+k except on the subset k>2, where val is j*k-i instead - exactly the same as the example in the first part of my answer. Obviously the notation in this method is quite a bit harder which is why I thought it'd be better to show you the loop-based solution. Hopefully you'll see how you could add in other conditions to the above though. The final line maps the vector val over to the array x in the right way so that each x[i,j,k] takes on the correct value of val. Try it and see :)

One small point to note though: if you were ever to want to run this sort of algorithm on a massive array (much, much, much bigger than the one you have now), then the approach immediately above would definitely be the one to use to minimise runtime. For your case, my advice is to use whichever one you feel more comfortable with as the runtime isn't really an issue.

Cheers! :)

r - 用 R 中的值通过 for 循环填充 3D 矩阵

2 回答 2

Related

Reference