r - Not understanding is.na() in an R for loop

Question

I'm confused by the behavior of is.na() inside a for loop in R.

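I'm trying to write a function that creates a sequence of numbers, does something to a matrix, summarizes the resulting matrix according to the sequence of numbers, then modifies the sequence of numbers based on the summary, and repeats. I've made a simplified version of the function because I think it still captures my problem.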

library(plyr)

test <- function(desired.iterations, max.iterations)
{
    rich.seq <- 4:34 ##make a sequence of numbers
    details.table <- matrix(nrow=length(rich.seq), ncol=1, dimnames=list(rich.seq)) 
    ##generate a table where the row names are those numbers
    print(details.table) ##that's what it looks like
    temp.results <- matrix(nrow=10, ncol=2, dimnames=list(1:10)) 
     ##generate some sample data to summarize and fill into details.table
    temp.results[,1] <- rep(5:6, 5)
    temp.results[,2] <- rnorm(10)
    print(temp.results) ##that's what it looks like
    details.table[,1][row.names(details.table) %in% count(temp.results[,1])$x] <- 
                                                       count(temp.results[,1])$freq  
    ##summarize, subset to the appropriate rows in details.table, and fill in the summary
    print(details.table)
    for (i in 1:max.iterations)
    {
       rich.seq <- rich.seq[details.table < desired.iterations | is.na(details.table)] 
        ## the idea would be to keep cutting this sequence of numbers down with 
        ##   successive iterations until the desired number of iterations per row in 
        ## details.table was reached. in other words, in the real code i'd do 
        ## something to details.table in the next line
        print(rich.seq)
    }
}

##call the function
test(desired.iterations=4, max.iterations=2)

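On the first pass through the for loop, rich.seq looks just as I expect: 5 and 6 are no longer in the sequence, because both end up with more than 4 iterations. On the second pass, however, it spits out something unexpected.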

UPDATE

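Thanks for the help, and sorry. After rereading my original post, I realized it wasn't very clear, and I also hadn't noticed that count is part of the plyr package, which I load in my full function but didn't load here. I'll try to explain better.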

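What I'm working on is a function that takes a matrix, randomizes it (in a number of different ways), and then calculates some statistics. These statistics are stored temporarily in a table, temp.results, where temp.results[,1] is the sum of the non-zero elements in each column and temp.results[,2] is a different summary statistic for that column. I save these results to a csv file (and append to the same file on subsequent iterations), because looping and rbinding them would use a lot of memory.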
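Appending each iteration's results to one csv file, as described above, can be done with base R's write.table(), which (unlike write.csv()) honors append = TRUE. The helper name append_results() and the column names below are hypothetical; this is a minimal sketch, not the asker's actual code.

```r
# Hypothetical helper: append one iteration's results to a csv file,
# writing the header only when the file does not exist yet.
append_results <- function(results, path) {
  first.write <- !file.exists(path)   # evaluate once, before anything is written
  write.table(results, file = path, sep = ",",
              append = !first.write,     # append on every write after the first
              col.names = first.write,   # header only on the first write
              row.names = FALSE)
}

# Usage: two iterations of made-up results written to a temporary file
path <- tempfile(fileext = ".csv")
for (i in 1:2) {
  res <- cbind(colsum = c(5, 6), stat = c(0.1, 0.2))
  append_results(res, path)
}
back <- read.csv(path)
```

Writing the header only once avoids the "appending column names to file" warning and keeps the file readable with a single read.csv() call.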

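The problem is that some column sums (temp.results[,1]) are sampled very rarely. Sampling them adequately takes many iterations, and the resulting .csv file would grow to hundreds of gigabytes.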

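What I want to do is create a table (details.table) and update it on every iteration to track how many times each column sum has actually been sampled. When a given element of the table reaches desired.iterations, I want that value excluded from the vector rich.seq, so that only columns that have not yet reached desired.iterations are actually saved to the csv file. The max.iterations argument will be used with a break statement in case things take too long.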
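The tracking table described above can be kept as a fixed-length named vector, one counter per possible column sum, updated by name rather than by position. This sketch swaps plyr::count() for base R table(); the names counts, sampled and still.needed are hypothetical, and the sampled values simply reuse the question's sample data.

```r
rich.seq <- 4:34
# One counter per possible column sum, named by the sum itself ("4", ..., "34")
counts <- setNames(integer(length(rich.seq)), rich.seq)

# Hypothetical column sums observed in one iteration (the question's sample data)
sampled <- rep(5:6, 5)

tab <- table(sampled)
# Drop any sum that has no counter, so counts never grows or gains NAs
tab <- tab[names(tab) %in% names(counts)]
# Add this iteration's frequencies to the matching counters, matched by name
counts[names(tab)] <- counts[names(tab)] + as.integer(tab)

# Column sums that still need sampling (desired.iterations assumed to be 4)
still.needed <- as.integer(names(counts)[counts < 4])
```

Because the update is matched by name, counts keeps length 31 no matter which sums show up, which is what lets it serve as a stable lookup table across iterations.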

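So what I expected in the example was exactly the same rich.seq on both iterations, since I'm not actually doing anything to change it. I believe flodel is absolutely right that my problem comes from comparing against a matrix (details.table) that is longer than rich.seq, which leads to unexpected results. However, I don't want the dimensions of details.table to change. Perhaps I can get around this by somehow using %in% when I redefine rich.seq inside the for loop?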
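The %in% idea does work, because %in% matches by value instead of by position, so it does not matter that rich.seq has become shorter than details.table. A minimal sketch, assuming details.table is represented as a named count vector (the name details is hypothetical) with 5 and 6 already sampled 5 times:

```r
desired.iterations <- 4
rich.seq <- 4:34
# Fixed-size counts, named by column sum; 5 and 6 have reached 5 samples
details <- setNames(integer(31), 4:34)
details[c("5", "6")] <- 5L

for (i in 1:2) {
  # Values (not positions) that are still below the target
  ok <- as.integer(names(details)[details < desired.iterations])
  # Keep only elements of the current rich.seq whose value is in ok
  rich.seq <- rich.seq[rich.seq %in% ok]
  print(rich.seq)
}
```

Both iterations print the same 29 values, which is the behavior the question expected, and details never changes size.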

score 1 · Accepted Answer

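Thanks to flodel for putting me on the right track. It had nothing to do with is.na; the problem was the lengths of the vectors I was comparing.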

That said, I set the initial values of details.table to zero to avoid the extra complexity of the is.na statement.
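Why initializing to zero removes the need for is.na() can be seen in how NA behaves in comparisons and indexing. The vectors below are illustrative values, not taken from the original code.

```r
counts.na <- c(NA, 5, NA)
lt.na <- counts.na < 4                           # NA FALSE NA: comparing with NA gives NA
keep.na <- counts.na < 4 | is.na(counts.na)      # TRUE FALSE TRUE: is.na() patches the NAs
idx.na <- (1:3)[lt.na]                           # NA NA: an NA in a logical index selects NA

counts0 <- c(0, 5, 0)
keep0 <- counts0 < 4                             # TRUE FALSE TRUE with no is.na() needed
```

Starting the counters at 0 makes every comparison return TRUE or FALSE, so the logical index never contains NA.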

This code works and can be adapted to do what I described above.

library(plyr)

test <- function(desired.iterations, max.iterations)
{
    rich.seq <- 4:34 ##make a sequence of numbers
    details.table <- matrix(nrow=length(rich.seq), ncol=1, dimnames=list(rich.seq)) ##generate a table where the row names are those numbers
    details.table[,1] <- 0
    print(details.table) ##that's what it looks like
    temp.results <- matrix(nrow=10, ncol=2, dimnames=list(1:10)) ##generate some sample data to summarize and fill into details.table
    temp.results[,1] <- rep(5:6, 5)
    temp.results[,2] <- rnorm(10)
    print(temp.results) ##that's what it looks like
    details.table[,1][row.names(details.table) %in% count(temp.results[,1])$x] <- count(temp.results[,1])$freq ##summarize, subset to the appropriate rows in details.table, and fill in the summary
    print(details.table)
    for (i in 1:max.iterations)
    {
        rich.seq <- row.names(details.table)[details.table[,1] < desired.iterations]
        print(rich.seq)
    }
}

Instead of trying to shrink rich.seq, I redefine it from whatever happened to details.table during the previous iteration.
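Putting the pieces together, the loop from the update could redefine rich.seq from the counts on every pass and stop early with break once nothing is left to sample. This is a sketch under stated assumptions: run_until_sampled() is a hypothetical name, the "randomization" is replaced by drawing 10 column sums from rich.seq at random, and base R table() stands in for plyr::count().

```r
run_until_sampled <- function(desired.iterations, max.iterations) {
  rich.seq <- 4:34
  # Fixed-size counts, starting at 0 so no is.na() is needed
  counts <- setNames(integer(length(rich.seq)), rich.seq)
  for (i in seq_len(max.iterations)) {
    # Redefine rich.seq from the counts instead of shrinking it by position
    rich.seq <- as.integer(names(counts)[counts < desired.iterations])
    if (length(rich.seq) == 0) break   # every column sum has been sampled enough
    # Hypothetical stand-in for one randomization: draw 10 column sums.
    # sample.int() avoids sample()'s special case when rich.seq has length 1.
    sampled <- rich.seq[sample.int(length(rich.seq), 10, replace = TRUE)]
    tab <- table(sampled)
    counts[names(tab)] <- counts[names(tab)] + as.integer(tab)
  }
  counts
}

set.seed(1)
res <- run_until_sampled(desired.iterations = 4, max.iterations = 1000)
```

Since every draw goes to a value that is still below the target, the loop reaches the break long before max.iterations here; max.iterations only matters when the real sampling rarely hits some sums.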

score 1 · Accepted Answer

I agree you should improve your question. However, I think I can spot what's going wrong.

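You compute details.table before the for loop. It is a matrix with the same length as rich.seq when rich.seq was first initialized (length(4:34), i.e. 31).

Inside the for loop, details.table < desired.iterations | is.na(details.table) is a logical vector of length 31. In the first loop iteration,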

rich.seq <- rich.seq[details.table < desired.iterations | is.na(details.table)]

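will reduce the length of rich.seq. But in the second loop iteration, unless details.table is redefined (which is not the case), you are trying to subset rich.seq with a logical vector that is longer than rich.seq. That is bound to give unexpected results.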
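The unexpected output can be reproduced with plain vectors. When a logical index is longer than the vector it subsets, R still applies it position by position, and every TRUE past the end of the vector yields NA. The name keep is hypothetical; the counts mirror the question's example, where 5 and 6 were each sampled 5 times.

```r
rich.seq <- 4:34
counts <- setNames(rep(NA_integer_, 31), 4:34)
counts[c("5", "6")] <- 5L

# Logical index of fixed length 31, as in the question's loop
keep <- counts < 4 | is.na(counts)

first <- rich.seq[keep]   # 4 7 8 ... 34: 29 values, as expected
second <- first[keep]     # positions 2 and 3 are dropped again (now 7 and 8),
                          # and positions 30 and 31 are past the end, giving NA NA
```

So the second pass removes 7 and 8 instead of 5 and 6 and appends two NAs, because the 31-element index no longer lines up with the 29-element rich.seq.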

You probably meant to redefine details.table as part of the for loop.

(I'm also surprised that you never use temp.results[,2].)
