I am trying to subset a very large ffdf object in a loop using ffbase, but I get this error message:

Error in UseMethod("as.hi") : no applicable method for 'as.hi' applied to an object of
class "NULL"

I am running this over ssh on a machine with plenty of available memory. This is the code I am trying to run:

# totalD is an ffdf with columns ID, TS, and TD, each with 288,133,589 rows. ID consists
# of integers. TS is a column of integer timestamps with second precision. TD is of type
# double. Uid3 is an integer vector consisting of the 1205 unique entries of totalD$ID.

# H_times creates a matrix of the sum of the entries in TD traveled in each hour
H_times <- function(totalD, Uid3) {

    # hours is the number of unique hours of the experiment
    hours <- length(unique(subset(totalD$TS, totalD$TS %% 3600 == 0)))-1

    # bH is used as a counter in the following loops
    bH <- min(unique(subset(totalD$TS, totalD$TS %% 3600 == 0)))

    # sum_D_matrix is the output
    sum_D_matrix <- matrix(0, nrow = hours, ncol = length(Uid3))

    for(i in 1:length(Uid3)) {
        Bh <- bH
        for(j in 1:hours) {
            sum_D_matrix[j,i] <- sum(subset(totalD$TD, totalD$TS >= Bh & totalD$TS < (Bh + 3600) & totalD$ID == Uid3[i]))
            Bh <- Bh + 3600
        }
    }
    save(sum_D_matrix, file = "sum_D_matrix")
}

H_times(totalD, Uid3)

I tried to implement the fix suggested by jwijffels in the comments on this question, but to no avail. Thanks in advance!

1 Answer

This is caused by the following line:

sum_D_matrix[j,i] <- sum(subset(totalD$TD, 
    totalD$TS >= Bh & totalD$TS < (Bh + 3600) & totalD$ID == Uid3[i]))

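where the selection can be empty. One of the problems with ff is that it cannot handle empty vectors: the length of an ff vector or ffdf must always be >= 1. Perhaps this should be handled by subset.ff. However, it is not clear what subset.ff should return in that case.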

You can use the following workaround:

sel <- totalD$TS >= Bh & totalD$TS < (Bh + 3600) & totalD$ID == Uid3[i]
sel <- ffwhich(sel, sel)
if (is.null(sel)) {
  sum_D_matrix[j,i] <- 0
} else {
  sum_D_matrix[j,i] <- sum(totalD$TD[sel])
}

ffwhich returns NULL when the resulting vector is empty (as I mentioned, it cannot return a vector of length 0).
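
The guard above can be factored into a small helper. The sketch below is only an illustration: sum_or_zero is a hypothetical name, not part of ff or ffbase, and it is demonstrated on plain R vectors so that it runs without either package. It assumes the index is either NULL (the empty-selection case described above) or something the values can be indexed by. Returning 0 mirrors what base R's sum() gives for an empty vector.

```r
# Hypothetical helper (not part of ff or ffbase): sum the selected
# values, treating a NULL index as "nothing selected".
sum_or_zero <- function(values, idx) {
  if (is.null(idx)) {
    return(0)
  }
  sum(values[idx])
}

# Demonstration on plain in-memory vectors (made-up values).
td <- c(1.5, 2.5, 4)
some_rows <- sum_or_zero(td, c(1L, 2L))  # index selects the first two values
no_rows   <- sum_or_zero(td, NULL)       # empty selection reported as NULL
```

In the loop from the question this would presumably read sum_D_matrix[j,i] <- sum_or_zero(totalD$TD, ffwhich(sel, sel)), with sel built as in the workaround.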

Side note

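The way you use subset is actually a bit odd. One of the reasons to use subset is to simplify the notation by getting rid of all the totalD$ prefixes. The more "usual" way to use it would be: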

sum_D_matrix[j,i] <- sum(subset(totalD, TS >= Bh & TS < (Bh + 3600) & ID == Uid3[i], 
    TD, drop=TRUE))
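
To illustrate that calling convention, here is a minimal sketch on an ordinary data.frame rather than an ffdf. The toy data and the names toy, id, in_window and total are made up for the example. Base R's subset() takes the same three pieces (row condition, column selection, drop), and the call above suggests the ffdf method follows the same form.

```r
# Toy stand-in for totalD (made-up values; a data.frame, not an ffdf).
toy <- data.frame(
  ID = c(1L, 1L, 2L),
  TS = c(0L, 10L, 3600L),
  TD = c(1.5, 2.5, 4)
)
Bh <- 0L
id <- 1L

# Column names are looked up inside toy, so no toy$ prefix is needed.
# Selecting the single column TD with drop = TRUE yields a plain vector.
in_window <- subset(toy, TS >= Bh & TS < (Bh + 3600) & ID == id, TD, drop = TRUE)
total <- sum(in_window)
```

Note that this form is only a notational convenience. It presumably does not avoid the empty-selection problem described above, so the ffwhich guard is still needed on an ffdf.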
answered 2014-07-03T07:29:32.217