
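I have a large list of objects (say 100k elements). Each element has to be processed by a "process" function, but I would like to do the processing in chunks, for example in 20 passes, because I want to save the results to a file on disk and keep working memory free.

I'm new to R and I know it should involve some apply magic, but I don't know how to do it (yet).

Any guidance would be much appreciated.

A small example: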

 objects <- list()
 for (i in 1:100) {
   objects <- append(objects, 500)
 }
 objects

 processOneElement <- function(x) {
   x / 20 + 23
 }

I would like to process the first 20 elements in one go and save the results, then process the second 20 elements and save the results, and so on.

objects <- list()
for (i in 1:100) {
  objects <- append(objects, 500)
}
objects

process <- function(x) {
  x / 20 + 23
}

results <- lapply(objects, FUN = process)

# start index of each batch: 1, 21, 41, ...
index <- seq(1, length(objects), by = 20)
lapply(index, function(idx1) {
  # last index of this batch, capped at the end of the list
  idx2 <- min(idx1 + 20 - 1, length(objects))
  batch <- lapply(idx1:idx2, function(x) {
    process(objects[[x]])
  })

  write.table(batch, paste("batch", idx1, sep = ""))
})

1 Answer


Going by what you've described, this is what I can suggest. Assuming your list is stored in list.object,

lapply(seq(1, length(list.object), by=20), function(idx) {
    # here idx will be 1, 21, 41 etc...
    idx2 <- min(idx+20-1, length(list.object))
    # do what you want here..
    batch.20.processed <- lapply(idx:idx2, function(x) {
        process(list.object[[x]]) # passes idx:idx2 indices one at a time
    })
    # here you have a processed list with up to 20 elements
    # (the last batch may be shorter, hence seq_along rather than 1:20)
    # finally write to file
    lapply(seq_along(batch.20.processed), function(x) {
        write.table(batch.20.processed[[x]], ...)
        # where "..." is all other allowed arguments to write.table
        # such as row.names, col.names, quote etc.
        # don't literally pass "..." to write.table
    })
})
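
A variation on the same idea, offered only as a sketch: cut the index vector into batches with split() and write each batch to its own file, returning just the file path so the processed values are not kept in memory. The names batch_size and out_dir, the data frame layout, and the "batch<N>.txt" file naming are assumptions for illustration; objects and process are stand-ins that mirror the question's example.

```r
# Stand-ins matching the question's example (assumed, not real data).
objects <- as.list(rep(500, 100))
process <- function(x) x / 20 + 23

batch_size <- 20        # assumed batch size, as in the question
out_dir <- tempdir()    # assumption: swap in a real output directory

# Assign each position to a batch number (1, 1, ..., 2, 2, ...) and
# split the positions into one integer vector per batch.
batch_ids <- ceiling(seq_along(objects) / batch_size)
batches <- split(seq_along(objects), batch_ids)

files <- vapply(names(batches), function(b) {
  idx <- batches[[b]]
  # Process only this batch's elements.
  res <- unlist(lapply(objects[idx], process))
  path <- file.path(out_dir, paste0("batch", b, ".txt"))
  write.table(data.frame(index = idx, result = res), path, row.names = FALSE)
  path  # return only the path; the batch results go out of scope here
}, character(1))
```

Afterwards, files holds one path per batch, and a single batch can be read back with read.table(files[[1]], header = TRUE) when it is needed. Because ceiling() and split() handle a short final batch on their own, no min() bookkeeping is required.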
answered 2013-01-23T15:14:04.433