
I have Revolution R Enterprise. I want to run 2 simple but computationally intensive operations on each of 121k files in a directory, outputting to new files. I was hoping to use some RevoScaleR function that chunked/parallel-processed the data similarly to lapply. So I'd have lapply(list of files, function), but using a faster RevoScaleR (xdf-based) function that might actually finish, since I suspect basic lapply would never complete.

So is there a RevoScaleR version of lapply? Will running it from Revolution R Enterprise automatically chunk things?

I see parLapply and mclapply (http://www.inside-r.org/r-doc/parallel/clusterApply)... can I run these using cores on the same desktop? On AWS servers? Do I get anything out of running these functions under RevoScaleR if it's not a native xdf function? I guess this is really a question about what I can use as a "cluster" in this situation.
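
For reference, a minimal sketch of the plain base-R version I have in mind; process_file and the CSV read/write are just placeholders standing in for my two operations:

files <- list.files("input_dir", full.names = TRUE)

# placeholder for the two computationally intensive operations
process_file <- function(fname) {
    dat <- read.csv(fname)
    # ... heavy computation on dat ...
    write.csv(dat, file = paste0(fname, ".out.csv"), row.names = FALSE)
}

# serial: processes one file at a time, which is why I doubt it would ever finish
lapply(files, process_file)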


1 Answer


Yes: rxExec behaves like lapply in the single-core scenario, and like parLapply in a multi-core/multi-process scenario. You would use it like this:

# vector of file names to operate on
files <- list.files()

rxSetComputeContext("localpar")
rxExec(function(fname) { 
    ...
}, fname=rxElemArg(files))

Here, the anonymous function is the one that carries out the operations you want on each file; you pass it to rxExec much as you would to lapply. The rxElemArg function tells rxExec to execute that function once for each of the different values in files. Setting the compute context to "localpar" starts up a local cluster of slave processes, so the operations will run in parallel. By default the number of slaves is 4, but you can change this with rxOptions(numCoresToUse).
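
For concreteness, here is a slightly fuller sketch along the same lines; the "input_dir" path, the CSV read/write, and the rowSums step are placeholders standing in for your two operations, not part of the original answer:

library(RevoScaleR)   # loaded automatically in Revolution R Enterprise; shown for completeness

# all input files, with full paths
files <- list.files("input_dir", full.names = TRUE)

rxSetComputeContext("localpar")   # local cluster of worker processes
rxOptions(numCoresToUse = 8)      # optional: raise the default of 4 workers

results <- rxExec(function(fname) {
    dat <- read.csv(fname)                      # assumption: plain CSV input
    dat$total <- rowSums(dat, na.rm = TRUE)     # placeholder for the two real operations
    write.csv(dat, file = paste0(fname, ".out.csv"), row.names = FALSE)
    fname                                       # rxExec collects the return values into a list
}, fname = rxElemArg(files))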

How much of a speedup should you expect? That depends on your data. If your files are small and most of the time is taken up by computation, parallel processing can give you a big speedup. If your files are large, however, you may run into an I/O bottleneck, especially if all the files are on the same hard disk.

answered 2015-10-22 at 09:11