I'm aware that the Amelia R package provides some support for parallel multiple imputation (MI). However, preliminary analysis of my study's data revealed that the data are not multivariate normal, so unfortunately I can't use Amelia. Consequently, I've switched to the mice R package for MI, since it can perform MI on data that are not multivariate normal.

Since the MI process via mice is very slow (I'm currently using an AWS m3.large 2-core instance), I've started wondering whether it's possible to parallelize the procedure to save processing time. Based on my review of the mice documentation and the corresponding JSS paper, as well as mice's source code, it appears that the package currently doesn't support parallel operation. This is unfortunate because, IMHO, the MICE algorithm is naturally parallel, so a parallel implementation should be relatively easy and would yield significant savings in both time and resources.

Question: Has anyone tried to parallelize MI in the mice package, either externally (via R's parallel facilities) or internally (by modifying the source code), and what were the results, if any? Thank you!

1 Answer

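Recently, I tried to parallelize multiple imputation (MI) with the mice package externally, that is, by using R's multiprocessing facilities, specifically the parallel package, which ships with the base R distribution. Basically, the solution is to use the mclapply() function to distribute a pre-calculated share of the total number of required MI iterations to each core, and then combine the resulting imputed data into a single object. Performance-wise, the results of this approach exceeded my most optimistic expectations: processing time dropped from 1.5 hours to under 7 minutes (!). And that's on only two cores. I did remove one multilevel factor, but that shouldn't have had much effect. Regardless, the result is unbelievable!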
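The approach described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the exact code used for the timings: the `nhanes` example data set bundled with mice stands in for the real study data, the names `m_total`, `n_chunks`, `m_per_chunk` and `seeds` are hypothetical, and combining the partial results with `ibind()` is one possible way to get a single object, since the answer does not say which combining step was used.

```r
library(parallel)  # ships with base R
library(mice)

m_total  <- 10L    # total number of imputations wanted (assumed value)
n_chunks <- 2L     # one chunk of work per core (assumed: 2 cores)

# mclapply() relies on forking, which is unavailable on Windows;
# fall back to a single core there so the sketch still runs.
n_cores <- if (.Platform$OS.type == "windows") 1L else n_chunks

# Pre-calculate each chunk's share of the imputations.
m_per_chunk <- rep(m_total %/% n_chunks, n_chunks)
extra <- seq_len(m_total %% n_chunks)
m_per_chunk[extra] <- m_per_chunk[extra] + 1L

# Give every chunk its own seed so the chunks do not repeat each other.
seeds <- 1000L + seq_len(n_chunks)

# Run mice() independently on each chunk.
chunks <- mclapply(seq_len(n_chunks), function(i) {
  mice(nhanes, m = m_per_chunk[i], seed = seeds[i], printFlag = FALSE)
}, mc.cores = n_cores)

# Combine the partial results into a single mids object.
imp <- Reduce(ibind, chunks)

# The combined object can be analysed as usual.
fit <- with(imp, lm(bmi ~ age))
pooled <- pool(fit)
```

Giving each chunk a distinct seed matters here: forked workers that start from the same random number state could otherwise produce identical imputations, which would understate the between-imputation variance.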

Answered 2014-10-02T03:44:29.243