parallel-processing - Map Reduce or another distributed/parallel design pattern?

Question

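I have this code that joins/combines a set of images. I would like to restructure this sequential code into a parallel/distributed application, because my image collection is very large (big data :-)). I am considering Map/Reduce, but I am not sure whether this is feasible under Map/Reduce.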

#Sequential Code 
Result.Image <- NULL
foreach(Image in Image.Collection) {
  Result.Image <- CombineImage(Result.Image, Image)
}

Note: the order does not matter; combining images 1,2,3,4,5 is just as good as combining images 2,3,1,4,5.

Ideally, I would like something like this (it looks more like a classic divide-et-impera than a map/reduce):

[Diagram: tree of pairwise image combinations]

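1, 2, 3 and 4 are the original images. One node joins image #1 and image #2 into a new image called image #5. A second node joins image #3 and image #4 into image #6, and a final node joins image #5 and image #6 into the final result.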

Any ideas about what framework or parallel/distributed design pattern I should use for something like this?

Cheers!!

score 0 · Accepted Answer

From your initial description (the foreach code), it seems that you cannot process image #3 until you have processed #1 and #2, since you accumulate intermediate results in Result.Image. Your graph tells a different story: sibling nodes can be processed in parallel, and I wonder whether any two images, not just siblings, can be combined in parallel. Regardless, I think you can put all the initial images in a FIFO queue and throw as many processors (threads, machines or nodes) at it as you can afford. Each processor picks up two images, combines them and puts the result back in the queue. You continue like this until only one image is left in the queue.
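
To make the queue idea concrete, here is a minimal single-machine sketch in Python using threads. The names `parallel_combine`, `combine` and `num_workers` are made up for illustration; `combine` stands in for your CombineImage. It assumes that `combine` is associative and commutative (the question says order does not matter) and that it does not raise. It also takes each pair under a lock, so that two workers cannot each grab one image and wait forever for a second.

```python
import threading
from collections import deque


def parallel_combine(items, combine, num_workers=4):
    """Reduce items to a single value by repeatedly combining pairs
    taken from a shared FIFO pool. Returns None for an empty input."""
    pool = deque(items)
    if not pool:
        return None

    in_flight = 0                    # pairs currently being combined
    cond = threading.Condition()     # guards pool and in_flight

    def worker():
        nonlocal in_flight
        while True:
            with cond:
                # Wait while no pair is available but a result may still arrive.
                while len(pool) < 2 and in_flight > 0:
                    cond.wait()
                if len(pool) < 2:
                    return           # one item left, nothing in flight: done
                a = pool.popleft()   # take both items atomically
                b = pool.popleft()
                in_flight += 1
            result = combine(a, b)   # the expensive step runs outside the lock
            with cond:
                pool.append(result)
                in_flight -= 1
                cond.notify_all()    # wake workers waiting for a pair

    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return pool[0]


if __name__ == "__main__":
    # Integer addition stands in for image combination.
    print(parallel_combine(range(1, 101), lambda a, b: a + b))
```

Note that Python threads only speed this up if `combine` releases the GIL (as many native image libraries do). For pure-Python work or for multiple machines, the same pattern would need processes or a distributed queue, with the same "take two, put one back" rule.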
