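I have some serial code that I have started to parallelize using Intel's TBB. My first goal was to parallelize almost every for loop in the code (I have even parallelized for loops nested inside for loops), and having done that I now see some speedup. I am looking for more places, ideas, or options to parallelize. I know this may sound a bit vague without much reference to the actual problem, but I am looking for general ideas that I can explore in my code.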
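Overview of the algorithm (it runs over all levels of the image, starting with the smallest and increasing the width and height by 2 each time until the actual height and width are reached):
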
For all image pairs, starting with the smallest pair
    For height = 2 to image_height - 2
        Create a 5 by image_width ROI of both left and right images
        For width = 2 to image_width - 2
            Create a 5 by 5 window of the left ROI centered around width and find the best match in the right ROI using NCC
            Create a 5 by 5 window of the right ROI centered around width and find the best match in the left ROI using NCC
            Disparity = current_width - best match
    The edge pixels that did not receive a disparity get the disparity of their neighbors
    For height = 0 to image_height
        For width = 0 to image_width
            Check smoothness, uniqueness and order constraints (parallelized separately)
    For height = 0 to image_height
        For width = 0 to image_width
            For a disparity that failed the constraints, use the average disparity of neighbors that passed the constraints
    Normalize all disparities and output to screen
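
To make the matching step concrete, here is a minimal serial sketch of the left-to-right half of the two innermost loops above (5 by 5 NCC window, best match along the same row, disparity = current column minus best match). It uses only the standard library and is not the actual code from this project: the `Image` struct and the names `ncc5x5`, `bestMatchColumn`, and `rowDisparity` are hypothetical, and the loop bound `x < width - 2` is an assumption about how "2 to image_width - 2" is meant.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical row-major grayscale image (not a type from the original code).
struct Image {
    int width;
    int height;
    std::vector<float> pixels;  // expected size: width * height
    float at(int x, int y) const {
        return pixels[static_cast<std::size_t>(y) * width + x];
    }
};

// Normalized cross-correlation between the 5x5 window centered at (xa, y) in `a`
// and the 5x5 window centered at (xb, y) in `b`.
// The caller must keep both centers at least 2 pixels away from every border.
// Returns a value in [-1, 1], or 0 if either window has zero variance.
float ncc5x5(const Image& a, int xa, const Image& b, int xb, int y) {
    const int r = 2;        // window radius
    const float n = 25.0f;  // pixels per window
    float sumA = 0.0f, sumB = 0.0f;
    for (int dy = -r; dy <= r; ++dy) {
        for (int dx = -r; dx <= r; ++dx) {
            sumA += a.at(xa + dx, y + dy);
            sumB += b.at(xb + dx, y + dy);
        }
    }
    const float meanA = sumA / n;
    const float meanB = sumB / n;
    float num = 0.0f, varA = 0.0f, varB = 0.0f;
    for (int dy = -r; dy <= r; ++dy) {
        for (int dx = -r; dx <= r; ++dx) {
            const float da = a.at(xa + dx, y + dy) - meanA;
            const float db = b.at(xb + dx, y + dy) - meanB;
            num += da * db;
            varA += da * da;
            varB += db * db;
        }
    }
    const float denom = std::sqrt(varA * varB);
    return denom > 0.0f ? num / denom : 0.0f;
}

// Searches every valid column of row y in `dst` and returns the column whose
// window correlates best with the window centered at (x, y) in `src`.
int bestMatchColumn(const Image& src, int x, const Image& dst, int y) {
    int best = x;
    float bestScore = -2.0f;  // below the minimum possible NCC value
    for (int c = 2; c < dst.width - 2; ++c) {
        const float score = ncc5x5(src, x, dst, c, y);
        if (score > bestScore) {
            bestScore = score;
            best = c;
        }
    }
    return best;
}

// Disparity for one row: current column minus best-matching column.
// Border columns (no full window) are left at 0 here.
std::vector<int> rowDisparity(const Image& left, const Image& right, int y) {
    std::vector<int> disparity(left.width, 0);
    for (int x = 2; x < left.width - 2; ++x) {
        disparity[x] = x - bestMatchColumn(left, x, right, y);
    }
    return disparity;
}
```

In this sketch each call to `rowDisparity` reads the two images and writes only its own output vector, so rows do not depend on each other. That independence is what makes the outer height loop a natural fit for something like `tbb::parallel_for`.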