c++ - How to limit the number of threads performing an operation in C++ AMP

Question

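I am using C++ AMP to perform a series of computations across a large number of threads. The final step of the computation is to prune the results, but only for a limited number of threads. For example, if a computed result is below a threshold, set it to 0, but do this for at most X threads. In essence this is a shared counter combined with a shared condition check.

Any help is appreciated!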

score 1 · Accepted Answer

My understanding of your question is the following pseudocode, executed by each thread:

auto result = ...
if(result < global_threshold)  // if the result of the calculation is below a threshold
    if(global_counter++ < global_max)  // for a maximum of X threads
        result = 0;  // then set the result to 0 
store(result);

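I further assume that neither global_threshold nor global_max changes during the computation (that is, between the start and the end of parallel_for_each), so the most elegant way to pass them in is through the lambda capture.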

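global_counter, on the other hand, clearly changes value, so it must live in modifiable memory shared by all threads, which in practice means an array<T,N> or an array_view<T,N>. Because the threads that increment this object are not synchronized with each other, the increment has to be performed with an atomic operation.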
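To illustrate why the atomic increment matters, here is a minimal CPU-side sketch in standard C++ (std::thread and std::atomic rather than C++ AMP). The helper name count_winners is hypothetical and only for illustration. fetch_add plays the role that atomic_fetch_inc plays in the C++ AMP code: it returns the value held before the increment, so every thread receives a distinct ticket number and exactly min(num_threads, global_max) of them pass the check.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Hypothetical helper (not part of the original answer): starts `num_threads`
// CPU threads that all increment one shared counter, and returns how many of
// them observed a previous value below `global_max`.
int count_winners(int num_threads, int global_max)
{
    assert(num_threads >= 0 && global_max >= 0);

    std::atomic<int> global_counter{0};  // shared, modifiable state
    std::atomic<int> winners{0};         // used only to report the outcome

    std::vector<std::thread> threads;
    threads.reserve(static_cast<std::size_t>(num_threads));
    for (int i = 0; i < num_threads; ++i)
    {
        threads.emplace_back([&global_counter, &winners, global_max]
        {
            // fetch_add returns the value held *before* the increment, so
            // each thread sees a distinct ticket number 0, 1, 2, ...
            if (global_counter.fetch_add(1) < global_max)
            {
                winners.fetch_add(1);
            }
        });
    }
    for (auto& t : threads)
    {
        t.join();
    }
    return winners.load();
}
```

With a plain int and global_counter++ instead, two threads could read the same old value, and more than global_max threads might pass the check.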

The above translates into the following C++ AMP code (I am using Visual Studio 2013 syntax, but it is easy to backport to Visual Studio 2012):

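#include <amp.h>
#include <algorithm>
#include <iostream>
#include <vector>

using namespace concurrency;

int main()
{
    // Host storage for the per-thread results, wrapped for use on the accelerator.
    std::vector<int> result_storage(1024);
    array_view<int> av_result{ result_storage };

    // Shared counter: a one-element array_view that every thread can modify.
    int global_counter_storage[1] = { 0 };
    array_view<int> global_counter{ global_counter_storage };

    // Read-only parameters, passed to the kernel through the lambda capture.
    int global_threshold = 42;
    int global_max = 3;

    parallel_for_each(av_result.extent, [=](index<1> idx) restrict(amp)
    {
        int result = (idx[0] % 50) + 1; // 1 .. 50
        if(result < global_threshold)
        {
            // atomic_fetch_inc returns the value before the increment;
            // assuming fewer than INT_MAX threads will enter here
            if(atomic_fetch_inc(&global_counter[0]) < global_max)
            {
                result = 0;
            }
        }
        av_result[idx] = result;
    });

    // Copy the results back into result_storage before reading them on the host.
    av_result.synchronize();

    // With the values above: 3 zeros, 844 threads below the threshold.
    auto zeros = std::count(std::begin(result_storage), std::end(result_storage), 0);
    std::cout << "Total number of zeros in results: " << zeros << std::endl
        << "Total number of threads lower than threshold: " << global_counter[0]
        << std::endl;
}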
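The comment in the kernel assumes that fewer than INT_MAX threads increment the counter. If that cannot be guaranteed, one possible variation is a compare-and-swap loop that stops incrementing once the limit is reached. The sketch below uses standard C++ std::atomic rather than C++ AMP, and the helper name try_claim_slot is hypothetical. In a C++ AMP kernel, atomic_compare_exchange could presumably fill the same role, but I have not tested that here.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical helper: tries to claim one of `global_max` slots.
// Returns true if a slot was claimed. The counter never exceeds global_max,
// so it cannot overflow no matter how many threads call this.
bool try_claim_slot(std::atomic<int>& counter, int global_max)
{
    assert(global_max >= 0);

    int seen = counter.load();
    while (seen < global_max)
    {
        // On failure, compare_exchange_weak stores the current value of
        // `counter` into `seen`, so the loop condition is re-checked
        // against fresh data.
        if (counter.compare_exchange_weak(seen, seen + 1))
        {
            return true;
        }
    }
    return false;
}
```

The trade-off is that the counter then stops at global_max, so it no longer reports the total number of threads that fell below the threshold.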
