c++ - Parallelizing a loop with std::thread and good practices

Question

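Possible duplicate:
C++ 2011: std::thread: simple example to parallelize a loop?

Consider the following program, which distributes a computation over the elements of a vector (I have never used std::thread before):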

// vectorop.cpp
// compilation: g++ -O3 -std=c++0x vectorop.cpp -o vectorop -lpthread
// execution: time ./vectorop 100 50000000 
// (100: number of threads, 50000000: vector size)
#include <iostream>
#include <iomanip>
#include <cstdio>
#include <cstdlib>
#include <vector>
#include <thread>
#include <cmath>
#include <algorithm>
#include <numeric>

// Some calculation that takes some time
template<typename T> 
void f(std::vector<T>& v, unsigned int first, unsigned int last) {
    for (unsigned int i = first; i < last; ++i) {
        v[i] = std::sin(v[i])+std::exp(std::cos(v[i]))/std::exp(std::sin(v[i])); 
    }
}

// Main
int main(int argc, char* argv[]) {

    // Variables
    const int nthreads = (argc > 1) ? std::atol(argv[1]) : (1);
    const int n = (argc > 2) ? std::atol(argv[2]) : (100000000);
    double x = 0;
    std::vector<std::thread> t;
    std::vector<double> v(n);

    // Initialization
    std::iota(v.begin(), v.end(), 0);

    // Start threads
    for (unsigned int i = 0; i < n; i += std::max(1, n/nthreads)) {
        // question 1: 
        // how to compute the first/last indexes attributed to each thread 
        // with a more "elegant" formula ?
        std::cout<<i<<" "<<std::min<std::size_t>(i+std::max(1, n/nthreads), v.size())<<std::endl;
        t.push_back(std::thread(f<double>, std::ref(v), i, std::min<std::size_t>(i+std::max(1, n/nthreads), v.size())));
    }

    // Finish threads
    for (unsigned int i = 0; i < t.size(); ++i) {
        t[i].join();
    }
    // question 2: 
    // how to be sure that all threads are finished here ?
    // how to "wait" for the end of all threads ?

    // Finalization
    for (unsigned int i = 0; i < n; ++i) {
        x += v[i];
    }
    std::cout<<std::setprecision(15)<<x<<std::endl;
    return 0;
}

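Two questions are already embedded in the code.

A third question: is this code fine as it is, or can it be written more elegantly with std::thread? I do not know the "good practices" for using std::thread...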

score 0 · Accepted Answer

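For the first question, how to compute the range each thread works on: I extracted the constants and gave them names to make the code easier to read. As a matter of good practice, I also used a lambda, which makes the code easier to modify: the code in the lambda is used only here, whereas the function f can be used by other code throughout the program. Take advantage of this by putting the shared code in a function and keeping the code that is used only once in the lambda.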

const size_t itemsPerThread = std::max(1, n/nthreads);
for (size_t nextIndex = 0; nextIndex < v.size(); nextIndex += itemsPerThread)
{
    const size_t beginIndex = nextIndex;
    const size_t endIndex = std::min(nextIndex + itemsPerThread, v.size());
    std::cout << beginIndex << " " << endIndex << std::endl;
    // Capture v by reference and the indexes by value; T is deduced from v.
    t.push_back(std::thread([&v, beginIndex, endIndex]{ f(v, beginIndex, endIndex); }));
}
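
One thing to note about the step-based loop above: when n is not a multiple of nthreads, it can start one thread more than requested (for example, n = 10 and nthreads = 3 gives the ranges 0-3, 3-6, 6-9 and 9-10). A possible alternative, shown here only as a sketch, is to compute all ranges up front so that their sizes differ by at most one. The helper name splitRange is hypothetical and not part of the original code.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical helper (not from the original answer): split [0, n) into at
// most `parts` contiguous half-open ranges whose sizes differ by at most one.
std::vector<std::pair<std::size_t, std::size_t>>
splitRange(std::size_t n, std::size_t parts) {
    assert(parts > 0);
    std::vector<std::pair<std::size_t, std::size_t>> ranges;
    const std::size_t base = n / parts;   // minimum number of items per range
    const std::size_t extra = n % parts;  // the first `extra` ranges get one more
    std::size_t begin = 0;
    // Stop early when n < parts so that no empty range is produced.
    for (std::size_t k = 0; k < parts && begin < n; ++k) {
        const std::size_t end = begin + base + (k < extra ? 1 : 0);
        ranges.emplace_back(begin, end);
        begin = end;
    }
    return ranges;
}
```

Each returned pair could then be passed as beginIndex and endIndex to the lambda shown above, one thread per pair.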

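More advanced use cases would use a thread pool, but what that looks like depends on your application's design, and the STL does not cover it. For a good example of a threading model, see the Qt framework. If you are just getting started with threads, save this for later.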

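The second question was already answered in the comments. The std::thread::join function waits (blocks) until the thread has finished. Once you have called join on every thread and reached the code after the join calls, you can be sure that all threads have finished and can now be deleted.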
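
The start-then-join pattern described above can be sketched as follows. The function doubleAll and its doubling workload are hypothetical stand-ins for f; the point is only that the code after the join loop runs once every worker has finished. Note that a std::thread that is still joinable when it is destroyed calls std::terminate, so every thread must be joined (or detached) before the vector of threads goes out of scope.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical example (not from the original post): double every element
// of `v` using roughly `nthreads` worker threads.
void doubleAll(std::vector<double>& v, std::size_t nthreads) {
    assert(nthreads > 0);
    const std::size_t chunk = std::max<std::size_t>(1, v.size() / nthreads);
    std::vector<std::thread> workers;
    for (std::size_t begin = 0; begin < v.size(); begin += chunk) {
        const std::size_t end = std::min(begin + chunk, v.size());
        // Each worker touches only its own half-open range [begin, end).
        workers.emplace_back([&v, begin, end] {
            for (std::size_t i = begin; i < end; ++i) {
                v[i] *= 2.0;
            }
        });
    }
    // join() blocks until the corresponding thread has finished.
    for (std::thread& w : workers) {
        w.join();
    }
    // Every worker has completed here, so reading v is safe.
}
```

With GCC this typically needs to be built with -pthread (or -lpthread, as in the question's compile line).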
