c++ - Worker threads perform much worse than the main thread

Question

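I have a small test program, compiled together with my library, that measures the speed of various math functions implemented with different methods (SSE, for-loop, unrolled-loop, etc.). Each test runs a method hundreds of thousands of times and computes the average time taken. I decided to create 4 worker threads, one for each core of my computer, and run my benchmarks that way.

Now, these are micro-benchmarks measured in nanoseconds, so the differences may look large, but at that scale there isn't really any other kind of difference.

Here is my code for running a function in a single-threaded fashion: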

static constexpr std::size_t num_tests = 400000;
auto do_test = [=](uint64_t(*test)()){
    // test is a function that returns the nanoseconds taken by a specific method
    uint64_t accum = 0;
    for(std::size_t n = 0; n < num_tests; n++)
        accum += test();
    return accum / num_tests;
};

Here is my (faster) code for running the tests in a multithreaded fashion:

static constexpr std::size_t num_tests = 100000;
auto do_test = [=](uint64_t(*test)()){
    uint64_t accum = 0;

    std::thread first([&](){
        for(std::size_t n = 0; n < num_tests; n++)
            accum += test();
    });
    std::thread second([&](){
        for(std::size_t n = 0; n < num_tests; n++)
            accum += test();
    });
    std::thread third([&](){
        for(std::size_t n = 0; n < num_tests; n++)
            accum += test();
    });
    std::thread fourth([&](){
        for(std::size_t n = 0; n < num_tests; n++)
            accum += test();
    });

    first.join();
    second.join();
    third.join();
    fourth.join();

    return accum / (num_tests * 4);
};

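But the results are slower D: So the whole run finishes faster, but the per-operation times it reports are slower.

My single-threaded version averages 77 nanoseconds, while my multithreaded version averages 150 nanoseconds!

Why is this?

P.S. I know this is a tiny difference, I just thought it was interesting.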
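
One thing worth noting about the multithreaded version: all four threads add to the same `accum` with no synchronization. That is a data race in C++, so the reported average cannot be trusted, and writing to one shared variable from several cores also tends to cause cache-line contention. Below is a minimal sketch of a variant in which each thread sums into its own local variable and the partial sums are combined only after `join`. The names `fake_test`, `num_threads`, and `partial` are hypothetical and introduced here for illustration; `fake_test` is a stand-in, not one of the real timed functions. This removes the race, but it is not claimed to account for the whole 77 ns versus 150 ns gap.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <thread>

// Hypothetical stand-in for a test function that returns nanoseconds taken.
static std::uint64_t fake_test() { return 77; }

static constexpr std::size_t num_tests = 100000;
static constexpr std::size_t num_threads = 4;  // assumption: one per core

std::uint64_t do_test(std::uint64_t (*test)()) {
    // One slot per thread; each slot is written by exactly one thread.
    std::array<std::uint64_t, num_threads> partial{};
    std::array<std::thread, num_threads> workers;

    for (std::size_t i = 0; i < num_threads; ++i) {
        workers[i] = std::thread([&partial, i, test] {
            // Accumulate in a thread-local variable, then publish once.
            std::uint64_t local = 0;
            for (std::size_t n = 0; n < num_tests; ++n)
                local += test();
            partial[i] = local;
        });
    }

    // join() synchronizes with the end of each thread, so reading
    // partial[i] afterwards is safe.
    std::uint64_t accum = 0;
    for (std::size_t i = 0; i < num_threads; ++i) {
        workers[i].join();
        accum += partial[i];
    }
    return accum / (num_tests * num_threads);
}
```

Calling `do_test(fake_test)` averages the stand-in's constant return value across all four threads. With real timed functions, the per-call numbers may still differ from the single-threaded run, since the threads share caches and other hardware resources.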
