c++ - std::sort 与英特尔 ipp 排序性能。我究竟做错了什么？

Question

我正在尝试比较std::sort（使用std::vector结构）与英特尔ipp排序的性能。

我在英特尔至强处理器上运行此测试model name : Intel(R) Xeon(R) CPU X5670 @ 2.93GHz

我正在对长度为 20000 个元素的向量进行排序并排序 200 次。我尝试了 2 种不同ipp的排序例程，即。ippsSortDescend_64f_I和ippsSortRadixDescend_64f_I。在所有情况下，ippsort 至少比 . 慢 5 到 10 倍std::sort。我预计ipp较小数组的排序可能会更慢，但否则它通常应该比std::sort. 我在这里错过了什么吗？我究竟做错了什么？

std::sort在我的所有测试用例中始终更快。

这是我的程序

#include <array>
#include <iostream>
#include <algorithm>
#include <stdlib.h>
#include <time.h>
#include <sys/time.h>
#include <sys/timeb.h>

#include <vector>
#include <chrono>

#include "ipp.h"

using namespace std;

const int SIZE = 2000000;
const int ITERS = 200;

//Chrono typedefs
typedef std::chrono::high_resolution_clock Clock;
typedef std::chrono::microseconds microseconds;

//////////////////////////////////// std ///////////////////////////////////

typedef vector<double> myList;

void initialize(myList & l, Ipp64f* ptr)
{
    double randomNum;
    for (int i = 0; i < SIZE; i++)
    {
        randomNum =  1.0 * rand() / (RAND_MAX / 2) - 1;
        l.push_back(randomNum);
        ptr[i] = randomNum;
    }
}


void test_sort()
{
        array<myList, ITERS> list;
        array<Ipp64f*, ITERS> ippList;

        // allocate
        for(int i=0; i<ITERS;i++)
        {
                list[i].reserve(SIZE);
                ippList[i] = ippsMalloc_64f(SIZE);
        }

        // initialize
        for(int i=0;i<ITERS;i++)
        {
                initialize(list[i], ippList[i]);
        }

        cout << "\n\nTest Case 1: std::sort\n";
        cout << "========================\n";

        // sort vector
        Clock::time_point t0 = Clock::now();
        for(int i=0; i<ITERS;i++)
        {
            std::sort(list[i].begin(), list[i].end());
        }
        Clock::time_point t1 = Clock::now();
        microseconds ms = std::chrono::duration_cast<microseconds>(t1 - t0);
        std::cout << ms.count() << " micros" << std::endl;

        ////////////////////////////////// IPP ////////////////////////////////////////

        cout << "\n\nTest Case 2: ipp::sort\n";
        cout << "========================\n";

        // sort ipp 
        Clock::time_point t2 = Clock::now();
        for(int i=0; i<ITERS;i++)
        {
                ippsSortAscend_64f_I(ippList[i], SIZE);
        }
        Clock::time_point t3 = Clock::now();
        microseconds ms1 = std::chrono::duration_cast<microseconds>(t3 - t2);
        std::cout << ms1.count() << " micros" << std::endl;

        for(int i=0; i<ITERS;i++)
        {
          ippsFree( ippList[i] );
        }
}


///////////////////////////////////////////////////////////////////////////////////////

int main()
{
    srand (time(NULL));

    cout << "Test for sorting an array of structures.\n" << endl;
    cout << "Test case: \nSort an array of structs ("<<ITERS<<" iterations) with double of length "<<SIZE<<". \n";
        IppStatus status=ippInit();
        test_sort();
    return 0;
}

/////////////////////////////////////////////////////////////////////////////

编译命令为：

/share/intel/bin/icc -O2 -I$(IPPROOT)/include  sorting.cpp -lrt -L$(IPPROOT)/lib/intel64 -lippi -lipps -lippvm -lippcore -std=c++0x

程序输出：

Test for sorting an array of structures.

Test case:
Sort an array of structs (200 iterations) with double of length 2000000.


Test Case 1: std::sort
========================
38117024 micros


Test Case 2: ipp::sort
========================
48917686 micros

score 5 · Accepted Answer

我已经在我的电脑（Core i7 860）上运行了你的代码。

 std::sort 32,763,268 (~33s)

 ippsSortAscend_64f_I 34,217,517 (~34s)

 ippsSortRadixAscend_64f_I 15,319,053 (~15s)

这些是预期的结果。std::sort 是内联且高度优化的，而 ippsSort_* 具有函数调用开销和所有 ipp 函数执行的大量内部检查。这应该可以解释 ippsSortAscend 函数的缓慢下降。基数排序仍然比预期的快两倍，因为它不是基于比较的排序。

score 1 · Accepted Answer

为了获得更准确的结果，您需要

比较完全相同的随机数分布的排序；
从时间中删除随机化；
使用 ippsSort*32f 函数，在 IPP 情况下对“float”（不是“double”）进行排序。

score -1 · Accepted Answer

-1

我猜你忘记在测量之前调用 ippInit()

于 2015-08-27T13:12:00.330 回答

c++ - std::sort 与英特尔 ipp 排序性能。我究竟做错了什么？

3 回答 3

Related

Reference