image-processing - Why doesn't the GPU show an advantage over the CPU for the OpenCV SURF algorithm?

Question

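I want to use the GPU to accelerate the SURF algorithm, but in practice I found that the CPU (with TBB enabled) is faster than the GPU for SURF.

My hardware and OS information:
CPU: Intel(R) Xeon(R) CPU E3-1230 V2 @ 3.30GHz (4 cores, 8 threads)
GPU: Nvidia GTX 660ti ~1000MHz (1344 GPU cores)
OS: ubuntu 12.04 (64 bit)

Application scenario: my folder contains about 120 images, and I need to extract the keypoints of each image with SURF.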

Time logs

CPU (TBB), time spent per image:

    indexing DB:/home/ole/MatchServer/ImgDB0/img0 cost time on SURF ALGO (ON TBB)[s]: 0.00666648
    indexing DB:/home/ole/MatchServer/ImgDB0/img1 cost time on SURF ALGO (ON TBB)[s]: 0.00803925
    indexing DB:/home/ole/MatchServer/ImgDB0/img2 cost time on SURF ALGO (ON TBB)[s]: 0.0066344
    indexing DB:/home/ole/MatchServer/ImgDB0/img3 cost time on SURF ALGO (ON TBB)[s]: 0.00625698
    indexing DB:/home/ole/MatchServer/ImgDB0/img4 cost time on SURF ALGO (ON TBB)[s]: 0.00699448
    indexing DB:/home/ole/MatchServer/ImgDB0/img5 cost time on SURF ALGO (ON TBB)[s]: 0.00621663

        .................more..................................

GPU, time spent per image (each image has two log lines: the first is the time to upload the image to GPU memory, the second is the time spent in the SURF_GPU algorithm):

    indexing DB:/home/ole/MatchServer/ImgDB0/img0 cost time on GPU upload image[s]: 1.99329
    indexing DB:/home/ole/MatchServer/ImgDB0/img0 cost time on Gpu SURF ALGO[s]: 0.00971809
    indexing DB:/home/ole/MatchServer/ImgDB0/img1 cost time on GPU upload image[s]: 0.000157638
    indexing DB:/home/ole/MatchServer/ImgDB0/img1 cost time on Gpu SURF ALGO[s]: 0.00618778
    indexing DB:/home/ole/MatchServer/ImgDB0/img2 cost time on GPU upload image[s]: 8.8108e-05
    indexing DB:/home/ole/MatchServer/ImgDB0/img2 cost time on Gpu SURF ALGO[s]: 0.00736609
    indexing DB:/home/ole/MatchServer/ImgDB0/img3 cost time on GPU upload image[s]: 8.8599e-05
    indexing DB:/home/ole/MatchServer/ImgDB0/img3 cost time on Gpu SURF ALGO[s]: 0.00559131
    indexing DB:/home/ole/MatchServer/ImgDB0/img4 cost time on GPU upload image[s]: 8.7626e-05
    indexing DB:/home/ole/MatchServer/ImgDB0/img4 cost time on Gpu SURF ALGO[s]: 0.00610033
    indexing DB:/home/ole/MatchServer/ImgDB0/img5 cost time on GPU upload image[s]: 8.9125e-05
    indexing DB:/home/ole/MatchServer/ImgDB0/img5 cost time on Gpu SURF ALGO[s]: 0.00632997

      ............................more..................................

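I found that uploading the first image Mat to the GPU is very slow, about 2 s. After that the upload time is normal: about 0.000157638 s for the second image and roughly 0.00009 s for the following ones.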
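As a rough check on these numbers, the six SURF timings shown in each log can be averaged. The sketch below hard-codes only those six samples per log (the remaining images are elided, so this is not a full benchmark), and the function names are illustrative rather than part of the original program. On these samples the means come out at about 6.80 ms per image for CPU (TBB) and about 6.88 ms per image for GPU SURF, so the two are essentially on par once the one-off upload delay is set aside.

```cpp
#include <cassert>
#include <numeric>
#include <vector>

// Arithmetic mean of a non-empty list of timings, in seconds.
inline double mean_seconds(const std::vector<double>& xs) {
    assert(!xs.empty());
    return std::accumulate(xs.begin(), xs.end(), 0.0) / static_cast<double>(xs.size());
}

// SURF timings for img0..img5 copied from the CPU (TBB) log.
inline std::vector<double> cpu_tbb_seconds() {
    return {0.00666648, 0.00803925, 0.0066344, 0.00625698, 0.00699448, 0.00621663};
}

// SURF timings for img0..img5 copied from the GPU log (upload time excluded).
inline std::vector<double> gpu_surf_seconds() {
    return {0.00971809, 0.00618778, 0.00736609, 0.00559131, 0.00610033, 0.00632997};
}
```

Calling `mean_seconds(cpu_tbb_seconds())` and `mean_seconds(gpu_surf_seconds())` gives the two per-image averages quoted above.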

GPU code:

    try
    {
        // help_img, help_keypoints and path are defined elsewhere in the program.
        // First timed span: SURF_GPU construction, GpuMat allocation and upload.
        double t0 = (double)getTickCount();
        cv::gpu::SURF_GPU surf_gpu;
        Size size = help_img.size();
        Size size0 = size;
        int type = help_img.type();
        cv::gpu::GpuMat d_m(size0, type);
        // Never true here, because size0 was just copied from help_img.size().
        if(size0 != help_img.size() )
            d_m = d_m(Rect((size0.width - size.width) / 2, (size0.height - size.height) / 2, size.width, size.height));
        d_m.upload(help_img);
        double t = ((double)getTickCount() - t0)/getTickFrequency();
        std::cout << "indexing DB:"<< path << " cost time on upload image[s]: " << t << std::endl;

        // Second timed span: keypoint detection on the GPU, no mask.
        t0 = (double)getTickCount();
        surf_gpu(d_m, cv::gpu::GpuMat(), help_keypoints);
        t = ((double)getTickCount() - t0)/getTickFrequency();
        std::cout << "indexing DB:"<< path << " cost time on Gpu image[s]: " << t << std::endl;
    }
    catch (const cv::Exception& e)
    {
       // Report what actually went wrong instead of a fixed message.
       printf("issue happen: %s\n", e.what());
    }

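Please give me some advice on the following questions:

1. Why is the first image upload to the GPU so slow, about 2 seconds?

2. Why doesn't the GPU accelerate the SURF algorithm? SURF involves a lot of computation, so in theory the GPU should be able to speed it up.

3. How can the performance of the SURF algorithm on the GPU be improved?

Thanks!!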

score 3 · Accepted Answer

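1. The first upload to the GPU is always going to be slow. The GPU needs to be initialized before it can do any actual work: the default CUDA context is created on the first CUDA call, which in your case is the upload to the GPU Mat. A workaround is to call some arbitrary GPU function before doing the actual work.

2. It depends on which GPU and CPU you are comparing. A high-end CPU like the XEON you are using is more likely to win when TBB is used. For a real speedup, try a high-end GPU such as an NVIDIA Tesla. The current OpenCV implementation may also not be optimized for the Kepler-architecture GPU you are using.

3. There is no fixed answer to this. It depends on the parallelism of the algorithm, how well it is implemented, and the hardware present in the system.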
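The workaround in point 1 can be illustrated without a GPU. The sketch below is only a model of the idea: `LazyDevice` is a hypothetical stand-in for a lazily created CUDA context (it is not an OpenCV or CUDA type), and its 20 ms sleep stands in for the roughly 2 s initialization seen in the question. It shows one untimed warm-up call absorbing the one-time cost, so that the timed call measures only the per-image work.

```cpp
#include <chrono>
#include <functional>
#include <thread>

// Hypothetical stand-in for a lazily initialized GPU context: the first
// call pays a one-time setup cost, later calls do not.
struct LazyDevice {
    bool initialized = false;
    int init_count = 0;

    void upload() {
        if (!initialized) {
            // Simulated one-time context creation (20 ms here).
            std::this_thread::sleep_for(std::chrono::milliseconds(20));
            initialized = true;
            ++init_count;
        }
        // Simulated per-image transfer, deliberately cheap.
    }
};

// Time a single call in seconds using a monotonic clock.
inline double time_call(const std::function<void()>& fn) {
    const auto t0 = std::chrono::steady_clock::now();
    fn();
    const auto t1 = std::chrono::steady_clock::now();
    return std::chrono::duration<double>(t1 - t0).count();
}

// Run one untimed warm-up call, then return the time of the next call.
inline double time_after_warmup(LazyDevice& dev) {
    dev.upload();  // warm-up: absorbs the one-time initialization
    return time_call([&dev] { dev.upload(); });
}
```

In the question's program, the equivalent step would presumably be one throwaway GPU operation, such as uploading a small dummy image, before the timed loop over the image folder. That mapping is an assumption and has not been tested against OpenCV here.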
