c# - Is this performance OK for a Sobel filter on the GPU?

Question

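I have a CUDA-related question for you :). Since I am fairly new to CUDA, I would like to know whether this "performance" is OK.

I am using C# and Cudafy.Net!

I have a grayscale image (represented as a float[]) that is computed from a screenshot (image size: 1920x1018 pixels).

I now run a Sobel filter on the GPU (via Cudafy.Net), which looks like this: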

    [Cudafy]
    public static void PenaltyKernel(GThread thread, Single[] data, Single[] res, Int32 width, Int32 height)
    {
        Single[] shared_data = thread.AllocateShared<Single>("shared_data", BLOCK_WIDTH * BLOCK_WIDTH);
        ///Map from threadIdx/BlockIdx to Pixel Position
        int x = thread.threadIdx.x - FILTER_WIDTH + thread.blockIdx.x * TILE_WIDTH;
        int y = thread.threadIdx.y - FILTER_WIDTH + thread.blockIdx.y * TILE_WIDTH;
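        ///Halo threads can map outside the image; load 0 there instead of reading out of bounds
        Single value = 0;
        if (x >= 0 && x < width && y >= 0 && y < height)
        {
            value = data[x + y * width];
        }
        shared_data[thread.threadIdx.x + thread.threadIdx.y * BLOCK_WIDTH] = value;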
        thread.SyncThreads();

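        ///Only the inner TILE_WIDTH x TILE_WIDTH threads produce output, and only for pixels inside the image
        if (thread.threadIdx.x >= FILTER_WIDTH && thread.threadIdx.x < (BLOCK_WIDTH - FILTER_WIDTH) &&
            thread.threadIdx.y >= FILTER_WIDTH && thread.threadIdx.y < (BLOCK_WIDTH - FILTER_WIDTH) &&
            x < width && y < height)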
        {
            ///Horizontal Filtering (detects horizontal Edges)
            Single diffHorizontal = 0;
            int idx = GetIndex(thread.threadIdx.x - 1, thread.threadIdx.y - 1, BLOCK_WIDTH);
            diffHorizontal -= shared_data[idx];
            idx++;
            diffHorizontal -= 2 * shared_data[idx];
            idx++;
            diffHorizontal -= shared_data[idx];
            idx += 2*BLOCK_WIDTH;
            diffHorizontal += shared_data[idx];
            idx++;
            diffHorizontal += 2 * shared_data[idx];
            idx++;
            diffHorizontal += shared_data[idx];

            ///Vertical Filtering (detects vertical Edges)
            Single diffVertical = 0;
            idx = GetIndex(thread.threadIdx.x - 1, thread.threadIdx.y - 1, BLOCK_WIDTH);
            diffVertical -= shared_data[idx];
            idx += BLOCK_WIDTH;
            diffVertical -= 2 * shared_data[idx];
            idx += BLOCK_WIDTH;
            diffVertical -= shared_data[idx];
            idx = GetIndex(thread.threadIdx.x + 1, thread.threadIdx.y - 1, BLOCK_WIDTH);
            diffVertical += shared_data[idx];
            idx += BLOCK_WIDTH;
            diffVertical += 2 * shared_data[idx];
            idx += BLOCK_WIDTH;
            diffVertical += shared_data[idx];

            ///Convert the "edgyness" for the Pixel and cut off at 1.0
            Single diff = GMath.Min(1.0f, GMath.Sqrt(diffHorizontal * diffHorizontal + diffVertical * diffVertical));

            ///Get the Array-Index and set the Value
            idx = GetIndex(x, y, width);
            res[idx] = diff;
        }
    }

Constant values set before running:

TILE_WIDTH = 16;
FILTER_WIDTH = 1;
BLOCK_WIDTH = TILE_WIDTH + 2 * FILTER_WIDTH;
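
For reference, the launch geometry these constants imply can be worked out as below. This is a minimal sketch that assumes one thread block per 16x16 output tile and ceiling division for the grid size; the post does not show the actual launch code, and the `LaunchGeometry` class and `TilesFor` helper are hypothetical names used only for illustration.

```csharp
public static class LaunchGeometry
{
    public const int TILE_WIDTH = 16;
    public const int FILTER_WIDTH = 1;
    public const int BLOCK_WIDTH = TILE_WIDTH + 2 * FILTER_WIDTH;

    // Number of tiles needed to cover `size` pixels (ceiling division).
    // Assumption: one thread block per tile.
    public static int TilesFor(int size)
    {
        return (size + TILE_WIDTH - 1) / TILE_WIDTH;
    }

    // Threads per block, including the one-pixel halo on each side.
    public static int ThreadsPerBlock()
    {
        return BLOCK_WIDTH * BLOCK_WIDTH;
    }
}
```

Under those assumptions a 1920x1018 image needs `TilesFor(1920)` = 120 by `TilesFor(1018)` = 64 blocks of 18x18 = 324 threads each. The 64 tile rows cover 1024 pixel rows, six more than the image has, and the halo threads of the first block map to x = -1 and y = -1, so the kernel needs bounds checks on both its loads and its stores.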

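When I run this "PenaltyKernel" function, including allocating memory for the arrays and copying the data to and from the device, the run time averages about 6.2 ms (on a GTX 680 GT!).

So my question is whether this speed is OK (about 161 frames per second) or whether I am missing something. Is my Sobel filter itself OK (I mean, the result looks good :))?

Any help is appreciated!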

score 0 · Accepted Answer

I think this speed is OK. Most of the time goes into copying data between the host and the device (the transfer from GPU to CPU in particular is slow).
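
To put the copy cost in perspective, here is a rough back-of-the-envelope estimate. It is only a sketch: the `TransferEstimate` class is a hypothetical helper, and the bandwidth passed to `TransferMs` is an assumed effective bus bandwidth, not something measured on the hardware in the question.

```csharp
public static class TransferEstimate
{
    // Bytes needed for an image stored as one float per pixel.
    public static long ImageBytes(int width, int height)
    {
        return (long)width * height * sizeof(float);
    }

    // Milliseconds needed to move `bytes` at `gigabytesPerSecond`
    // (1 GB = 1e9 bytes). The bandwidth is an assumed input.
    public static double TransferMs(long bytes, double gigabytesPerSecond)
    {
        return bytes / (gigabytesPerSecond * 1e9) * 1000.0;
    }
}
```

For the 1920x1018 image, `ImageBytes` gives 7,818,240 bytes (about 7.8 MB) per direction. At an assumed 6 GB/s, `TransferMs` gives roughly 1.3 ms per direction, so the two copies alone could plausibly account for a large share of the 6.2 ms. Timing the allocation, the copies, and the kernel separately would confirm where the time actually goes.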

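A note on speed: in general, image processing on the GPU can be slower than on the CPU if the image is small (I have not tested your code, so I do not know whether that is the case for you). However, the larger the image, the greater the speed advantage of processing on the device over processing on the host.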
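
One way to check both the speed comparison and the correctness of the filter is a plain CPU version to run against the same input. The following is a minimal sketch, not the asker's code: `SobelReference` is a hypothetical name, and treating pixels outside the image as 0 is an assumption about border handling.

```csharp
using System;

public static class SobelReference
{
    // Pixel value at (x, y), or 0 outside the image (assumed border handling).
    private static float At(float[] data, int x, int y, int width, int height)
    {
        if (x < 0 || x >= width || y < 0 || y >= height) return 0f;
        return data[x + y * width];
    }

    // Applies the same two 3x3 Sobel kernels as the GPU code and
    // clamps the gradient magnitude to 1.0.
    public static float[] Apply(float[] data, int width, int height)
    {
        var res = new float[width * height];
        for (int y = 0; y < height; y++)
        {
            for (int x = 0; x < width; x++)
            {
                // Row above subtracted, row below added (horizontal edges).
                float h = -At(data, x - 1, y - 1, width, height)
                          - 2 * At(data, x, y - 1, width, height)
                          - At(data, x + 1, y - 1, width, height)
                          + At(data, x - 1, y + 1, width, height)
                          + 2 * At(data, x, y + 1, width, height)
                          + At(data, x + 1, y + 1, width, height);
                // Left column subtracted, right column added (vertical edges).
                float v = -At(data, x - 1, y - 1, width, height)
                          - 2 * At(data, x - 1, y, width, height)
                          - At(data, x - 1, y + 1, width, height)
                          + At(data, x + 1, y - 1, width, height)
                          + 2 * At(data, x + 1, y, width, height)
                          + At(data, x + 1, y + 1, width, height);
                res[x + y * width] = Math.Min(1.0f, (float)Math.Sqrt(h * h + v * v));
            }
        }
        return res;
    }
}
```

Wrapping a call to `SobelReference.Apply` in a `System.Diagnostics.Stopwatch` gives a host-side baseline to compare with the 6.2 ms GPU figure. Comparing the two result arrays element by element with a small tolerance also checks the kernel's output, although border pixels will only match if the GPU version handles the image edge the same way.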
