cuda - nppiFilter corrupts the output image

Question

I wrote a BoxFilter example using NPP, but the output image looks broken. Here is my code:

#include <stdio.h>
#include <string.h>

#include <ImagesCPU.h>
#include <ImagesNPP.h>
#include <Exceptions.h>

#include <npp.h>
#include "utils.h"


void boxfilter1_transform( Npp8u *data, int width, int height ){
    size_t size = width * height * 4;

    // declare a host image object for an 8-bit RGBA image
    npp::ImageCPU_8u_C4 oHostSrc(width, height);

    Npp8u *nDstData = oHostSrc.data();
    memcpy(nDstData, data, size * sizeof(Npp8u));

    // declare a device image and copy construct from the host image,
    // i.e. upload host to device
    npp::ImageNPP_8u_C4 oDeviceSrc(oHostSrc);

    // create struct with box-filter mask size
    NppiSize oMaskSize = {3, 3};

    // Allocate memory for pKernel
    Npp32s hostKernel[9] = {1, 1, 1, 1, 1, 1, 1, 1, 1};
    Npp32s *pKernel;

    checkCudaErrors( cudaMalloc((void**)&pKernel, oMaskSize.width * oMaskSize.height * sizeof(Npp32s)) );
    checkCudaErrors( cudaMemcpy(pKernel, hostKernel, oMaskSize.width * oMaskSize.height * sizeof(Npp32s),
                                cudaMemcpyHostToDevice) );

    Npp32s nDivisor = 9;

    // create struct with ROI size given the current mask
    NppiSize oSizeROI = {oDeviceSrc.width() - oMaskSize.width + 1, oDeviceSrc.height() - oMaskSize.height + 1};
    // allocate device image of appropriately reduced size
    npp::ImageNPP_8u_C4 oDeviceDst(oSizeROI.width, oSizeROI.height);
    // set anchor point inside the mask
    NppiPoint oAnchor = {2, 2};

    // run box filter
    NppStatus eStatusNPP;
    eStatusNPP = nppiFilter_8u_C4R(oDeviceSrc.data(), oDeviceSrc.pitch(),
                                   oDeviceDst.data(), oDeviceDst.pitch(),
                                   oSizeROI, pKernel, oMaskSize, oAnchor, nDivisor);
    //printf("NppiFilter error status %d\n", eStatusNPP);
    NPP_DEBUG_ASSERT(NPP_NO_ERROR == eStatusNPP);

    // declare a host image for the result
    npp::ImageCPU_8u_C4 oHostDst(oDeviceDst.size());
    // and copy the device result data into it
    oDeviceDst.copyTo(oHostDst.data(), oHostDst.pitch());
    memcpy(data, oHostDst.data(), size * sizeof(Npp8u));

    return;
}

Most of the code is copied from the boxFilterNPP.cpp sample. The output image: http://img153.imageshack.us/img153/7716/o8z.png

Why is this happening?

score 3 · Accepted Answer

You have a striding problem. Change this line:

npp::ImageCPU_8u_C4 oHostDst(oDeviceDst.size());

to this:

npp::ImageCPU_8u_C4 oHostDst(oDeviceSrc.size());

What's happening?

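Let's assume your input image is 600x450.

oHostSrc is 600x450, with a pitch of 600x4 = 2400 bytes.
The memcpy from data to oHostSrc is fine, because both have the same width and pitch.
oDeviceSrc picks up its size from oHostSrc (600x450).
oDeviceDst is slightly smaller than oDeviceSrc because it only takes the size of the ROI, so it is something like 596x446 (with the 3x3 mask in the question, the ROI formula gives exactly 598x448; the argument is the same either way).
Your code creates oHostDst with the same size as oDeviceDst, so about 596x446.
The .copyTo operation copies the pitched 596x446 oDeviceDst image to the (unpitched) oHostDst, also 596x446.
The final memcpy breaks the image, because it copies the 596x446 oHostDst image into the 600x450 data region.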
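To make the last step concrete, here is a minimal sketch of the index arithmetic behind a flat copy between images of different widths. The helper name `flatCopyLanding` is hypothetical and exists only for illustration; it works in whole pixels (the 4 bytes per RGBA pixel scale both widths equally) and uses the illustrative 596 and 600 pixel widths from the explanation above.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>

// Hypothetical helper, for illustration only (not part of NPP or CUDA).
// A tightly packed image of width srcWidth is copied with one flat memcpy
// into a buffer that is later interpreted as an image of width dstWidth.
// Returns {column, row} at which source pixel (x, y) ends up in that buffer.
std::pair<std::size_t, std::size_t> flatCopyLanding(std::size_t x, std::size_t y,
                                                    std::size_t srcWidth,
                                                    std::size_t dstWidth) {
    assert(srcWidth > 0 && dstWidth > 0 && x < srcWidth);
    // Position of the pixel in the flat stream that memcpy transfers.
    const std::size_t linear = y * srcWidth + x;
    // The destination reinterprets that stream with its own row length.
    return {linear % dstWidth, linear / dstWidth};
}
```

With `srcWidth = 596` and `dstWidth = 600`, the first pixel of source row 1 lands at column 596 of destination row 0, and the first pixel of source row 2 lands at column 592 of destination row 1. Each row starts 4 pixels earlier than the previous one, and a drift of this kind typically shows up as a diagonal shear in the displayed image. When both widths are equal, every pixel stays where it was.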

The solution is to create oHostDst at 600x450 and let the .copyTo operation handle the difference in line sizes and pitches.

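The original sample code does not have this problem, because it contains no unpitched copies anywhere (for example, no raw memcpy). As long as you explicitly handle the pitch and width of both source and destination at every copy step, it does not matter whether you create the final image as 600x450 or 596x446. Your final memcpy, however, does not handle pitch and width explicitly. It implicitly assumes that source and destination have the same size, which is not the case.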
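As an illustration of handling pitch and width explicitly, the following is a minimal host-side sketch of a row-by-row copy. The helper `copyPitched2D` is hypothetical (it is not part of NPP); its parameter order follows the same idea as `cudaMemcpy2D`, which takes a destination pitch, a source pitch, a row width in bytes and a row count.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical host-side helper, for illustration only (not part of NPP).
// Copies `rows` rows of `rowBytes` bytes each. Consecutive rows start
// srcPitch bytes apart in src and dstPitch bytes apart in dst, so the two
// buffers may have different line lengths.
void copyPitched2D(unsigned char *dst, std::size_t dstPitch,
                   const unsigned char *src, std::size_t srcPitch,
                   std::size_t rowBytes, std::size_t rows) {
    // A row must fit inside one line of both buffers.
    assert(rowBytes <= srcPitch && rowBytes <= dstPitch);
    for (std::size_t y = 0; y < rows; ++y) {
        std::memcpy(dst + y * dstPitch, src + y * srcPitch, rowBytes);
    }
}
```

Under the answer's example numbers, a 596x446 oHostDst could then be copied into the 600x450 data buffer with something like `copyPitched2D(data, 600 * 4, oHostDst.data(), oHostDst.pitch(), 596 * 4, 446)` in place of the flat memcpy. The bytes of data outside the copied region are left untouched, so they keep whatever the buffer held before.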
