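This is my first question, so I'll try to be as detailed as possible. I'm working on implementing a noise reduction algorithm in CUDA 6.5. My code is based on this Matlab implementation: http://pastebin.com/HLVq48C1
I would love to use the new cuFFT device callbacks feature, but I'm stuck on cufftXtSetCallback. Every time, my cufftResult is CUFFT_NOT_IMPLEMENTED (14). Even the example provided by nVidia fails the same way... My device callback test code: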

__device__ void noiseStampCallback(void *dataOut,
                                size_t offset,
                                cufftComplex element,
                                void *callerInfo,
                                void *sharedPointer) {
    element.x = offset;
    element.y = 2;
    ((cufftComplex*)dataOut)[offset] = element;
}
__device__ cufftCallbackStoreC noiseStampCallbackPtr = noiseStampCallback;

The CUDA part of my code:

cufftHandle forwardFFTPlan;//RtC
//find how many windows there are
int batch = targetFile->getNbrOfNoiseWindows();
size_t worksize;

cufftCreate(&forwardFFTPlan);
cufftMakePlan1d(forwardFFTPlan, WINDOW, CUFFT_R2C, batch, &worksize); //WINDOW = 2048 

//host memory, allocate
float *h_wave;
cufftComplex *h_complex_waveSpec;
unsigned int m_num_real_elems = batch*WINDOW*2;
h_wave = (float*)malloc(m_num_real_elems * sizeof(float));
h_complex_waveSpec = (cufftComplex*)malloc((m_num_real_elems/2+1)*sizeof(cufftComplex));

//init
memset(h_wave, 0, sizeof(float) * m_num_real_elems); //the last window probably won't be full of file data, so zero-fill the buffer
memset(h_complex_waveSpec, 0, sizeof(cufftComplex) * (m_num_real_elems/2+1));
targetFile->getNoiseFile(h_wave); //fill h_wave with samples from sound file

//device memory, allocate, copy from host
float *d_wave;
cufftComplex *d_complex_waveSpec;

cudaMalloc((void**)&d_wave, m_num_real_elems * sizeof(float));
cudaMalloc((void**)&d_complex_waveSpec, (m_num_real_elems/2+1) * sizeof(cufftComplex));

cudaMemcpy(d_wave, h_wave, m_num_real_elems * sizeof(float), cudaMemcpyHostToDevice);

//prepare callback
cufftCallbackStoreC hostNoiseStampCallbackPtr;

cudaMemcpyFromSymbol(&hostNoiseStampCallbackPtr,
                          noiseStampCallbackPtr,
                          sizeof(hostNoiseStampCallbackPtr));

cufftResult status = cufftXtSetCallback(forwardFFTPlan,
                                        (void **)&hostNoiseStampCallbackPtr,
                                        CUFFT_CB_ST_COMPLEX,
                                        NULL);
//always returns status 14 - CUFFT_NOT_IMPLEMENTED

//run forward plan
cufftResult result = cufftExecR2C(forwardFFTPlan, d_wave, d_complex_waveSpec);
//result seems to be okay without cufftXtSetCallback

I know that I'm only a beginner at CUDA. My question is:
How can I call cufftXtSetCallback properly, or what is the cause of this error?

2 Answers

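Referring to the documentation:

The callback API is available in the statically linked cuFFT library only, and only on 64 bit LINUX operating systems. Use of this API requires a current license. Free evaluation licenses are available for registered developers until 6/30/2015. To learn more please visit the cuFFT developer page.

I think you are getting the not implemented error because either you are not on a Linux 64-bit platform, or you are not explicitly linking against the CUFFT static library. The Makefile in the cufft callback sample shows the correct way to link.

Even if you fix that, you will likely run into a CUFFT_LICENSE_ERROR unless you have gotten one of the evaluation licenses.

Note that there are also various device limitations when linking to the cufft static library. It should be possible to build a statically linked CUFFT application that will run on cc 2.0 and greater devices.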
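As a rough illustration of the two conditions above, here is a minimal host-side sketch. The helper names (`is_linux_build`, `is_64bit_build`, `cufft_callbacks_platform_ok`) and the file name `callback_test.cu` are made up for this example. The build lines in the comment follow the pattern of the callback sample's Makefile (relocatable device code plus `-lcufft_static -lculibos`); treat them as an assumption and compare against the Makefile shipped with your toolkit.

```cpp
// Build sketch for the static cuFFT library (file names are placeholders):
//   nvcc -m64 -dc -o callback_test.o -c callback_test.cu
//   nvcc -m64 -o callback_test callback_test.o -lcufft_static -lculibos
#include <cstdio>

// Hypothetical helper: true when this translation unit targets Linux.
constexpr bool is_linux_build() {
#if defined(__linux__)
    return true;
#else
    return false;
#endif
}

// Hypothetical helper: true when pointers are 64 bits wide.
constexpr bool is_64bit_build() { return sizeof(void*) == 8; }

// The quoted documentation restricts callbacks to 64-bit Linux builds.
constexpr bool cufft_callbacks_platform_ok() {
    return is_linux_build() && is_64bit_build();
}

// Prints a one-line diagnostic before any cuFFT callback setup is attempted.
inline void report_callback_platform() {
    std::printf("cuFFT callbacks platform check: %s\n",
                cufft_callbacks_platform_ok() ? "ok (64-bit Linux)"
                                              : "unsupported platform");
}
```

Calling `report_callback_platform()` at startup only rules out the platform condition; it cannot tell whether the binary was actually linked against the static library, which still has to be checked in the build commands.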

answered 2014-09-13T15:37:46.090

A newer (2019) option is the cuFFT device extensions (cuFFTDx). Part of the Math Library Early Access program, they are device-side FFT functions that can be inlined into user kernels.

Announcement of cuFFTDX:

https://developer.download.nvidia.com/video/gputechconf/gtc/2019/presentation/s9240-cuda-new-features-and-beyond.pdf

Math Library Early Access:

https://developer.nvidia.com/cuda-math-library-early-access-program-page

Example Code:

https://github.com/mnicely/cufft_examples

answered 2020-09-17T10:08:58.280