cufft - about CUFFT input sizes

Question

It's written that CUFFT library supports algorithms that higly optimized for input sizes can be written in the folowing form: 2^a X 3^b X 5^c X 7^d.

How could they managed to do that?

For as far as I know, FFT must provide best perfomance only for 2^a input size.

score 0 · Accepted Answer

0

这意味着质因数大于 7 的输入大小会变慢。

于 2015-04-09T23:54:10.720 回答

score 0 · Accepted Answer

Cooley-Tukey 算法可以对各种 DFT 长度进行操作，可以表示为 N = N_1*N_2。该算法将长度为 N 的 DFT 递归地表示为长度为 N_2 的 N_1 个较小的 DFT。

正如您所注意到的，最快的通常是基数 2 分解，它将长度为 N 的 DFT 递归地分解为长度为 N/2 的 2 个较小的 DFT，运行时间为 O(NlogN)。

但是，实际性能将取决于硬件和实现。例如，如果我们正在考虑线程扭曲大小为 32 的 cuFFT，那么长度为 32 的倍数的 DFT 将是最佳的（注意：只是一个示例，我不知道存在于cuFFT 罩。）

简短的回答：基础代码基于 Cooley-Tukey radix-n 算法针对高达 7 的任何素数分解进行了优化。

http://mathworld.wolfram.com/FastFourierTransform.html

https://en.wikipedia.org/wiki/Cooley-Tukey_FFT_algorithm

cufft - about CUFFT input sizes

2 回答 2

Related

Reference