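Without knowing the clock rates of the DSP and the ARM, this question cannot be answered.
Here is some background:
I just checked the cycle counts for floating-point multiplication on the c674x DSP:
It can issue two multiplications per cycle, and each multiplication has a result latency of three cycles (which means you have to wait three additional cycles before the result shows up in the destination register).
You can, however, start two multiplications in every cycle, because the DSP does not wait for the results. The compiler/assembler will do the required scheduling for you.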
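To make the point about independent multiplications concrete, here is a minimal C sketch of a dot product written so that each loop iteration contains two multiplications that do not depend on each other. The function name `dot_product` and the choice of two partial sums are assumptions made for illustration, and whether a given compiler actually schedules two multiplies per cycle from this source is not guaranteed.

```c
#include <assert.h>
#include <stddef.h>

/* Dot product with two independent partial sums.
   The two multiplications in the loop body do not depend on each
   other, so a scheduler is free to start both without waiting for
   either result. */
float dot_product(const float *a, const float *b, size_t n)
{
    float acc0 = 0.0f;
    float acc1 = 0.0f;
    size_t i = 0;

    assert((a != NULL && b != NULL) || n == 0);

    /* Main loop: two elements per iteration. */
    for (; i + 1 < n; i += 2) {
        acc0 += a[i] * b[i];
        acc1 += a[i + 1] * b[i + 1];
    }

    /* Leftover element when n is odd. */
    if (i < n) {
        acc0 += a[i] * b[i];
    }

    return acc0 + acc1;
}
```

Note that splitting the sum into two accumulators changes the order of the floating-point additions, so the result can differ in the last bits from a strictly sequential sum.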
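This uses only two of the DSP's eight available functional units, so while you do the two multiplications you can also do the following in each cycle:
- Two loads/stores (64 bits wide)
- Four floating-point add/subtract instructions (or integer instructions)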
Loop control and branching are free and cost you nothing on the DSP.
That makes a total of six floating-point operations per cycle, with parallel loads/stores and loop control.
ARM NEON, on the other hand, can do the following in floating-point mode:
Issue two multiplications per cycle. The latency is comparable, and the instructions can be pipelined just as on the DSP. Loads and stores take extra time, as do additions and subtractions. Loop control and branching will very likely be free in well-written code.
So, in summary, the DSP does three times as much work per cycle as the Cortex-A9 NEON unit.
Now you can check the clock rates of the DSP and the ARM and see which is faster for your job.
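The comparison comes down to multiplying the per-cycle figures above by each core's clock rate. A small C sketch of that arithmetic follows; the helper names and any clock values passed in are hypothetical, and the numbers are theoretical peaks, not measured performance.

```c
#include <assert.h>

/* Peak floating-point operations per cycle, as counted above. */
#define DSP_FLOPS_PER_CYCLE  6.0  /* two multiplies plus add/subtracts */
#define NEON_FLOPS_PER_CYCLE 2.0  /* two multiplies */

/* Theoretical peak throughput in MFLOP/s for a core running at
   clock_mhz (hypothetical helper, not part of any vendor API). */
double peak_mflops(double flops_per_cycle, double clock_mhz)
{
    assert(flops_per_cycle >= 0.0 && clock_mhz >= 0.0);
    return flops_per_cycle * clock_mhz;
}

/* ARM clock rate (in MHz) at which the NEON peak would equal the
   DSP peak for a DSP running at dsp_clock_mhz. */
double neon_break_even_mhz(double dsp_clock_mhz)
{
    return peak_mflops(DSP_FLOPS_PER_CYCLE, dsp_clock_mhz)
           / NEON_FLOPS_PER_CYCLE;
}
```

For example, with a made-up DSP clock of 300 MHz, `neon_break_even_mhz(300.0)` gives 900 MHz: by these peak figures, the ARM would need to run three times as fast as the DSP to keep up.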
Oh, one thing: With well-written DSP code, you will almost never see a cache miss during loads, because you use DMA to move the data from RAM into the cache before you access it. This gives the DSP an impressive speed advantage as well.
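The usual shape of this technique is double ("ping-pong") buffering: while the core works on one block in fast memory, the DMA engine fetches the next one. The sketch below only illustrates the control flow. `dma_start` and `dma_wait` are hypothetical stand-ins (a plain `memcpy` and a no-op) so that the code runs anywhere, `BLOCK_LEN` is an arbitrary block size, and on a real device these would be replaced by the vendor's DMA driver calls.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define BLOCK_LEN 256  /* hypothetical block size, in floats */

/* Stand-in for starting a DMA transfer from external RAM into fast
   on-chip memory. A real driver would return at once and copy in the
   background; memcpy is only a placeholder. */
static void dma_start(float *dst, const float *src, size_t count)
{
    memcpy(dst, src, count * sizeof *dst);
}

/* Stand-in for waiting until the last started transfer has finished. */
static void dma_wait(void)
{
}

/* Sum n floats from src, using two buffers in ping-pong fashion. */
float sum_with_double_buffering(const float *src, size_t n)
{
    static float buf[2][BLOCK_LEN];
    float total = 0.0f;
    size_t done = 0;   /* elements already processed */
    int cur = 0;       /* index of the buffer being processed */
    size_t cur_len;

    assert(src != NULL || n == 0);

    /* Prefetch the first block. */
    cur_len = n < BLOCK_LEN ? n : BLOCK_LEN;
    if (cur_len > 0) {
        dma_start(buf[cur], src, cur_len);
    }

    while (cur_len > 0) {
        size_t fetched = done + cur_len;
        size_t remaining = n - fetched;
        size_t next_len = remaining < BLOCK_LEN ? remaining : BLOCK_LEN;

        dma_wait();  /* the current block is now in fast memory */

        /* Kick off the next transfer into the other buffer ... */
        if (next_len > 0) {
            dma_start(buf[cur ^ 1], src + fetched, next_len);
        }

        /* ... and process the current block in the meantime. */
        for (size_t i = 0; i < cur_len; i++) {
            total += buf[cur][i];
        }

        done = fetched;
        cur ^= 1;
        cur_len = next_len;
    }

    return total;
}
```

With a real asynchronous DMA engine, the transfer started inside the loop would overlap with the summation, which is where the speed advantage would come from; with the `memcpy` placeholder there is no overlap, only the same ordering of steps.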