
I ran RenderScript on two phones: one with a dual-core 2 GHz Intel Atom Z2580 CPU and one with a quad-core 2.2 GHz Qualcomm Snapdragon 800 CPU. RenderScript did run the program in parallel on both devices, but the absolute performance of the same program differed greatly between them. I ran the following experiment:

  1. Use "setprop debug.rs.max-threads" under adb shell to limit the maximum number of running threads to 1 (so only 1 core is used).
  2. Use two of RenderScript's built-in intrinsics, ScriptIntrinsicYuvToRGB and ScriptIntrinsicBlur, to process the raw YUV camera data and output a blurred image.

Although the intrinsics are supposed to be highly optimized at the byte code level, the Qualcomm Snapdragon 800 ran almost 5 to 6 times faster than the Intel Atom Z2580 (each using only 1 core in the experiment). I'm not sure why this is the case. My guess is as follows:

I did another test. I used the ARM compiler to compile some simple NDK-based C code to machine code and found that it also runs on the Intel-powered device. However, when I compiled the same code with the Intel compiler, I saw a significant speedup (3x to 4x) on the same device. I don't know what the libbcc compiler on the device does to the RenderScript byte code on my Intel-based device, so could the poor performance be caused by a wrong (or inappropriate) compilation target at run time?
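For the NDK comparison above, it can help to confirm which target a given native build was actually compiled for before timing it. The following is only a minimal sketch: the function names `compiled_arch`, `compiled_with_neon`, and `describe_build_target` are hypothetical, and it relies on the predefined architecture macros that GCC and Clang set. It says nothing about what libbcc does to RenderScript byte code at run time.

```c
#include <stdio.h>

/* Name of the architecture this file was compiled for, based on the
 * compiler's predefined macros (checked at compile time, not at run time). */
static const char *compiled_arch(void) {
#if defined(__x86_64__)
    return "x86_64";
#elif defined(__i386__)
    return "x86";
#elif defined(__aarch64__)
    return "arm64";
#elif defined(__arm__)
    return "arm";
#else
    return "unknown";
#endif
}

/* 1 if the compiler was targeting NEON for this build, otherwise 0. */
static int compiled_with_neon(void) {
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
    return 1;
#else
    return 0;
#endif
}

/* Writes a one-line description of the build target into buf.
 * Returns the value of snprintf (length of the full description). */
static int describe_build_target(char *buf, size_t size) {
    return snprintf(buf, size, "compiled for: %s, NEON: %d",
                    compiled_arch(), compiled_with_neon());
}
```

Logging the string from `describe_build_target` in each native build (for example from a JNI entry point) shows whether the code running on the Intel device is the ARM build or the x86 build, which is worth checking before comparing timings.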

If this is true, is there any way to choose the run-time compiler that RenderScript uses on Intel x86 CPU based devices?


1 Answer


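You are really just seeing that the ARM intrinsics are much more heavily optimized than the x86 intrinsics. If the x86 platform were running RS code generated for ARM (especially NEON) through binary translation, I think you would see more than a 3-4x slowdown.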

Answered 2013-09-30T16:38:01.793