I ran RenderScript on two phones, one with a dual-core 2 GHz Intel Atom Z2580 CPU and one with a quad-core 2.2 GHz Qualcomm Snapdragon 800 CPU. RenderScript did run the program in parallel on both devices, but the absolute performance of the same program differed greatly between them. I ran the following experiment:
- Use "setprop debug.rs.max-threads" under adb shell to limit the maximum number of running threads to 1 (so only one core is used)
- Use two of RenderScript's built-in intrinsics, ScriptIntrinsicYuvToRGB and ScriptIntrinsicBlur, to process the raw YUV camera data and output a blurred image.
Although the intrinsics are supposed to be highly optimized at the bytecode level, the Qualcomm Snapdragon 800 ran almost 5 to 6 times faster than the Intel Atom Z2580 (both limited to one core in the experiment). I'm not sure why this is the case. My guess is as follows:
I did another test. I compiled some simple NDK-based C code to machine code with the ARM compiler and found that it also runs on the Intel-powered device. However, when I compiled the same code with the Intel compiler, I saw a significant speedup (3x to 4x) on the same device. I don't know what the on-device libbcc compiler does with the RenderScript bytecode on my Intel-based device, so could the poor performance be caused by a wrong (or inappropriate) compilation target at runtime?
If this is true, is there any way to choose the runtime compiler that RenderScript uses on an Intel x86-based device?