I've been running some tests on Android to see how much the performance of an algorithm (such as an FFT) improves when it is parallelized. I implemented the parallel versions using pthreads through JNI (FFTW) and Java threads (JTransforms). Instead of the speedup I expected from threading, I got better results with the serial algorithm. I don't understand why, since I ran the tests on multicore devices. It seems that the scheduler used by Android differs somewhat from the one used by Linux, and that you're out of luck if you want to use more than one CPU for parallel processing on Android.
Example with FFTW: The JNI code is in https://github.com/maxrosan/DspBenchmarking/blob/master/jni/fftw_jni.c and its Java interface is in https://github.com/maxrosan/DspBenchmarking/blob/master/src/br/usp/ime/dspbenchmarking/algorithms/fftw/FFTW.java.
The method called in the tests is 'execute'.
Example with pure Java: https://github.com/maxrosan/DspBenchmarking/blob/master/src/br/usp/ime/dspbenchmarking/algorithms/jtransforms/fft/DoubleFFT_1D2TAlgorithm.java
Here the method called is 'perform'.
Both 'execute' and 'perform' are called from inside another thread.
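To make the comparison concrete, here is a minimal sketch of the kind of serial vs. threaded timing I mean. It is not the code from the repository: the class name `ThreadScalingCheck`, the `work` method (a CPU-bound stand-in, not a real FFT), and the problem size are all assumptions chosen for illustration, and it uses neither FFTW nor JTransforms.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ThreadScalingCheck {

    // Hypothetical CPU-bound stand-in for one chunk of an FFT-like workload.
    static double work(int start, int end) {
        double acc = 0.0;
        for (int i = start; i < end; i++) {
            acc += Math.sin(i) * Math.cos(i);
        }
        return acc;
    }

    // Serial baseline: the whole range on the calling thread.
    public static double runSerial(int n) {
        return work(0, n);
    }

    // Threaded version: split the range into one chunk per thread and sum the parts.
    public static double runParallel(int n, int threads) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Double>> parts = new ArrayList<>();
            int chunk = n / threads;
            for (int t = 0; t < threads; t++) {
                final int start = t * chunk;
                // The last chunk absorbs any remainder of the division.
                final int end = (t == threads - 1) ? n : start + chunk;
                Callable<Double> task = () -> work(start, end);
                parts.add(pool.submit(task));
            }
            double total = 0.0;
            for (Future<Double> part : parts) {
                total += part.get(); // blocks until that chunk is done
            }
            return total;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        int cores = Runtime.getRuntime().availableProcessors();
        int n = 5_000_000; // assumed size; small inputs may hide any speedup

        // Warm-up runs so JIT compilation does not distort the timings.
        runSerial(n);
        runParallel(n, cores);

        long t0 = System.nanoTime();
        runSerial(n);
        long t1 = System.nanoTime();
        runParallel(n, cores);
        long t2 = System.nanoTime();

        System.out.println("reported cores: " + cores);
        System.out.println("serial   ms: " + (t1 - t0) / 1_000_000.0);
        System.out.println("parallel ms: " + (t2 - t1) / 1_000_000.0);
    }
}
```

Printing `availableProcessors()` next to the timings at least shows how many cores the runtime reports while the test runs. If the threaded version is still slower at large sizes, thread start-up and synchronization overhead would be less likely to be the whole explanation.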