I need to rotate a 2-D array by 90 degrees (3x3 on 3 processes, 4x4 on 4, etc.) using derived datatypes in MPI. I found that calling MPI_Alltoall in C on this array:
[ 1][ 2][ 3][ 4]
[ 5][ 6][ 7][ 8]
[ 9][10][11][12]
[13][14][15][16]
gives me the data distributed across the processes like this:
1:[ 1][ 5][ 9][13]
2:[ 2][ 6][10][14]
3:[ 3][ 7][11][15]
4:[ 4][ 8][12][16]
What steps should I take next to collect these vectors into a single array on one process (the root), in the order that reflects a 90-degree rotation?
Thanks in advance.