I am pressed for time to optimize a large piece of C code for speed, and I am looking for an algorithm (at best a C "snippet") that transposes a rectangular source matrix u[r][c] of arbitrary size (r = number of rows, c = number of columns) into a target matrix v[s][d] (s = c = number of rows, d = r = number of columns) in a "cache-friendly", i.e. data-locality-respecting, way. The typical size of u is around 5000 to 15000 rows by 50 to 500 columns, and it is clear that the naive element-by-element approach (reading u row by row, which writes v column by column) is very cache-inefficient.
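For reference, here is a minimal sketch of the usual technique, loop tiling (blocking), applied directly to the "array of row pointers" layout. It only uses u[i][j] and v[j][i] indexing, so it does not care whether the rows are contiguous in memory. The tile size TILE = 32 and the names transpose_blocked, alloc_rows and free_rows are my own assumptions for illustration, not from any library.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Tile edge length (assumed starting value, tune it): a 32 x 32 tile of
   doubles is 8 KiB, so one source tile plus one target tile fit in a
   typical 32 KiB L1 data cache. */
#define TILE 32

/* Transpose the r x c matrix u (array of row pointers) into the c x r
   matrix v, so that v[j][i] == u[i][j] for all i < r, j < c.
   Instead of sweeping whole rows of u (which scatters the writes over
   all c rows of v), the loops handle one TILE x TILE block at a time,
   so the cache lines touched in u and v are reused before eviction. */
static void transpose_blocked(double **u, double **v, size_t r, size_t c)
{
    assert(u != NULL && v != NULL);
    for (size_t ii = 0; ii < r; ii += TILE) {
        size_t imax = (ii + TILE < r) ? ii + TILE : r;   /* clip last tile */
        for (size_t jj = 0; jj < c; jj += TILE) {
            size_t jmax = (jj + TILE < c) ? jj + TILE : c;
            for (size_t i = ii; i < imax; ++i)
                for (size_t j = jj; j < jmax; ++j)
                    v[j][i] = u[i][j];
        }
    }
}

/* Hypothetical helper: allocate an n_rows x n_cols matrix as an array of
   independently malloc'ed rows. Returns NULL on failure. */
static double **alloc_rows(size_t n_rows, size_t n_cols)
{
    double **m = malloc(n_rows * sizeof *m);
    if (m == NULL)
        return NULL;
    for (size_t i = 0; i < n_rows; ++i) {
        m[i] = malloc(n_cols * sizeof **m);
        if (m[i] == NULL) {
            while (i > 0)          /* release the rows allocated so far */
                free(m[--i]);
            free(m);
            return NULL;
        }
    }
    return m;
}

/* Release a matrix obtained from alloc_rows(). */
static void free_rows(double **m, size_t n_rows)
{
    if (m == NULL)
        return;
    for (size_t i = 0; i < n_rows; ++i)
        free(m[i]);
    free(m);
}
```

Usage: with u = alloc_rows(r, c) and v = alloc_rows(c, r), call transpose_blocked(u, v, r, c). The best tile size depends on the machine, so it is worth benchmarking a few values (e.g. 16, 32, 64).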
There are many discussions of this topic on the web (including near this thread), but as far as I can see, all of them deal with special cases such as square matrices, u[r][r], or matrices defined as a one-dimensional array, e.g. u[r * c], rather than the above-mentioned "array of arrays" (of equal length) used in my context of Numerical Recipes (background see here).
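One possible bridge between the two cases, sketched under an assumption: if the rows of u and v are allocated as one contiguous block plus a table of row pointers (check how your allocator does it; to my knowledge the matrix allocators in nrutil.c of Numerical Recipes in C work this way, but verify against your copy), then the "array of arrays" is at the same time a plain row-major 1-D array starting at u[0], and the one-dimensional solutions apply unchanged. The sketch assumes 0-based indexing (with NR's offset indexing the block would start at the first valid element instead); alloc_contiguous, free_contiguous and transpose_flat are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define TILE_FLAT 32   /* assumed tile edge; tune for the target cache */

/* Hypothetical helper: allocate an n_rows x n_cols matrix whose rows all
   live in ONE contiguous block, plus a table of row pointers, so that
   m[i][j] still works and m[0] is the start of an n_rows*n_cols
   row-major 1-D array. Free with free_contiguous(). */
static double **alloc_contiguous(size_t n_rows, size_t n_cols)
{
    assert(n_rows > 0 && n_cols > 0);
    double **m = malloc(n_rows * sizeof *m);
    double *data = malloc(n_rows * n_cols * sizeof *data);
    if (m == NULL || data == NULL) {
        free(m);
        free(data);
        return NULL;
    }
    for (size_t i = 0; i < n_rows; ++i)
        m[i] = data + i * n_cols;      /* row i starts i*n_cols into block */
    return m;
}

/* Release a matrix obtained from alloc_contiguous(). */
static void free_contiguous(double **m)
{
    if (m != NULL) {
        free(m[0]);                    /* the single data block */
        free(m);                       /* the row-pointer table */
    }
}

/* Blocked transpose on flat row-major storage: src is r x c, dst is
   c x r, and dst[j*r + i] = src[i*c + j] for all i < r, j < c. */
static void transpose_flat(const double *src, double *dst, size_t r, size_t c)
{
    for (size_t ii = 0; ii < r; ii += TILE_FLAT) {
        size_t imax = (ii + TILE_FLAT < r) ? ii + TILE_FLAT : r;
        for (size_t jj = 0; jj < c; jj += TILE_FLAT) {
            size_t jmax = (jj + TILE_FLAT < c) ? jj + TILE_FLAT : c;
            for (size_t i = ii; i < imax; ++i)
                for (size_t j = jj; j < jmax; ++j)
                    dst[j * r + i] = src[i * c + j];
        }
    }
}
```

Usage: with u = alloc_contiguous(r, c) and v = alloc_contiguous(c, r), the call transpose_flat(u[0], v[0], r, c) leaves v[j][i] == u[i][j], because v[j] points to v[0] + j*r.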
I would be very thankful for any hint that spares me from reinventing the wheel.
Martin