The MinGW-w64 header itself says the fast types are "Not actually guaranteed to be fastest for all purposes":
```c
/* 7.18.1.3 Fastest minimum-width integer types
 * Not actually guaranteed to be fastest for all purposes <---------------------
 * Here we use the exact-width types for 8 and 16-bit ints.
 */
typedef signed char int_fast8_t;
typedef unsigned char uint_fast8_t;
typedef short int_fast16_t;
typedef unsigned short uint_fast16_t;
typedef int int_fast32_t;
typedef unsigned int uint_fast32_t;
__MINGW_EXTENSION typedef long long int_fast64_t;
__MINGW_EXTENSION typedef unsigned long long uint_fast64_t;
```
That caveat applies to ARM and other architectures as well, because using a type narrower than the register width often requires zero-extension or sign-extension, which is less efficient than working on a native int.
Narrow types can still pay off, though, in large arrays (a smaller memory and cache footprint) or for slow operations such as division. I'm not sure how slow division is on ARM, but on x86 a 64-bit division is much slower than a 32-bit or 8-bit one.