Here's a sample string, stored in a MySQL database, running on a Linux server: ™
That's the single TM character, which is represented as 0x2122 in UTF-16BE, or 0xE284A2 in UTF-8.
The database table uses the utf8_unicode_ci collation. I'm running PHP on another Linux server, which uses an internal encoding (as reported by mb_internal_encoding) of ISO-8859-1, which uses the same encoding for the character as UTF-8.
When I run a SQL query to get the string, it returns 0x0099, which is its representation in Windows-1252.
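For reference, the byte values quoted above can be double-checked with a short sketch. Python is used here purely for illustration and is not part of my setup; the variable names are my own, and cp1252 is Python's codec name for Windows-1252.

```python
# The trademark sign, U+2122.
tm = "\u2122"

# Encode the same character in the three encodings discussed above.
utf16be = tm.encode("utf-16-be")
utf8 = tm.encode("utf-8")
cp1252 = tm.encode("cp1252")  # Python's name for Windows-1252

print(utf16be.hex())  # 2122
print(utf8.hex())     # e284a2
print(cp1252.hex())   # 99

# Going the other way: the single byte 0x99 read as Windows-1252 is the TM sign.
roundtrip = b"\x99".decode("cp1252")
print(roundtrip == tm)  # True
```

So the 0x99 I get back is consistent with the character having been converted to Windows-1252 somewhere between the table and my PHP code.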
How would that even happen, and how can I get the query to return the string in a more sensible encoding?