3 Answers
Tell Perl how to encode the output.
use open ':std', ':encoding(UTF-8)';
Get the data from the database as text by using
my $dbh = DBI->connect("DBI:mysql:database=my_db", $user, $pass, { mysql_enable_utf8 => 1 });
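To see why getting the data "as text" matters, here is a minimal sketch using the core Encode module. It assumes the driver hands back raw UTF-8 bytes when it is not asked to decode; bytes_to_text is a hypothetical helper name, not part of DBI.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper: turn raw UTF-8 bytes into a Perl text string.
sub bytes_to_text {
    my ($bytes) = @_;
    return decode('UTF-8', $bytes);
}

my $raw  = "\xC3\xA9";            # the two UTF-8 bytes that encode U+00E9
my $text = bytes_to_text($raw);   # a single character, U+00E9

# The byte string has length 2, the decoded text string has length 1.
printf "raw length: %d, text length: %d\n", length($raw), length($text);
```

If the undecoded bytes were printed through a UTF-8 output layer, each byte would be encoded again, which is one common source of garbled output.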
You probably need to tell DBI to use UTF8 when talking to the database.
my $dbh = DBI->connect(
    'DBI:mysql:database=my_db', $user, $pass,
    { mysql_enable_utf8 => 1 }
);
Q: Why is Perl doing this? Can I override it?
It's not being escaped. That's a symptom of a character set translation issue. The question mark is the default substitute used when a code point doesn't map to any character in the target character set.
The short answer as to why Perl is doing this may be: by default, Perl outputs to STDOUT using the ASCII character set. Since ASCII only supports code points up to U+007F, all other code points (for example, characters 128 through 255) get translated to a question mark character.
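The question mark substitution can be reproduced in isolation with the core Encode module. This is only a sketch of the general mechanism, not necessarily the exact code path your program takes; to_ascii_bytes is a hypothetical helper name.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical helper: encode a text string to ASCII with Encode's default
# behaviour, which replaces unmappable characters with a substitute.
sub to_ascii_bytes {
    my ($text) = @_;
    return encode('ascii', $text);
}

# U+00E9 has no ASCII equivalent, so it comes out as "?".
print to_ascii_bytes("caf\x{e9}"), "\n";   # prints "caf?"
```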
The short answer as to how to override this behavior may be: specify that STDIN, STDOUT and STDERR use UTF-8 encoding rather than ASCII by including a line like this in your Perl program:
use open qw(:std :utf8);
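To check what an encoding layer actually writes, here is a small sketch that prints through a UTF-8 layer into an in-memory buffer instead of STDOUT. It sets the layer explicitly in open() rather than relying on the pragma, and bytes_written_as_utf8 is a hypothetical helper name. The same kind of layer can be applied to an already open handle at runtime with binmode(STDOUT, ':encoding(UTF-8)').

```perl
use strict;
use warnings;

# Hypothetical helper: print $text through a UTF-8 encoding layer into an
# in-memory buffer and return the bytes that were written.
sub bytes_written_as_utf8 {
    my ($text) = @_;
    my $buffer = '';
    open(my $fh, '>:encoding(UTF-8)', \$buffer) or die "open failed: $!";
    print {$fh} $text;
    close($fh) or die "close failed: $!";
    return $buffer;
}

# U+00E9 is written as the two bytes C3 A9 rather than a question mark.
my $bytes = bytes_written_as_utf8("\x{e9}");
print join(' ', map { sprintf '%02X', ord } split //, $bytes), "\n";   # prints "C3 A9"
```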
Another potential issue is the setting of the MySQL session character_set_client variable; the database connection may be using a latin1 character set, but the database/server/column character set may be utf8, so a character set translation may also be occurring there.
It's also possible to specify the character set to be used in the database connection, to avoid an unwanted character set translation.
As a starting point for understanding character sets, here are two references you should have under your belt: