
3 Answers

  1. Tell Perl how to encode the output.

    use open ':std', ':encoding(UTF-8)';
    
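  2. Get the data from the database as decoded text (character strings) rather than raw bytes by enabling mysql_enable_utf8 when you connect:

    use DBI;
    my $dbh = DBI->connect("DBI:mysql:database=my_db", $user, $pass, {
       mysql_enable_utf8 => 1,   # return textual columns as decoded Perl strings
    }) or die $DBI::errstr;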
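To see what step 2 changes, here is a small sketch that needs no database. It hard-codes the raw UTF-8 bytes a textual column might come back as without mysql_enable_utf8, then decodes them with the core Encode module's decode, which is roughly the effect that attribute has on retrieved values. The sample value "café" is an assumption chosen for illustration.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Assumed sample: "café" as raw UTF-8 bytes, the way a textual column
# can arrive without mysql_enable_utf8 (hard-coded; no database here).
my $raw = "caf\xC3\xA9";

# Decoding turns the bytes into a Perl character string ("text").
my $text = decode('UTF-8', $raw);

printf "raw bytes: %d, decoded characters: %d\n", length $raw, length $text;
# prints "raw bytes: 5, decoded characters: 4"
```

Only the decoded string has one character per letter, which is what the output layer from step 1 expects to encode.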
    
answered 2015-01-13T16:28:05.373

You probably need to tell DBI to use UTF-8 when talking to the database.

    my $dbh = DBI->connect(
       'DBI:mysql:database=my_db', $user, $pass,
       { mysql_enable_utf8 => 1 }
    ) or die $DBI::errstr;
answered 2015-01-13T16:26:31.613

Q: Why is Perl doing this? Can I override it?

It's not being escaped; it's a symptom of a character set translation issue. The question mark is the default substitution character used when a code point has no mapping in the target character set.
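You can watch this substitution happen in Perl itself, assuming only the core Encode module: encoding a string into a character set that lacks one of its characters replaces that character with a question mark.

```perl
use strict;
use warnings;
use Encode qw(encode);

# "café": the é is U+00E9, which has no ASCII equivalent.
my $text = "caf\x{e9}";

# With the default CHECK setting, encode() substitutes '?' for any
# character the target encoding cannot represent.
my $ascii = encode('ascii', $text);

print "$ascii\n";    # prints "caf?"
```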


The short answer as to why Perl is doing this may be: by default, Perl outputs to STDOUT using the ASCII character set. Since ASCII only defines code points up to U+007F, every other code point (for example, characters 128 through 255) gets translated to a question mark.

The short answer as to how to override this behavior may be: specify that STDIN, STDOUT and STDERR use UTF-8 encoding rather than ASCII by including a line like this in your Perl program:

    use open qw(:std :utf8);
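As a sketch of what an encoding layer on a standard handle does, the following writes a character string through an :encoding(UTF-8) layer, but into an in-memory buffer instead of STDOUT so the resulting bytes can be inspected. The sample string is an assumption chosen for illustration.

```perl
use strict;
use warnings;

# A character string (not bytes): "café" plus U+263A WHITE SMILING FACE.
my $text = "caf\x{e9} \x{263a}";

# Write it through an :encoding(UTF-8) layer into an in-memory buffer;
# this is the kind of layer use open ':std' puts on STDOUT.
my $buffer = '';
open(my $fh, '>:encoding(UTF-8)', \$buffer) or die "open: $!";
print {$fh} $text;
close($fh) or die "close: $!";

# Show the bytes that ended up in the buffer, in hex.
print join(' ', map { sprintf '%02X', ord } split //, $buffer), "\n";
```

It prints `63 61 66 C3 A9 20 E2 98 BA`: the é becomes the two bytes C3 A9 and U+263A becomes E2 98 BA, with no question marks.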

Another potential issue is the MySQL session variable character_set_client (and character_set_results, which governs the results sent back to the client): the database connection may be using the latin1 character set while the database, server or column character set is utf8, so a character set translation may also be occurring there.

You can also specify the character set used by the database connection (for example, with the mysql_enable_utf8 attribute shown in the other answers, or with a SET NAMES statement) to avoid an unwanted character set translation.
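The effect of the connection character set can be simulated without a server. This sketch uses Encode to stand in for MySQL's conversion: it assumes ISO-8859-1 as a stand-in for a latin1 connection and UTF-8 for a utf8 one, encodes a hypothetical column value into each and decodes it back.

```perl
use strict;
use warnings;
use Encode qw(encode decode);

# Hypothetical column value: "café" plus U+263A WHITE SMILING FACE.
my $stored = "caf\x{e9} \x{263a}";

# Simulate converting the value to the connection character set and
# decoding it on the client side. No MySQL server is involved here.
for my $charset ('iso-8859-1', 'UTF-8') {
    my $received = decode($charset, encode($charset, $stored));
    my $status   = $received eq $stored ? 'intact' : 'lost characters';
    printf "%-10s %s\n", $charset, $status;
}
```

With ISO-8859-1 the é survives but U+263A comes back as `?`, so the value is reported as having lost characters; with UTF-8 it is intact.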


As a starting point for understanding character sets, here are two references you should have under your belt:

The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!)

What Every Programmer Absolutely, Positively Needs To Know About Encodings And Character Sets To Work With Text

answered 2015-01-13T17:16:34.563