python - Peter Piper piped a Python program - and lost all his unicode characters

Question

score 8 · Accepted Answer

From your comments and the update to your question, it seems that the data is correctly encoded in UTF-8. This means you just need to tell the browser that it's UTF-8, either by adding a BOM or, better, by adding encoding information to your HTML document:

<head>
  <meta http-equiv="Content-Type" content="text/html; charset=utf-8">
</head>

You really shouldn't use an XML declaration if the document is not valid XML.
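If the HTML is produced by a Python script, both options (a BOM or a charset meta tag) can be handled at the point where the file is written. The following is a minimal Python 3 sketch, not the asker's code; render_page, write_page and BODY are hypothetical names. It relies on the standard "utf-8-sig" codec, which writes the UTF-8 BOM before the text.

```python
import html
from pathlib import Path

# Hypothetical sample text; any non-ASCII content would do.
BODY = "彼得·派珀 (Peter Piper)"


def render_page(body: str) -> str:
    """Build an HTML page whose <head> declares the UTF-8 charset."""
    return (
        "<html>\n<head>\n"
        '  <meta http-equiv="Content-Type" content="text/html; charset=utf-8">\n'
        "</head>\n<body>\n"
        f"  <p>{html.escape(body)}</p>\n"
        "</body>\n</html>\n"
    )


def write_page(path: Path, body: str, with_bom: bool = False) -> None:
    # "utf-8-sig" prepends the UTF-8 BOM (bytes EF BB BF);
    # plain "utf-8" writes no BOM and relies on the meta tag alone.
    encoding = "utf-8-sig" if with_bom else "utf-8"
    path.write_text(render_page(body), encoding=encoding)
```

For example, write_page(Path("report.html"), BODY) produces a file that a browser should decode as UTF-8 because of the meta tag, with no BOM needed.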

The best and most reliable way is to serve the file over HTTP and set the Content-Type header appropriately, for example Content-Type: text/html; charset=utf-8.
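As a rough illustration of serving the page with an explicit charset, here is a Python 3 sketch using only the standard library (http.server and urllib.request). Utf8Handler, fetch_once and PAGE are hypothetical names; a real deployment would normally set this header in its web server or framework configuration instead.

```python
import http.server
import threading
import urllib.request

PAGE = "<p>彼得·派珀 (Peter Piper)</p>"


class Utf8Handler(http.server.BaseHTTPRequestHandler):
    def do_GET(self):
        body = PAGE.encode("utf-8")
        self.send_response(200)
        # The charset parameter tells the browser how to decode the bytes.
        self.send_header("Content-Type", "text/html; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the example quiet


def fetch_once():
    """Start a local server, fetch the page once, return (header, text)."""
    # Port 0 asks the OS for any free port.
    server = http.server.HTTPServer(("127.0.0.1", 0), Utf8Handler)
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    try:
        url = f"http://127.0.0.1:{server.server_port}/"
        with urllib.request.urlopen(url) as resp:
            content_type = resp.headers["Content-Type"]
            # The client reads the charset back out of the same header.
            charset = resp.headers.get_content_charset()
            text = resp.read().decode(charset)
    finally:
        server.shutdown()
        server.server_close()
    return content_type, text
```

Because the encoding travels in the HTTP header, the client does not have to guess it from a BOM or a meta tag.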

score 5 · Accepted Answer

When you pipe a Python program to an output file on Windows, does it always use this character set?

The default encoding is used when writing output to a pipe. On my machine:

In [5]: sys.getdefaultencoding()
Out[5]: 'ascii'

If not, is there a workaround?

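import sys
try:
    # Use setappdefaultencoding if this build provides it.
    sys.setappdefaultencoding('utf-8')
except AttributeError:
    # site.py deletes sys.setdefaultencoding at startup;
    # reloading sys restores it (Python 2 only).
    sys = reload(sys)
    sys.setdefaultencoding('utf-8')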

After this, all output is encoded as UTF-8.
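The reload(sys) trick only works on Python 2; Python 3 has no sys.setdefaultencoding. A rough Python 3 counterpart, assuming Python 3.7 or later (where io.TextIOWrapper.reconfigure() exists), is to switch the output stream itself to UTF-8. force_utf8 is a hypothetical helper name.

```python
def force_utf8(stream):
    """Switch a text stream to UTF-8 in place (needs Python 3.7+ for reconfigure())."""
    # Streams without reconfigure() (e.g. ones replaced by a test harness)
    # are returned unchanged.
    if hasattr(stream, "reconfigure"):
        stream.reconfigure(encoding="utf-8")
    return stream
```

At start-up, call force_utf8(sys.stdout) before printing anything. Setting the environment variable PYTHONIOENCODING=utf-8 before launching the script has a similar effect without touching the code.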

I think the correct way to handle this situation without having to "redo a whole bunch of logic" is to decode all data from your internet source into unicode, using the server or page encoding, and then use the workaround shown above to set the default encoding to UTF-8.
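Decoding "from server or page encoding" usually means reading the charset parameter of the Content-Type header. Below is a minimal Python 3 sketch of that step, assuming the header value is available as a string; to_unicode is a hypothetical name, and the UTF-8 fallback is an assumption (it does not look at charset meta tags inside the page).

```python
from email.message import Message


def to_unicode(raw: bytes, content_type: str, fallback: str = "utf-8") -> str:
    """Decode bytes from a server using the charset in its Content-Type header."""
    msg = Message()
    msg["Content-Type"] = content_type
    # get_content_charset() returns the lower-cased charset, or None if absent.
    charset = msg.get_content_charset() or fallback
    # errors="replace" keeps going on bad bytes instead of raising.
    return raw.decode(charset, errors="replace")
```

Once everything is text at this boundary, only the final output step has to choose an encoding.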

score 2 · Accepted Answer

Most programs under Windows will assume that you're using the default Windows encoding, which is Windows-1252 (cp1252, a superset of ISO-8859-1) on an English installation. The command window behaves similarly, although it uses the OEM code page (cp437 on an English installation). Unfortunately, there's no way to set the default encoding to UTF-8: there's a code page defined for it (65001), but it's not well supported.
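To see why characters disappear, one can check whether a string survives a given encoding. This Python 3 sketch uses cp1252 as a stand-in for the default Windows encoding on an English install; survives is a hypothetical helper name.

```python
def survives(text: str, encoding: str) -> bool:
    """Return True if every character of text can be encoded in encoding."""
    try:
        text.encode(encoding)
    except UnicodeEncodeError:
        return False
    return True


sample = "彼得·派珀"  # "Peter Piper" written in Chinese characters

for enc in ("cp1252", "utf-8"):
    print(enc, survives(sample, enc))
# cp1252 False
# utf-8 True

# With errors="replace", each unencodable character becomes b"?":
print(sample.encode("cp1252", errors="replace"))  # b'??\xb7??'
```

Only the middle dot exists in cp1252, so the Chinese characters are replaced or rejected, while UTF-8 can represent all of them.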

Some editors will recognize a BOM (byte order mark) at the start of the file and switch to UTF-8, but that's not guaranteed.
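The kind of BOM detection an editor might do can be sketched in a few lines of Python 3 with the constants from the standard codecs module. sniff_bom and decode_with_bom are hypothetical names, and falling back to cp1252 when no BOM is present is an assumption mirroring the Windows default described above.

```python
import codecs

# Check longer BOMs first: the UTF-32-LE BOM begins with the UTF-16-LE BOM.
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def sniff_bom(data: bytes):
    """Return (encoding, bom_length) for a leading BOM, or (None, 0)."""
    for bom, name in _BOMS:
        if data.startswith(bom):
            return name, len(bom)
    return None, 0


def decode_with_bom(data: bytes, default: str = "cp1252") -> str:
    """Decode using the BOM if present, otherwise the assumed default."""
    encoding, skip = sniff_bom(data)
    return data[skip:].decode(encoding or default)
```

A file without a BOM silently falls back to the default, which is exactly the case where non-Latin characters come out garbled.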

If you're generating HTML, you should include the proper charset meta tag; the browser will then interpret the file correctly.
