
3 Answers

8

From your comments and question update it seems that the data is correctly encoded in UTF-8. This means you just need to tell your browser it's UTF-8, either by using a BOM, or better, by adding encoding information to your HTML document:

<head>
  <meta http-equiv="Content-Type" content="text/html; charset=utf-8">
</head>

You really shouldn't use an XML declaration if the document is not valid XML.

The best and most reliable way would be to serve the file via HTTP and set the Content-Type: header appropriately.
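As a rough illustration of the HTTP approach, here is a minimal sketch using the `http.server` module from Python 3's standard library (the question itself concerns Python 2, so treat this as an assumption about your setup). The names `PAGE`, `build_response`, `Utf8Handler` and `make_server` are made up for this example.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical page content; the non-ASCII characters just make the encoding visible.
PAGE = "<html><body><p>café, naïve, 日本語</p></body></html>"


def build_response(page):
    """Return (headers, body) for a page given as a text string."""
    body = page.encode("utf-8")  # the bytes on the wire are UTF-8
    headers = [
        # Declare the encoding in the HTTP header so the browser need not guess.
        ("Content-Type", "text/html; charset=utf-8"),
        ("Content-Length", str(len(body))),
    ]
    return headers, body


class Utf8Handler(BaseHTTPRequestHandler):
    def do_GET(self):
        headers, body = build_response(PAGE)
        self.send_response(200)
        for name, value in headers:
            self.send_header(name, value)
        self.end_headers()
        self.wfile.write(body)


def make_server(port=8000):
    """Build (but do not start) a server bound to localhost."""
    return HTTPServer(("127.0.0.1", port), Utf8Handler)
```

Calling `make_server().serve_forever()` would start serving on port 8000. With the header in place, the browser no longer depends on a BOM or a meta tag to pick the encoding.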

answered 2012-01-06T17:15:31.847
5

When you pipe a Python program to an output file on Windows, does it always use this character set?

When output goes to a pipe, Python 2 uses the default encoding. On my machine:

In [5]: sys.getdefaultencoding()
Out[5]: 'ascii'

If not, is there a workaround?

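import sys

# Python 2 only. site.py deletes sys.setdefaultencoding at startup,
# so reload(sys) is needed to get it back.
try:
    # Use the application-level hook if this interpreter provides one.
    sys.setappdefaultencoding('utf-8')
except AttributeError:
    reload(sys)
    sys.setdefaultencoding('utf-8')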

Now all unicode output, including output sent to a pipe or file, is encoded as UTF-8.

I think the correct way to handle this situation without having to "redo a whole bunch of logic" is to decode all data from your internet source from the server or page encoding to unicode, and then use the workaround shown above to set the default encoding to UTF-8.
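The decoding step could look something like the sketch below. It is written for Python 3 and, instead of changing the default encoding, it wraps the output stream in an explicit UTF-8 writer; that is a different technique from the workaround above, shown only because it makes both encodings visible. The helper names `decode_input` and `utf8_writer`, and the ISO-8859-1 source encoding, are assumptions for illustration.

```python
import io


def decode_input(raw, source_encoding):
    """Turn bytes from the network into text, using the encoding the server declared."""
    return raw.decode(source_encoding)


def utf8_writer(byte_stream):
    """Wrap a binary stream so that text written to it is encoded as UTF-8."""
    return io.TextIOWrapper(byte_stream, encoding="utf-8", newline="\n")


# Example: a page served as ISO-8859-1 is decoded once, then written out as UTF-8.
raw = "café".encode("iso-8859-1")       # stands in for bytes read from the server
text = decode_input(raw, "iso-8859-1")  # text from here on, no byte strings

sink = io.BytesIO()                     # stands in for a file or pipe
out = utf8_writer(sink)
out.write(text + "\n")
out.flush()
```

For real output in Python 3 you would pass `sys.stdout.buffer` instead of the `BytesIO` object. In Python 2 the closest equivalent is `codecs.getwriter('utf-8')(sys.stdout)`.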

answered 2012-01-06T17:01:33.520
2

Most programs under Windows will assume that you're using the default Windows encoding, which will be Windows-1252 (a close relative of ISO-8859-1) for an English installation. The command window likewise assumes a legacy code page rather than UTF-8, typically an OEM one such as 437 or 850. Unfortunately, there's no way to set the default encoding to UTF-8: there's a code page defined for it (65001), but it's not well supported.

Some editors will recognize any BOM characters at the start of the file and switch to UTF-8, but that's not guaranteed.
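If you want to try the BOM route, a minimal Python 3 sketch follows. It relies on the standard `utf-8-sig` codec, which writes the three BOM bytes before the text; the function name `write_with_bom` and the temporary file path are made up for this example.

```python
import codecs
import os
import tempfile


def write_with_bom(path, text):
    """Write text as UTF-8 preceded by a byte order mark."""
    # The "utf-8-sig" codec emits the BOM (EF BB BF) before the first character.
    with open(path, "w", encoding="utf-8-sig") as f:
        f.write(text)


path = os.path.join(tempfile.mkdtemp(), "sample.txt")
write_with_bom(path, "café")

# Read the raw bytes back to see what an editor would find at the start of the file.
with open(path, "rb") as f:
    data = f.read()
```

Here `data` begins with `codecs.BOM_UTF8`, followed by the UTF-8 bytes of the text. Reading the file back with `encoding="utf-8-sig"` strips the BOM again.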

If you're generating HTML you should include the proper charset tag; then the browser will interpret it properly.
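For generated HTML, the idea can be sketched as below in Python 3: put the charset declaration in the head and make sure the bytes you actually write are encoded the same way. The helper names `render_page` and `page_bytes` are hypothetical.

```python
def render_page(body_html):
    """Return a complete HTML document, as text, that declares UTF-8."""
    return (
        "<!DOCTYPE html>\n"
        "<html>\n"
        "<head>\n"
        '  <meta http-equiv="Content-Type" content="text/html; charset=utf-8">\n'
        "</head>\n"
        "<body>\n"
        + body_html + "\n"
        "</body>\n"
        "</html>\n"
    )


def page_bytes(body_html):
    """Encode the document with the same encoding its meta tag declares."""
    return render_page(body_html).encode("utf-8")


html = page_bytes("<p>café</p>")
```

Writing `html` to a file opened in binary mode keeps the declared and the actual encoding in agreement, which is what the browser needs.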

answered 2012-01-06T17:10:53.420