python - UTF-8 和 ASCII 编码的混合使用？

Question

我目前正在使用 Python 2.7 解析大型文本文件，其中一些最初是用 Unicode 或 UTF-8 编码的。

对于包含直接与 UTF-8 中的字符串交互的函数的模块，我将其包含# -*- coding: utf-8 -*-在文件的顶部，但对于仅使用 ascii 的函数，我没有打扰。

最终，这些模块导致更大的模块，并且所有解析的字符串混合在一起。# -*- coding: utf-8 -*-包含在每个文件的顶部是一种好习惯吗？

这有什么好处吗？

score 8 · Accepted Answer

# -*- coding: utf-8 -*- declares the encoding of the source file only. It has nothing to do whatsoever with the way Python handles input or output. It just means you can write string literals and comments using UTF-8.

Here's the effect of a coding declaration. Let's say I have a program

# -*- coding: utf-8 -*-
# the following prints the Dutch word "één"
print(u"\xe9\xe9n")

This does exactly what the comment says. But if I remove the coding declaration, it crashes:

File "a.py", line 1
SyntaxError: Non-ASCII character '\xc3' in file a.py on line 1, but no encoding declared; see http://www.python.org/peps/pep-0263.html for details

Note that line 1 is the comment. The program can be fixed by removing the comment, leaving just

print(u"\xe9\xe9n")

which still behaves exactly the same as the first program.

score 1 · Accepted Answer

每个 ASCII 文件也是一个有效的 UTF-8。不必担心将 ASCII 文件视为 UTF-8 文件，无需转换，无需增加大小。

python - UTF-8 和 ASCII 编码的混合使用？

2 回答 2

Related

Reference