python - What character set is "é" from? (Python: file names containing "é", how do I use os.path.exists, filecmp.cmp, shutil.move?)

Question

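What character set is é from? In Windows Notepad, a text file saved as ANSI keeps this character just fine. Insert something similar and you get an error. é seems to work fine in my ASCII terminal in PuTTY (are CP437 and IBM437 the same?), whereas that other character does not.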
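A quick way to see what é actually is, sketched in Python 3: it is the Unicode code point U+00E9, and it also exists in the single-byte Windows "ANSI" code page (assumed here to be cp1252, the Western European default), in Latin-1, and in the DOS code page CP437, each time with a different byte value. The encoding names below are Python codec names; the "ibm437" lookup only shows how Python's codec registry treats the two names.

```python
import codecs
import unicodedata

ch = "é"

# Unicode identity of the character, independent of any byte encoding
char_name = unicodedata.name(ch)   # "LATIN SMALL LETTER E WITH ACUTE"
code_point = f"U+{ord(ch):04X}"    # "U+00E9"

# The same character is stored as different bytes in different encodings
encoded = {enc: ch.encode(enc) for enc in ("cp1252", "latin-1", "cp437", "utf-8")}
# cp1252 -> b'\xe9', latin-1 -> b'\xe9', cp437 -> b'\x82', utf-8 -> b'\xc3\xa9'

# Python's codec registry resolves "ibm437" to its cp437 codec
ibm437_codec = codecs.lookup("ibm437").name  # "cp437"

for enc, data in encoded.items():
    print(enc, data)
```

So é is not tied to one character set: it is a Unicode character that several legacy code pages happen to include, which is why it survives an ANSI save in Notepad and displays in a CP437 terminal.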

I can see that this is Unicode, not ASCII. But what is é? It doesn't give me the error that Unicode characters give me in Notepad, but Python threw SyntaxError: Non-ASCII character '\xc3' in file on line , but no encoding declared; until I added the "magic comment" suggested in Python NLTK: SyntaxError: Non-ASCII character '\xc3' in file (Sentiment Analysis -NLP).
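The '\xc3' in that error is the first byte of é encoded as UTF-8. Python 2 assumed ASCII source code unless the file declared an encoding with the PEP 263 "magic comment"; Python 3 assumes UTF-8. This Python 3 sketch feeds raw source bytes to compile() to show that a default UTF-8 file and a file with an explicit latin-1 cookie both decode to the same text. The helper name run_source and the variable name are hypothetical.

```python
# Source files as raw bytes, the way they would sit on disk
utf8_source = "name = 'Filéname'\n".encode("utf-8")           # é stored as C3 A9
latin1_source = b"# -*- coding: latin-1 -*-\nname = 'Fil\xe9name'\n"  # é stored as E9


def run_source(source: bytes) -> str:
    """Compile and execute source bytes, returning the value bound to `name`."""
    namespace = {}
    # compile() decodes bytes using the coding cookie, or UTF-8 if there is none
    exec(compile(source, "<scratch>", "exec"), namespace)
    return namespace["name"]


print(b"\xc3" in utf8_source)  # True: the byte the Python 2 error complained about
from_utf8 = run_source(utf8_source)      # "Filéname" (UTF-8 is the Python 3 default)
from_latin1 = run_source(latin1_source)  # "Filéname" (decoded via the cookie)
```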

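I added the magic comment and no longer get that error, but os.path.isfile() says the file name containing é doesn't exist. Ironically, é appears in the name of one of the authors of the PEP that the error links to: Marc-André Lemburg.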
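One likely explanation, assuming Python 2 on Windows with the cp1252 ANSI code page: the coding declaration only tells Python how to read the source file. A plain "Filéname" literal is still a byte string holding the UTF-8 bytes C3 A9, and a byte-string path is interpreted in the ANSI code page, so the OS looks for a differently named file. A Python 3 sketch of that misreading:

```python
# In Python 2 with "# -*- coding: utf-8 -*-", the literal "Filéname" is a byte
# string containing the UTF-8 encoding of the text.
utf8_bytes = "Filéname".encode("utf-8")   # b'Fil\xc3\xa9name'

# Assumption: byte-string paths are read in the ANSI code page (cp1252 here),
# so each UTF-8 byte becomes its own cp1252 character.
seen_by_windows = utf8_bytes.decode("cp1252")   # "FilÃ©name"

print(seen_by_windows == "Filéname")  # False: a different file name
```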

EDIT: If I print the file's path, the accented e shows up as ├⌐, but I can copy and paste é into the command prompt just fine.
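The ├⌐ is the same pair of UTF-8 bytes drawn by a console that uses a different code page. Assuming the console's OEM code page is CP437 (the US default), this Python 3 sketch reproduces the mojibake exactly:

```python
utf8_bytes = "é".encode("utf-8")        # b'\xc3\xa9'

# Assumption: the console uses CP437, so each UTF-8 byte is shown as the
# CP437 glyph for that byte value (0xC3 -> ├, 0xA9 -> ⌐).
shown = utf8_bytes.decode("cp437")      # "├⌐"
glyphs = [f"U+{ord(c):04X}" for c in shown]
print(glyphs)                           # ['U+251C', 'U+2310']
```

Pasting é works because the console converts typed or pasted text itself, rather than dumping raw UTF-8 bytes.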

EDIT2: see below

Private    > cat scratch.py   ### LOL cat scratch :3
# coding=utf-8
file_name = r"Filéname"
file_name = unicode(file_name)
Private    > python scratch.py
Traceback (most recent call last):
  File "scratch.py", line 3, in <module>
    file_name = unicode(file_name)
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 3: ordinal not in range(128)
Private    >
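In Python 2, unicode(s) without an encoding decodes with the default ASCII codec, which rejects byte 0xc3 at index 3 (right after "Fil"). The Python 3 counterpart is bytes.decode(); the sketch below (the helper name first_undecodable is hypothetical) shows ASCII failing at that byte while UTF-8 succeeds.

```python
raw = b"Fil\xc3\xa9name"  # the bytes a UTF-8 "Filéname" literal held in Python 2


def first_undecodable(data: bytes, encoding: str):
    """Return (index, byte) of the first byte `encoding` rejects, or None."""
    try:
        data.decode(encoding)
    except UnicodeDecodeError as exc:
        # exc.start is the offending index into exc.object (the input bytes)
        return exc.start, exc.object[exc.start]
    return None


print(first_undecodable(raw, "ascii"))   # (3, 195), i.e. byte 0xc3 at position 3
print(first_undecodable(raw, "utf-8"))   # None
file_name = raw.decode("utf-8")          # "Filéname"
```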

EDIT3:

Private    > PS1="Private    > " ; echo code below ; cat scratch.py ; echo =======  ; echo output below ; python scratch.py
code below
# -*- coding: utf-8 -*-

file_name = r"Filéname"
file_name = unicode(file_name, encoding="utf-8")

# I have code here to determine a path depending on the hostname of the
# machine, the folder paths contain no Unicode characters, for my debug
# version of the script, I will hardcode the redacted hostname.
hostname = "One"
if hostname == "One":
    folder = "C:/path/folder_one"
elif hostname == "Two":
    folder = "C:/path/folder_two"
else:
    folder = "C:/path/folder_three"

path = "%s/%s" % (folder, file_name)
path = unicode(path, encoding="utf-8")


print path
=======
output below
Traceback (most recent call last):
  File "scratch.py", line 18, in <module>
    path = unicode(path, encoding="utf-8")
TypeError: decoding Unicode is not supported
Private    >
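The TypeError in EDIT3 comes from decoding twice: file_name is already unicode, so "%s/%s" % (folder, file_name) returns unicode, and unicode() refuses to decode text again. Python 3 raises the analogous "decoding str is not supported". A sketch of the decode-once pattern follows; pathlib.PureWindowsPath is my choice for joining the parts (it builds Windows-style paths on any OS), and the folder is the placeholder from the question.

```python
from pathlib import PureWindowsPath

file_name = b"Fil\xc3\xa9name".decode("utf-8")  # decode once, where bytes enter
folder = "C:/path/folder_one"                    # placeholder path from the question

path = "%s/%s" % (folder, file_name)  # text % text -> text, nothing left to decode

error_message = None
try:
    str(path, encoding="utf-8")  # decoding text again, as EDIT3 does with unicode()
except TypeError as exc:
    error_message = str(exc)     # "decoding str is not supported"

# Join text parts directly instead of re-decoding the result
win_path = PureWindowsPath(folder) / file_name
print(error_message)
```

The general rule is to decode bytes to text once, at the edge of the program, and keep every later path operation in text.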

score 0 · Accepted Answer

You need to tell unicode what encoding the string uses; in this case it is utf-8, not ascii. The file header should be the encoding declaration # -*- coding: utf-8 -*-

# -*- coding: utf-8 -*-
file_name = r"Filéname"
file_name = unicode(file_name, encoding="utf-8")

Help on class unicode in module __builtin__:

class unicode(basestring)
 |  unicode(object='') -> unicode object
 |  unicode(string[, encoding[, errors]]) -> unicode object
 |
 |  Create a new Unicode object from the given encoded string.
 |  encoding defaults to the current default string encoding.
 |  errors can be 'strict', 'replace' or 'ignore' and defaults to 'strict'.
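The errors argument described in that help text means the same thing for Python 3's bytes.decode(). A sketch of the three values applied to the same bytes decoded as ASCII (the helper name decode_ascii is hypothetical):

```python
raw = b"Fil\xc3\xa9name"


def decode_ascii(data: bytes, errors: str):
    """Decode as ASCII with the given error handler; None if it raises."""
    try:
        return data.decode("ascii", errors=errors)
    except UnicodeDecodeError:
        return None


results = {mode: decode_ascii(raw, mode) for mode in ("strict", "replace", "ignore")}
# strict  -> None (UnicodeDecodeError is raised, the default behaviour)
# replace -> "Fil\ufffd\ufffdname" (one U+FFFD per undecodable byte)
# ignore  -> "Filname" (undecodable bytes are dropped)
```

Neither lossy result names the real file, so the actual fix is still to pass the correct encoding.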

As I mentioned in my earlier comment, switching to Python 3 will save you a lot of headaches. Python 2 with Unicode characters on a Windows file system can be a nightmare.
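A sketch of what the Python 3 route looks like for the operations in the title, assuming a throwaway temporary directory stands in for the real folders and the file names are made up: str paths containing é go straight into os.path.exists, filecmp.cmp and shutil.move with no encode or decode step.

```python
import filecmp
import os
import shutil
import tempfile


def demo():
    """Create, compare and move files whose names contain accented letters."""
    with tempfile.TemporaryDirectory() as folder:
        src = os.path.join(folder, "Filéname")  # plain str path
        with open(src, "w", encoding="utf-8") as fh:
            fh.write("hello\n")

        exists_before = os.path.exists(src)

        copy = os.path.join(folder, "Filéname.bak")
        shutil.copyfile(src, copy)
        same = filecmp.cmp(src, copy, shallow=False)  # compare contents byte by byte

        # shutil.move returns the destination path
        moved = shutil.move(copy, os.path.join(folder, "Déplacé"))
        return exists_before, same, os.path.exists(moved), os.path.exists(copy)


print(demo())  # (True, True, True, False)
```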
