
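What character set does é come from? In Windows Notepad, an ANSI text file containing this character saves without complaint. Insert something similar and you get an error. é also seems to work fine in PuTTY's ASCII terminal (are CP437 and IBM437 the same thing?), while the other character does not.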
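As a point of reference, here is a minimal sketch (Python 3, and assuming that "ANSI" in Notepad means Windows-1252, as it does on a Western European system) of how é is represented in the encodings mentioned above. The helper name `encoded_forms` is made up for illustration.

```python
import codecs

E_ACUTE = "\u00e9"  # é, U+00E9 LATIN SMALL LETTER E WITH ACUTE


def encoded_forms(ch):
    """Return the bytes for ch in a few encodings, or None where it cannot be encoded."""
    forms = {}
    for name in ("ascii", "cp1252", "cp437", "utf-8"):
        try:
            forms[name] = ch.encode(name)
        except UnicodeEncodeError:
            forms[name] = None  # the character does not exist in this encoding
    return forms


forms = encoded_forms(E_ACUTE)
print(forms)

# Python treats "IBM437" as an alias of the "cp437" codec.
print(codecs.lookup("IBM437").name)
```

So é is not ASCII at all. It is a single byte in Windows-1252 (0xE9) and in CP437 (0x82), and two bytes in UTF-8 (0xC3 0xA9), which is where the '\xc3' in the Python errors comes from.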

I can see that this is Unicode, not ASCII. But what is é? It does not trigger the error I get in Notepad when I use Unicode, yet Python threw "SyntaxError: Non-ASCII character '\xc3' in file on line , but no encoding declared;" until I added the "magic comment" suggested in: Python NLTK: SyntaxError: Non-ASCII character '\xc3' in file (Sentiment Analysis -NLP)

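I added the "magic comment" and that error went away, but os.path.isfile() now says that the file name containing é does not exist. Ironically, the character é appears in the name of Marc-André Lemburg, one of the authors of the PEP that the error message links to.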

EDIT: If I print the path of the file, the accented e is displayed as ├⌐, but I can copy and paste é into the command prompt.
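The ├⌐ output is consistent with UTF-8 bytes being displayed by a console that uses CP437. A minimal sketch in Python 3, assuming that is what the console is doing (the helper name `as_seen_in_cp437` is hypothetical):

```python
def as_seen_in_cp437(text):
    """Encode text as UTF-8, then misread those bytes as CP437."""
    return text.encode("utf-8").decode("cp437")


garbled = as_seen_in_cp437("Fil\u00e9name")

# ascii() keeps the output printable on any console, whatever its code page.
print(ascii(garbled))
```

The two UTF-8 bytes of é, 0xC3 and 0xA9, map to ├ (U+251C) and ⌐ (U+2310) in CP437, so the bytes are intact and only the display is wrong.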

EDIT2: See below.

Private    > cat scratch.py   ### LOL cat scratch :3
# coding=utf-8
file_name = r"Filéname"
file_name = unicode(file_name)
Private    > python scratch.py
Traceback (most recent call last):
  File "scratch.py", line 3, in <module>
    file_name = unicode(file_name)
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 3: ordinal not in range(128)
Private    >
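The traceback above fits Python 2 falling back to the ASCII codec when unicode() is called without an encoding. A minimal sketch of the same failure, written for Python 3 where the bytes have to be decoded explicitly (the helper name `try_decode` is made up for illustration):

```python
RAW_NAME = b"Fil\xc3\xa9name"  # the UTF-8 bytes of the file name in scratch.py


def try_decode(raw, encoding):
    """Return (text, None) on success or (None, error) on failure."""
    try:
        return raw.decode(encoding), None
    except UnicodeDecodeError as exc:
        return None, exc


ascii_text, ascii_error = try_decode(RAW_NAME, "ascii")
utf8_text, utf8_error = try_decode(RAW_NAME, "utf-8")

print(ascii_error)       # the ASCII codec rejects byte 0xc3
print(ascii(utf8_text))  # decoding as UTF-8 succeeds
```

The failing byte sits at index 3, right after "Fil", which matches "position 3" in the error message.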

EDIT3:

Private    > PS1="Private    > " ; echo code below ; cat scratch.py ; echo =======  ; echo output below ; python scratch.py
code below
# -*- coding: utf-8 -*-

file_name = r"Filéname"
file_name = unicode(file_name, encoding="utf-8")

# I have code here to determine a path depending on the hostname of the
# machine, the folder paths contain no Unicode characters, for my debug
# version of the script, I will hardcode the redacted hostname.
hostname = "One"
if hostname == "One":
    folder = "C:/path/folder_one"
elif hostname == "Two":
    folder = "C:/path/folder_two"
else:
    folder = "C:/path/folder_three"

path = "%s/%s" % (folder, file_name)
path = unicode(path, encoding="utf-8")


print path
=======
output below
Traceback (most recent call last):
  File "scratch.py", line 18, in <module>
    path = unicode(path, encoding="utf-8")
TypeError: decoding Unicode is not supported
Private    >
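The TypeError is what you would expect if path is already a unicode object: in Python 2, formatting a byte string with a unicode argument yields unicode, so the second unicode(path, encoding="utf-8") call tries to decode text that was already decoded. A minimal sketch of "decode once, at the boundary", written for Python 3 with a hypothetical helper `to_text`:

```python
def to_text(value, encoding="utf-8"):
    """Decode bytes to text; return text unchanged (hypothetical helper)."""
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value


file_name = to_text(b"Fil\xc3\xa9name")  # decoded exactly once
folder = "C:/path/folder_one"
path = "%s/%s" % (folder, file_name)     # the result is already text
path = to_text(path)                     # harmless no-op on text

# Decoding text a second time fails, much like the traceback above.
try:
    str(path, encoding="utf-8")
    double_decode_error = None
except TypeError as exc:
    double_decode_error = exc

print(double_decode_error)
```

In the script above, dropping the second unicode() call should be enough, since file_name was already decoded before the path was built.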

1 Answer


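You need to tell unicode() what the encoding of the string is. In this case it is utf-8, not ascii. The file header should also carry the encoding declaration # -*- coding: utf-8 -*-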

# -*- coding: utf-8 -*-
file_name = r"Filéname"
file_name = unicode(file_name, encoding="utf-8")
Help on class unicode in module __builtin__:

class unicode(basestring)
 |  unicode(object='') -> unicode object
 |  unicode(string[, encoding[, errors]]) -> unicode object
 |
 |  Create a new Unicode object from the given encoded string.
 |  encoding defaults to the current default string encoding.
 |  errors can be 'strict', 'replace' or 'ignore' and defaults to 'strict'.
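To illustrate the errors argument described in the help text, here is a minimal sketch using the Python 3 equivalent, bytes.decode(). The helper name `decode_as_ascii` is made up for illustration.

```python
RAW = b"Fil\xc3\xa9name"  # UTF-8 bytes, deliberately decoded with the wrong codec


def decode_as_ascii(errors):
    """Decode RAW as ASCII using the given error handler."""
    return RAW.decode("ascii", errors=errors)


replaced = decode_as_ascii("replace")  # each bad byte becomes U+FFFD
ignored = decode_as_ascii("ignore")    # bad bytes are dropped silently

try:
    decode_as_ascii("strict")          # the default: raise on the first bad byte
    strict_failed = False
except UnicodeDecodeError:
    strict_failed = True

print(ascii(replaced), ascii(ignored), strict_failed)
```

Neither 'replace' nor 'ignore' recovers the é, so they would still give a file name that does not exist on disk. Passing the correct encoding is the actual fix.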

As I mentioned in my earlier comments, you will save yourself a lot of trouble by switching to Python 3. Python 2 with Unicode characters on a Windows file system can be a nightmare.
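For comparison, a minimal Python 3 sketch of the os.path.isfile() check from the question. It assumes the file system can store the name, and it uses a temporary folder rather than the real paths. The helper name `isfile_roundtrip` is hypothetical.

```python
import os
import tempfile


def isfile_roundtrip(file_name):
    """Create file_name in a temporary folder and report whether os.path.isfile finds it."""
    with tempfile.TemporaryDirectory() as folder:
        path = os.path.join(folder, file_name)
        with open(path, "w", encoding="utf-8") as handle:
            handle.write("placeholder")
        return os.path.isfile(path)


# In Python 3 every str is Unicode, so no unicode() call or decoding step is needed.
found = isfile_roundtrip("Fil\u00e9name")
print(found)
```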

Answered 2020-05-12T01:23:54.983