python - Python "string_escape" vs "unicode_escape"

Question

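According to the docs, the built-in string encoding string_escape:

Produce[s] a string that is suitable as string literal in Python source code

...while unicode_escape:

Produce[s] a string that is suitable as Unicode literal in Python source code

So they should have roughly the same behaviour. However, they appear to treat single quotes differently: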

>>> print """before '" \0 after""".encode('string-escape')
before \'" \x00 after
>>> print """before '" \0 after""".encode('unicode-escape')
before '" \x00 after

string_escape escapes the single quote, while unicode_escape does not. Is it safe to assume that I can simply:

>>> escaped = my_string.encode('unicode-escape').replace("'", "\\'")

...and get the expected behaviour?

Edit: Just to be super clear, the expected behaviour is getting something suitable as a literal.

score 26 · Accepted Answer

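According to my reading of the implementation of unicode-escape and of the unicode repr in the CPython 2.6.5 source, yes. The only difference between repr(unicode_string) and unicode_string.encode('unicode-escape') is that repr adds the wrapping quotes and escapes whichever quote character it used.

Both are driven by the same function, unicodeescape_string. This function takes a parameter whose sole purpose is to toggle the addition of the wrapping quotes and the escaping of that quote.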
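The same relationship can be observed from the interpreter. The sketch below is for Python 3, not the 2.6 interpreter discussed above, and it assumes that `ascii()` is a fair stand-in for the Python 2 `repr` of a unicode string. The `sample` value is made up for illustration.

```python
# Python 3 sketch: ascii() plays the role of Python 2's repr(unicode_string).
sample = "tab\there '\" \u20ac"

# ascii() wraps the text in quotes and escapes the quote character it chose.
wrapped = ascii(sample)
body = wrapped[1:-1]  # drop the wrapping quotes

# unicode_escape returns ASCII bytes with no wrapping quotes and no quote escaping.
escaped = sample.encode("unicode_escape").decode("ascii")

print(body)
print(escaped)
# Escaping the single quote by hand makes the two forms identical for this sample.
print(body == escaped.replace("'", "\\'"))
```

The sample contains both quote characters on purpose: with only a single quote present, `ascii()` would switch to double quotes as the wrapper and the comparison would look different.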

score 13 · Accepted Answer

Within the range 0 ≤ c < 128, yes, the ' is the only difference for CPython 2.6.

>>> set(unichr(c).encode('unicode_escape') for c in range(128)) - set(chr(c).encode('string_escape') for c in range(128))
set(["'"])
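For readers on Python 3, where only unicode_escape is available, the approach from the question can be checked with a round trip through `ast.literal_eval`. This is a minimal sketch, not the original poster's code: `to_literal_body` is a hypothetical helper name, and the check only covers the ASCII range plus one made-up sample.

```python
import ast


def to_literal_body(text):
    """Escape text so it can sit between single quotes in Python source."""
    # unicode_escape yields ASCII bytes; decode them back to str for editing.
    escaped = text.encode("unicode_escape").decode("ascii")
    # unicode_escape leaves quotes alone, so escape the single quote by hand.
    return escaped.replace("'", "\\'")


def round_trips(text):
    """Return True if the escaped form evaluates back to the original text."""
    literal = "'" + to_literal_body(text) + "'"
    return ast.literal_eval(literal) == text


# Every code point in the ASCII range survives the round trip.
print(all(round_trips(chr(c)) for c in range(128)))

# A mixed sample, including both quote characters and a non-ASCII character.
print(round_trips("before '\" \0 after\n\u20ac"))
```

Backslashes are doubled by unicode_escape before the quote replacement runs, so a backslash that precedes a quote in the input does not swallow the escape added afterwards.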

Outside this range, the two types are not interchangeable.

>>> '\x80'.encode('string_escape')
'\\x80'
>>> '\x80'.encode('unicode_escape')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'ascii' codec can't decode byte 0x80 in position 0: ordinal not in range(128)

>>> u'1'.encode('unicode_escape')
'1'
>>> u'1'.encode('string_escape')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: escape_encode() argument 1 must be str, not unicode

On Python 3.x, the string_escape encoding no longer exists, since str can only store Unicode.
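A quick way to confirm this on a given interpreter is to ask the codec registry directly. The sketch below assumes Python 3; `codec_exists` is a hypothetical helper written for this example.

```python
import codecs


def codec_exists(name):
    """Return True if the codec registry knows an encoding with this name."""
    try:
        codecs.lookup(name)
    except LookupError:
        return False
    return True


print(codec_exists("unicode_escape"))
print(codec_exists("string_escape"))
```

On Python 3 the first lookup succeeds and the second fails, so code ported from Python 2 that calls `.encode('string_escape')` needs to be rewritten around unicode_escape or another approach.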
