python - string.decode() 与 unicode(string)

Question

myString = 'éíěřáé'

我需要将此字符串解码为 unicode。以下用法之间以及这两种方法之间有什么区别吗？

myString.decode(encoding='UTF-8', errors='ignore')

和

unicode(myString, encoding='UTF-8', errors='ignore')

score 10 · Accepted Answer

unicode构造函数可以采用除字符串之外的其他类型：

>>> unicode(10)
u'10'

然而，对于字节串的情况，这两种形式大多是等价的。一些编码选项对构造函数无效，unicode因为它们不会产生 unicode 输出，但.decode对字节串的方法有效，例如'hex'：

>>> unicode('10', encoding='hex')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: decoder did not return an unicode object (type=str)

score 2 · Accepted Answer

它们本质上是相同的，但在任何一种情况下都有一些小的性能捷径；str.decode知道它的参数是一个字符串，所以它可以快捷地检查它的参数，同时unicode.__new__也有一些常见编码的快捷方式，包括 UTF-8。

PyCodec_Decode在一般情况下，这两种方法都会调用。

score 0 · Accepted Answer

在 Python 2.xstr.decode()中，可能会产生一个 unicode 对象或另一个str. 该unicode()函数仅适用于产生 unicode 对象的编码。

例如：

>>> "x\x9cKLJ\x06\x00\x02M\x01'".decode('zip')
'abc'
>>> unicode("x\x9cKLJ\x06\x00\x02M\x01'", encoding='zip')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: decoder did not return an unicode object (type=str)
>>>

请注意，在内部，它们的工作方式与调用相同，unicode()表明它确实解码了对象，然后才反对结果的类型。

python - string.decode() 与 unicode(string)

3 回答 3

Related

Reference