
I was given to understand that print obj calls obj.__str__(), which in turn returns a string to print to the console. Now I had a problem with Unicode where I could not print any non-ASCII characters. I got the typical "ordinal not in range" ASCII error.

While experimenting the following worked:

print obj.__str__()
print obj.__repr__()

Both functions do exactly the same thing (__str__() just returns self.__repr__()). What did not work:

print obj

The problem occurred only when using a character outside the ASCII range. The final solution was to do the following in __str__():

return self.__repr__().encode(sys.stdout.encoding)

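Now it works in all cases. My question is: where is the difference, and why does it work now? I would understand if nothing had worked before and this fix made everything work. But why did only the first variant (calling __str__() or __repr__() explicitly) work, and not the second (print obj)?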

The OS is Windows 7 x64 with the default Windows command prompt, and the encoding is reported as cp850. This is more of a general question about understanding Python. My problem is already solved, but I am not 100% happy, mostly because calling str(obj) now yields a string that is not encoded the way I wanted.

# -*- coding: utf-8 -*- 
class Sample(object):

    def __init__(self):
        self.name = u"üé"

    def __repr__(self):
        return self.name

    def __str__(self):
        return self.name

obj = Sample()
print obj.__str__(), obj.__repr__(), obj

Remove the last obj and it works. Keep it and it crashes with

UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-1: ordinal not in range(128)

2 Answers

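My guess is that print does something like the following for an object obj to be printed:

  1. Check whether obj is a unicode. If so, encode it with sys.stdout.encoding and print it.
  2. Check whether obj is a str. If so, print it directly.
  3. If obj is anything else, call str(obj) and print the result.

Step 1 is why print obj.__str__() works in your case.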

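Now, what str(obj) does is:

  1. Call obj.__str__().
  2. If the result is a str, return it.
  3. If the result is a unicode, encode it as "ascii" and return that.
  4. Otherwise, something mostly useless.

Calling obj.__str__() directly skips steps 2-3, which is why you do not get an encoding failure.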
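
As a rough model of the two guessed procedures above (not CPython's actual implementation), the sketch below uses two hypothetical helpers, model_str and model_print, which return the bytes that would reach the console. It is written to run on Python 2 and Python 3 alike, so bytes stands in for Python 2's str and type(u"") for unicode. The cp850 default is taken from the question.

```python
TEXT = type(u"")  # unicode on Python 2, str on Python 3


class Sample(object):
    def __str__(self):
        return u"\xfc\xe9"  # u"üé", as in the question


def model_str(obj):
    """Model of Python 2's str(obj): text results are encoded as ASCII."""
    result = obj.__str__()
    if isinstance(result, bytes):
        return result
    if isinstance(result, TEXT):
        return result.encode("ascii")  # fails for non-ASCII characters
    # Step 4 is not modelled in detail; this sketch simply rejects the value.
    raise TypeError("__str__ returned non-string")


def model_print(obj, console_encoding="cp850"):
    """Model of Python 2's print: returns the bytes written to the console."""
    if isinstance(obj, TEXT):
        return obj.encode(console_encoding)  # step 1
    if isinstance(obj, bytes):
        return obj  # step 2
    return model_str(obj)  # step 3


obj = Sample()
model_print(obj.__str__())  # text is encoded with the console encoding: works
try:
    model_print(obj)  # goes through model_str and therefore through ASCII
except UnicodeEncodeError as exc:
    print("failed as in the question: %s" % exc)
```

In this model, the only difference between the two calls is which encoding is applied to the unicode value: the console encoding in step 1 of print, or ASCII inside str().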

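The problem is not caused by how print works, but by how str() works. str() ignores sys.stdout.encoding. Since it does not know what you want to do with the resulting string, the default encoding it uses can be considered arbitrary; ascii is as good or as bad a choice as any.

To prevent this error, make sure you return a str from __str__(), as the documentation tells you to. A pattern you could use for Python 2.x might be: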

import sys

class Foo():
    def __unicode__(self):
        return u'whatever'
    def __str__(self):
        # Python 2: __str__ must return a byte string, so encode explicitly
        return unicode(self).encode(sys.stdout.encoding)

(If you are sure you do not need the str() representation for anything other than printing to the console.)
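
One thing to watch with this pattern: in Python 2, sys.stdout.encoding can be None when output is redirected to a file or pipe, and encode(None) then fails. A minimal sketch of a guard follows; console_encoding and to_console_bytes are hypothetical helper names, and the "utf-8" fallback is an assumption, not something the pattern above prescribes.

```python
import sys


def console_encoding(fallback="utf-8"):
    """Return sys.stdout.encoding, or the fallback when it is missing or None."""
    return getattr(sys.stdout, "encoding", None) or fallback


def to_console_bytes(text, fallback="utf-8"):
    """Encode text for the console, replacing characters it cannot represent."""
    # errors="replace" substitutes "?" instead of raising UnicodeEncodeError
    return text.encode(console_encoding(fallback), "replace")


encoded = to_console_bytes(u"\xfc\xe9")
```

Inside __str__() one could then return to_console_bytes(unicode(self)) instead of encoding with sys.stdout.encoding directly.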

answered 2012-07-03T22:40:07.920

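First, if you look at the online documentation, __str__ and __repr__ have different purposes and should produce different output. So calling __repr__ from __str__ is not the best solution.

Second, print will call __str__ and will not expect to receive non-ASCII characters because, well, print cannot guess how to convert them.

Finally, in recent versions of Python 2.x, __unicode__ is the preferred method for creating a string representation of an object. There is an interesting explanation in "Python __str__ versus __unicode__".

So, to try to really answer the question, you could do something like this: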

class Sample(object):

    def __init__(self):
        self.name = u"\xfc\xe9"

    # No need to implement __repr__. Let Python create the object repr for you

    def __str__(self):
        return unicode(self).encode('utf-8')

    def __unicode__(self):
        return self.name
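
One caveat, sketched under the assumption that the console uses cp850 as in the question: bytes produced with a hard-coded 'utf-8' are interpreted by the console in its own code page, so non-ASCII characters may not display as intended. The snippet below imitates that by decoding the bytes as cp850; the variable names are illustrative only.

```python
name = u"\xfc\xe9"  # u"üé"

# What __str__ above returns: the UTF-8 encoding (two bytes per character here)
utf8_bytes = name.encode("utf-8")

# What a cp850 console would make of those bytes: one character per byte
shown = utf8_bytes.decode("cp850")

# Encoding with the console's own code page round-trips cleanly
matching = name.encode("cp850").decode("cp850")

print(shown == name)     # False: the UTF-8 bytes are misread
print(matching == name)  # True
```

If the output is only ever meant for that console, encoding with its code page avoids the mismatch; 'utf-8' is the better choice when the bytes go to a file or another program that expects it.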
answered 2012-07-03T21:54:23.217