python - Python difference between print obj and print obj.__str__() [at least with Unicode?]

Question

I was given to understand that calling print obj would call obj.__str__(), which would in turn return a string to print to the console. Now I had a problem with Unicode where I could not print any non-ASCII characters. I got the typical "ascii out of range" stuff.

While experimenting the following worked:

print obj.__str__()
print obj.__repr__()

Both functions do exactly the same thing (__str__() just returns self.__repr__()). What did not work:

print obj

The problem occurred only when using a character outside the ASCII range. The final solution was to do the following in __str__():

return self.__repr__().encode(sys.stdout.encoding)

Now it works for all parts. My question now is: Where is the difference? Why does it work now? I would understand if nothing had worked, and I understand why this works now. But why did only the top part work and not the bottom?

OS is Windows 7 x64 with a default Windows command prompt. The encoding is reported to be cp850. This is more of a general question to understand Python. My problem is already solved, but I am not 100% happy, mostly because calling str(obj) will now yield a string that is not encoded in the way I wanted it.

# -*- coding: utf-8 -*- 
class Sample(object):

    def __init__(self):
        self.name = u"üé"

    def __repr__(self):
        return self.name

    def __str__(self):
        return self.name

obj = Sample()
print obj.__str__(), obj.__repr__(), obj

Remove the last obj and it works. Keep it and it crashes with:

UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-1: ordinal not in range(128)

score 4 · Accepted Answer

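My guess is that print does the following for an object obj it is asked to print:

1. Check if obj is a unicode. If so, encode it to sys.stdout.encoding and print it.
2. Check if obj is a str. If so, print it directly.
3. If obj is anything else, call str(obj) and print that.

Step 1 is why print obj.__str__() works in your case.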

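Now, what str(obj) does is:

1. Call obj.__str__().
2. If the result is a str, return it.
3. If the result is a unicode, encode it to "ascii" and return that.
4. Otherwise, something mostly useless.

Calling obj.__str__() directly skips steps 2-3, which is why you do not get the encoding failure.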
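The two lists of steps above can be sketched as a small model. This is only an illustration of my guess, not CPython's actual implementation: model_print and model_str are made-up names, the console encoding is passed in explicitly, and the code is written so that it also runs on Python 3 (where TEXT is str and BYTES is bytes).

```python
# -*- coding: utf-8 -*-
# Illustrative model only: model_str and model_print are hypothetical helpers
# that mirror the numbered steps above, not what the interpreter really runs.

TEXT = type(u"")   # `unicode` on Python 2
BYTES = type(b"")  # `str` on Python 2


def model_str(obj):
    """Model of str(obj): call __str__, then force the result to a byte string."""
    result = obj.__str__()                 # str() step 1
    if isinstance(result, BYTES):
        return result                      # str() step 2
    if isinstance(result, TEXT):
        return result.encode("ascii")      # str() step 3: implicit ASCII encode
    raise TypeError("__str__ returned a non-string")  # str() step 4


def model_print(obj, console_encoding):
    """Return the bytes that print would send to the console."""
    if isinstance(obj, TEXT):
        return obj.encode(console_encoding)  # print step 1
    if isinstance(obj, BYTES):
        return obj                           # print step 2
    return model_str(obj)                    # print step 3


class Sample(object):
    def __str__(self):
        return u"\xfc\xe9"  # u"üé", as in the question


obj = Sample()

# "print obj.__str__()": the unicode result hits print step 1.
direct = model_print(obj.__str__(), "cp850")

# "print obj": print step 3 goes through str(), whose ASCII encode fails.
try:
    model_print(obj, "cp850")
    error = None
except UnicodeEncodeError as exc:
    error = exc
```

In this model, direct holds the cp850 bytes for the name, while the second call ends with a UnicodeEncodeError from the "ascii" codec, which matches the traceback in the question.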

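The problem is not caused by how print works, but by how str() works. str() ignores sys.stdout.encoding. Since it does not know what you want to do with the resulting string, the default encoding it uses can be considered arbitrary; ascii is as good or as bad a choice as any.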

To prevent this error, make sure you return a str from __str__(), as the documentation tells you to. A pattern you could use for Python 2.x might be:

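import sys

class Foo(object):
    def __unicode__(self):
        # The real text representation lives here.
        return u'whatever'
    def __str__(self):
        # Python 2: return bytes encoded for the console.
        return unicode(self).encode(sys.stdout.encoding)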

(If you are certain that you do not need str() for any representation other than printing to the console.)
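One caveat with the pattern above, as far as I know: in Python 2, sys.stdout.encoding can be None when the output is redirected to a file or a pipe, and encode(None) then fails. A minimal sketch of a guard, assuming a hypothetical helper named console_encoding and an arbitrary "utf-8" fallback (pick whatever suits your setup):

```python
import sys


def console_encoding(stream=None, fallback="utf-8"):
    """Return the stream's encoding, or `fallback` if it has none.

    Hypothetical helper; the "utf-8" fallback is an assumption, not a rule.
    """
    stream = sys.stdout if stream is None else stream
    return getattr(stream, "encoding", None) or fallback


class PipedStream(object):
    """Stand-in for a redirected stdout that reports no encoding."""
    encoding = None


class ConsoleStream(object):
    """Stand-in for the cp850 console from the question."""
    encoding = "cp850"


piped = console_encoding(PipedStream())      # falls back to "utf-8"
console = console_encoding(ConsoleStream())  # uses "cp850"
```

Inside __str__() you would then write unicode(self).encode(console_encoding()) instead of using sys.stdout.encoding directly.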

score 1 · Accepted Answer

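First of all, if you look at the online documentation, __str__ and __repr__ have different purposes and should create different outputs. So calling __repr__ from __str__ is not the best solution.

Second, print will call __str__ and will not expect to receive non-ASCII characters because, well, print cannot guess how to convert them.

Finally, in the latest releases of Python 2.x, __unicode__ is the preferred method for creating a string representation of an object. There is an interesting explanation in "Python str versus unicode".

So, to try to really answer the question, you could do something like: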

class Sample(object):

    def __init__(self):
        self.name = u"\xfc\xe9"

    # No need to implement __repr__. Let Python create the object repr for you

    def __str__(self):
        return unicode(self).encode('utf-8')

    def __unicode__(self):
        return self.name
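Keep in mind that the hard-coded 'utf-8' only displays correctly if the console also decodes UTF-8. A small sketch of what happens otherwise, assuming the cp850 console from the question (the variable names are just for illustration):

```python
# -*- coding: utf-8 -*-
name = u"\xfc\xe9"  # u"üé"

utf8_bytes = name.encode("utf-8")    # what the __str__ above returns
cp850_bytes = name.encode("cp850")   # what a cp850 console expects

# A cp850 console interprets whatever bytes it receives with its own codec.
shown_for_utf8 = utf8_bytes.decode("cp850")    # garbled: four wrong characters
shown_for_cp850 = cp850_bytes.decode("cp850")  # the original two characters
```

So on a cp850 console you may prefer to encode with the console's own encoding, or switch the console to UTF-8.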
