When you get a UnicodeError, it is often hard to find the root of the problem: where did the offending string come from?

Is there a way to show the string, or at least the offending part of it?

1 Answer

You can use this snippet:

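def re_raise_unicode_error_with_hint(exc):
    # Slice out the offending part plus up to 15 items of context on each side.
    # Slicing clamps at the end of the object, so only the start needs a guard.
    hint = exc.object[max(exc.start - 15, 0):exc.end + 15]
    # Re-raise the same exception type, keeping the original reason and adding the hint.
    # UnicodeDecodeError and UnicodeEncodeError both accept these five arguments.
    raise exc.__class__(exc.encoding, exc.object, exc.start, exc.end,
                        '%s (hint: %r)' % (exc.reason, hint))

# The helper has to be defined before this code runs.
try:
    html = html.decode(encoding)
except UnicodeError as exc:
    re_raise_unicode_error_with_hint(exc)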

This way the error message shows the offending part of the input together with up to 15 characters (bytes, when decoding) before and after it.
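For reference, here is a minimal Python 3 sketch of the same idea that can be run on its own. The names `context_hint` and `decode_with_hint`, the `width` parameter, and the sample bytes are made up for illustration and are not part of the snippet above; it also assumes the input is `bytes` and that only `UnicodeDecodeError` needs handling.

```python
def context_hint(exc, width=15):
    """Return the part of exc.object around the failing position."""
    start = max(exc.start - width, 0)
    end = min(exc.end + width, len(exc.object))
    return exc.object[start:end]


def decode_with_hint(data, encoding):
    """Decode bytes; on failure, re-raise with the surrounding bytes in the reason."""
    try:
        return data.decode(encoding)
    except UnicodeDecodeError as exc:
        # Keep the original reason and append the context slice.
        reason = '%s; hint: %r' % (exc.reason, context_hint(exc))
        raise UnicodeDecodeError(
            exc.encoding, exc.object, exc.start, exc.end, reason
        ) from exc


if __name__ == '__main__':
    # Hypothetical sample: Latin-1 encoded bytes that are not valid UTF-8.
    data = b'<p>caf\xe9 au lait</p>'
    try:
        decode_with_hint(data, 'utf-8')
    except UnicodeDecodeError as exc:
        print(exc)  # the message now ends with the hint
```

Using `raise ... from exc` keeps the original exception attached as `__cause__`, so the traceback still shows where the decode actually failed.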

answered 2013-09-16T08:19:52.337