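I know this question looks identical to at least a dozen others, but I am posting it reluctantly, only after finding that their answers did not solve my problem.

Basically, I want to fetch content from websites that contain characters in various languages and insert it into the datastore. But no matter what I try, the error does not change.

My sample code: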

import urllib2

import webapp2
from google.appengine.ext import db

class URLEntry(db.Model):
    content = db.TextProperty()

class ViewURL(webapp2.RequestHandler):
    def get(self):
        url = "http://iitk.ac.in/"
        try:
            result = urllib2.urlopen(url)
        except urllib2.URLError, e:
            handleError(e)
            return  # nothing to store if the fetch failed
        content = result.read()  # raw byte string, not yet decoded
        e = URLEntry(key_name=url, content=content)
        URLEntry.get_or_insert(url, content=content)  # Probably this line generates the error.

Error thrown:

UnicodeDecodeError: 'ascii' codec can't decode byte 0xe0 in position 25554: ordinal not in range(128)

Traceback:

'ascii' codec can't decode byte 0xe0 in position 25554: ordinal not in range(128)
    Traceback (most recent call last):
      File "/python27_runtime/python27_lib/versions/third_party/webapp2-2.3/webapp2.py", line 1511, in __call__
        rv = self.handle_exception(request, response, e)
      File "/python27_runtime/python27_lib/versions/third_party/webapp2-2.3/webapp2.py", line 1505, in __call__
        rv = self.router.dispatch(request, response)
      File "/python27_runtime/python27_lib/versions/third_party/webapp2-2.3/webapp2.py", line 1253, in default_dispatcher
        return route.handler_adapter(request, response)
      File "/python27_runtime/python27_lib/versions/third_party/webapp2-2.3/webapp2.py", line 1077, in __call__
        return handler.dispatch()
      File "/python27_runtime/python27_lib/versions/third_party/webapp2-2.3/webapp2.py", line 547, in dispatch
        return self.handle_exception(e, self.app.debug)
      File "/python27_runtime/python27_lib/versions/third_party/webapp2-2.3/webapp2.py", line 545, in dispatch
        return method(*args, **kwargs)
      File "/base/data/home/apps/s~govt-jobs/1.368125505627581007/checkforurls.py", line 83, in get
        URLEntry.get_or_insert(url,content=result.content)
      File "/python27_runtime/python27_lib/versions/1/google/appengine/ext/db/__init__.py", line 1362, in get_or_insert
        return run_in_transaction(txn)
      File "/python27_runtime/python27_lib/versions/1/google/appengine/api/datastore.py", line 2461, in RunInTransaction
        return RunInTransactionOptions(None, function, *args, **kwargs)
      File "/python27_runtime/python27_lib/versions/1/google/appengine/api/datastore.py", line 2599, in RunInTransactionOptions
        ok, result = _DoOneTry(new_connection, function, args, kwargs)
      File "/python27_runtime/python27_lib/versions/1/google/appengine/api/datastore.py", line 2621, in _DoOneTry
        result = function(*args, **kwargs)
      File "/python27_runtime/python27_lib/versions/1/google/appengine/ext/db/__init__.py", line 1359, in txn
        entity = cls(key_name=key_name, **kwds)
      File "/python27_runtime/python27_lib/versions/1/google/appengine/ext/db/__init__.py", line 970, in __init__
        prop.__set__(self, value)
      File "/python27_runtime/python27_lib/versions/1/google/appengine/ext/db/__init__.py", line 614, in __set__
        value = self.validate(value)
      File "/python27_runtime/python27_lib/versions/1/google/appengine/ext/db/__init__.py", line 2798, in validate
        value = self.data_type(value)
      File "/python27_runtime/python27_lib/versions/1/google/appengine/api/datastore_types.py", line 1163, in __new__
        return super(Text, cls).__new__(cls, arg, encoding)
    UnicodeDecodeError: 'ascii' codec can't decode byte 0xe0 in position 25554: ordinal not in range(128)

Also, as suggested in other StackOverflow answers, I tried adding the following before inserting into the datastore:

content = content.decode("ISO-8859-1") # The encoding of the page is ISO-8859-1
content = content.encode("utf-8")

But the error persists. Please help.

1 Answer

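decode() takes a byte string, interprets it in the encoding you supply, and returns a unicode string. encode() does the opposite: it turns a unicode string into bytes in the encoding you name.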

content = content.encode("utf-8")  # encodes a unicode string as UTF-8 bytes

The datastore uses UTF-8.
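To make the two directions concrete, here is a minimal sketch that needs no App Engine imports. The sample bytes and the helper name to_text are made up for illustration; the only thing taken from the question is the page encoding, ISO-8859-1. It also reproduces the failure in the traceback: a byte string containing 0xe0 cannot be decoded as ASCII, which appears to be what db.Text attempts when it is handed a byte string rather than unicode. If that reading is right, passing the decoded unicode string (without re-encoding it) would be the thing to try.

```python
# -*- coding: utf-8 -*-
# Hypothetical sample: ISO-8859-1 bytes containing 0xe9 and 0xe0,
# standing in for the page body returned by result.read().
raw = b"caf\xe9 \xe0 la carte"


def to_text(raw_bytes, encoding="ISO-8859-1"):
    """Decode a byte string into a unicode string (hypothetical helper)."""
    return raw_bytes.decode(encoding)


# bytes -> unicode: this is the "decode" direction.
text = to_text(raw)

# unicode -> bytes: this is the "encode" direction.
utf8_bytes = text.encode("utf-8")

# Decoding the same bytes as ASCII fails, as in the traceback,
# because 0xe0 is outside the 0-127 ASCII range.
try:
    raw.decode("ascii")
    ascii_ok = True
except UnicodeDecodeError:
    ascii_ok = False
```

Note that decoding and then immediately encoding again, as in the question, hands a byte string back to the datastore, so the ASCII decode would be attempted once more.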

Take a look at this great blog post by Nick Johnson: http://blog.notdot.net/2010/07/Getting-unicode-right-in-Python

answered 2013-06-16T14:14:03.690