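For some strange reason, my Python code stopped working after I switched from Ubuntu 12 to Ubuntu 14. I can no longer unpickle my data. I stored the data in a CouchDB database by converting it to the latin1 encoding.

I use latin1 because I read some time ago (I no longer have the link) that it is the only encoding I can use to store and retrieve cPickled binary data from a CouchDB database. This is to avoid encoding problems with JSON (couchdbkit uses JSON in the background).

Latin1 is supposed to map 256 characters to 256 characters, which would be exactly byte for byte. Now, after the system upgrade, Python seems to complain that there are only 128 valid values and throws a UnicodeDecodeError (see below).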
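
To make the byte-for-byte claim concrete, here is a minimal sketch (not part of my original code; the variable names are only for illustration) that pushes every possible byte value through latin1 and back:

```python
# Sketch: latin1 maps byte value N to the character with code point N,
# so every byte string survives a decode/encode round trip unchanged.
all_bytes = bytes(bytearray(range(256)))   # each byte value 0..255 once

as_text = all_bytes.decode("latin1")       # 256 bytes -> 256 characters
round_tripped = as_text.encode("latin1")   # 256 characters -> 256 bytes

lossless = round_tripped == all_bytes
print("characters: %d, lossless: %s" % (len(as_text), lossless))
```

This only shows that latin1 itself is lossless; it says nothing about what couchdbkit or JSON do to the text in between.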

  • old Python version: 2.7.3
  • old CouchDB version: 1.6.1
  • old couchdbkit version: 0.5.7

  • new Python version: 2.7.6
  • new CouchDB version: 1.6.1 (unchanged)
  • new couchdbkit version: 0.6.5

I am not sure whether you need all these details, but here are some of the definitions I use:

#deals with all the errors when saving an item
def saveitem(item):  
    item.set_db(self.db)
    item["_id"] = key  
    error = True
    while error:
        try:    
            item.save()
            error = False
        except ResourceConflict:
            try:
                item = DBEntry.get_or_create(key)
            except ResourceConflict:
                pass
        except NoMoreData as e:
            print "CouchDB.set.saveitem: NoMoreData error, retrying...", str(e)
        except RequestError as e:
            print "CouchDB.set.saveitem: RequestError error, retrying...", str(e)
    # return the saved item; the caller does item = saveitem(item)
    return item

#deals with most of what could go wrong when adding an attachment
def addattachment(item, content, name = "theattachment"):
    key = item["_id"]
    error = True
    while error:
        try:
            item.put_attachment(content = content, name = name) #, content_type = "application/octet-stream"
            error = False
        except ResourceConflict:
            try:
                item = DBEntry.get_or_create(key)
            except ResourceConflict:
                print "addattachment ResourceConflict, retrying..."
            except NoMoreData:
                print "addattachment NoMoreData, retrying..."

        except (NoMoreData) as e:
            print key, ": no more data exception, waiting 1 sec and retrying... -> ", str(e)
            time.sleep(1)
            item = DBEntry.get_or_create(key)
        except (IOError) as e:
            print "addattachment IOError:", str(e), "repeating..." 
            item = DBEntry.get_or_create(key)
        except (KeyError) as e:
            print "addattachment error:", str(e), "repeating..." 
            try:
                item = DBEntry.get_or_create(key)
            except ResourceConflict:
                pass
            except (NoMoreData) as e:
                pass

Then I save as follows:

        pickled = cPickle.dumps(obj = value, protocol = 2)
        pickled = pickled.decode('latin1')
        item = DBEntry(content={"seeattachment": True, "ispickled" : True},
            creationtm=datetime.datetime.utcnow(),lastaccesstm=datetime.datetime.utcnow())
        item = saveitem(item)
        addattachment(item, pickled)

This is how I unpack. The data was written under Ubuntu 12, and unpacking fails under Ubuntu 14:

def unpackValue(self, value, therawkey):
    if value is None: return None
    originalval = value
    value = value["content"]
    result = None
    if value.has_key("realcontent"):
        result = value["realcontent"]
    elif value.has_key("seeattachment"):
        if originalval.has_key("_attachments"):
            if originalval["_attachments"].has_key("theattachment"):
                if originalval["_attachments"]["theattachment"].has_key("data"):
                    result = originalval["_attachments"]["theattachment"]["data"]
                    result = base64.b64decode(result)
                else:
                    print "unpackvalue: no data in attachment. Here is what it looks like:"
                    print originalval["_attachments"]["theattachment"].items()
        else:
            error = True
            while error:
                try:
                    result = self.db.fetch_attachment(therawkey, "theattachment")
                    error = False
                except ResourceConflict:
                    print "could not get attachment for", therawkey, "retrying..."
                    time.sleep(1)
                except ResourceNotFound:
                    self.delete(key = therawkey, rawkey = True)
                    return None

        if value["ispickled"]:
            result = cPickle.loads(result.encode('latin1'))
    else:
        result = value

    if isinstance(result, unicode): result = result.encode("utf8")
    return result

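The line result = cPickle.loads(result.encode('latin1')) succeeds under Ubuntu 12 but fails under Ubuntu 14 with the following error:

UnicodeDecodeError: 'ascii' codec can't decode byte 0xc2 in position 0: ordinal not in range(128)

I did not get that error under Ubuntu 12!

How can I read my data under Ubuntu 14 while keeping the newer couchdbkit and Python versions? Is this even a versioning problem? Why does this error occur?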

1 Answer

It seems that something changed, probably in couchdbkit's API, so that result is now a UTF-8 encoded str, whereas it used to be unicode.
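
That would explain the exact error message. In Python 2, calling .encode() on a byte str first decodes it implicitly with the ascii codec, which is why an encode call can raise a UnicodeDecodeError. The sketch below spells that hidden step out explicitly so it behaves the same on Python 2 and 3. It uses pickle rather than cPickle, and the UTF-8 step is my assumption about what the newer stack returns, not something I have traced in couchdbkit:

```python
import pickle

# A protocol-2 pickle starts with the byte 0x80 (the PROTO opcode).
raw = pickle.dumps({"answer": 42}, protocol=2)

# What the question stores: the pickle bytes viewed as latin1 text.
stored_text = raw.decode("latin1")

# Assumption: the text now comes back UTF-8 encoded, so U+0080
# arrives as the two bytes 0xc2 0x80.
utf8_bytes = stored_text.encode("utf8")

# Python 2 treats some_str.encode('latin1') roughly as
# some_str.decode('ascii').encode('latin1'); this is the hidden step.
try:
    utf8_bytes.decode("ascii")
    message = None
except UnicodeDecodeError as exc:
    message = str(exc)

print(message)
```

The first byte of the UTF-8 form is 0xc2, which matches "byte 0xc2 in position 0" in the traceback.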

Since you want to encode a unicode object in latin1, the workaround is to use

cPickle.loads(result.decode('utf8').encode('latin1'))
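
As a rough check of that workaround, here is a self-contained sketch that simulates the round trip without a database. It uses pickle instead of cPickle so that it also runs on Python 3, the sample value is made up, and the UTF-8 step again stands in for whatever the newer couchdbkit does:

```python
import pickle

value = {"name": u"caf\xe9", "numbers": [1, 2, 3]}

# Save side, as in the question: pickle, then view the bytes as latin1 text.
pickled_text = pickle.dumps(value, protocol=2).decode("latin1")

# Assumption: the database layer hands that text back as UTF-8 bytes.
returned = pickled_text.encode("utf8")

# Workaround: undo the UTF-8 encoding first, then map the text back to bytes.
restored = pickle.loads(returned.decode("utf8").encode("latin1"))

print(restored == value)
```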

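Note that it would be better to find out where result gets UTF-8 encoded and prevent that from happening (so that you still get unicode, as you did under Ubuntu 12), or to change that encoding to latin1 if that is the form you want result to be in.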
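
If you need the same code to keep working while you track that down, one option is a small tolerant helper. This is only a sketch: to_pickle_bytes is a hypothetical name, and it relies on the fact that a raw protocol-2 pickle begins with 0x80, which is never valid as the first byte of UTF-8 data.

```python
import pickle

def to_pickle_bytes(result):
    """Hypothetical helper: return raw pickle bytes for text or byte input."""
    if isinstance(result, bytes):            # plain str on Python 2
        try:
            result = result.decode("utf8")   # UTF-8 encoded text (new behaviour)
        except UnicodeDecodeError:
            return result                    # already the raw pickle bytes
    return result.encode("latin1")           # unicode text (old behaviour)

raw = pickle.dumps([1, 2, 3], protocol=2)
text = raw.decode("latin1")

# The three forms the data might arrive in all unpickle to the same value.
results = [pickle.loads(to_pickle_bytes(form))
           for form in (text, text.encode("utf8"), raw)]
print(results)
```

You would then call cPickle.loads(to_pickle_bytes(result)) in place of the failing line.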

answered 2014-11-30T14:30:14.023