python - Python CouchDB can't save a dict created from a feedparser entry? (no attribute 'read')

Question

I have a script that reads the entries in an RSS feed and stores each individual entry as JSON in a CouchDB database.

The interesting part of my code looks like this:

Feed = namedtuple('Feed', ['name', 'url'])

couch = couchdb.Server(COUCH_HOST)
couch.resource.credentials = (COUCH_USER, COUCH_PASS)

db = couch['raw_entries']

for feed in map(Feed._make, csv.reader(open("feeds.csv", "rb"))):
    d = feedparser.parse(feed.url)
    for item in d.entries:
        db.save(item)

When I try to run that code, I get the following error from db.save(item):

AttributeError: object has no attribute 'read'

OK, so then I did a little debugging...

for feed in map(Feed._make, csv.reader(open("feeds.csv", "rb"))):
    d = feedparser.parse(feed.url)
    for item in d.entries:
        print(type(item))

This prints <class 'feedparser.FeedParserDict'>. Ah, so feedparser is using its own dict type... well, what if I try to explicitly convert it to a dict?

for feed in map(Feed._make, csv.reader(open("feeds.csv", "rb"))):
    d = feedparser.parse(feed.url)
    for item in d.entries:
        db.save(dict(item))

Traceback (most recent call last):
  File "./feedchomper.py", line 32, in <module>
    db.save(dict(item))
  File "/home/dealpref/lib/python2.7/couchdb/client.py", line 407, in save
    _, _, data = func(body=doc, **options)
  File "/home/dealpref/lib/python2.7/couchdb/http.py", line 399, in post_json
    status, headers, data = self.post(*a, **k)
  File "/home/dealpref/lib/python2.7/couchdb/http.py", line 381, in post
    **params)
  File "/home/dealpref/lib/python2.7/couchdb/http.py", line 419, in _request
    credentials=self.credentials)
  File "/home/dealpref/lib/python2.7/couchdb/http.py", line 239, in request
    resp = _try_request_with_retries(iter(self.retry_delays))
  File "/home/dealpref/lib/python2.7/couchdb/http.py", line 196, in _try_request_with_retries
    return _try_request()
  File "/home/dealpref/lib/python2.7/couchdb/http.py", line 222, in _try_request
    chunk = body.read(CHUNK_SIZE)
AttributeError: 'dict' object has no attribute 'read'

W-what? That doesn't make sense, because the following works fine and the type is still dict:

some_dict = dict({'foo': 'bar'})
print(type(some_dict))
db.save(some_dict)

What am I missing here?

score 4 · Accepted Answer

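Answered on the mailing list, but basically this happens because feedparser entries contain data that cannot be losslessly serialized to JSON, such as time.struct_time instances. Unfortunately, couchdb-python then goes on to assume the body is a file, which masks the actual error.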
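One way to surface the underlying problem before couchdb-python sees the document is to try encoding it yourself first. The following is a minimal sketch, assuming the standard-library json module; `check_json_safe` is a hypothetical helper name, and the `datetime` value is only a stand-in for a field the encoder rejects. Which values are rejected depends on the JSON encoder in use, so treat this as a diagnostic aid rather than a description of what couchdb-python does internally.

```python
import datetime
import json


def check_json_safe(doc):
    """Return None if doc encodes cleanly, otherwise the encoder's error text.

    Hypothetical helper for illustration; not part of couchdb-python.
    """
    try:
        json.dumps(doc)
    except (TypeError, ValueError) as exc:
        return str(exc)
    return None


# A document made only of JSON-friendly types.
good = {'title': 'hello', 'tags': ['a', 'b']}

# A document with a value the stdlib encoder rejects (stand-in example).
bad = {'title': 'hello', 'updated': datetime.datetime(2012, 1, 1)}

print(check_json_safe(good))
print(check_json_safe(bad))
```

Running a check like this on each entry just before db.save() should report the encoder's own complaint instead of the misleading "no attribute 'read'" message.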

score 4 · Accepted Answer

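I found a way: serialize the structure to JSON, then load it back into a Python dict that I pass to CouchDB, which then re-serializes it to JSON in order to save it (yes, weird and undesirable, but it works?).

I had to write a custom serialization method for dumps, because the repr of a time.struct_time can't be eval'd.

Source: http://diveintopython3.org/serializing.html

Code: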

#!/usr/bin/env python2.7

from collections import namedtuple
import csv
import json
import time

import feedparser
import couchdb

def to_json(python_object):
    if isinstance(python_object, time.struct_time):
        return {'__class__': 'time.asctime',
                '__value__': time.asctime(python_object)}

    raise TypeError(repr(python_object) + ' is not JSON serializable')

Feed = namedtuple('Feed', ['name', 'url'])

COUCH_HOST = 'http://mycouch.com'
COUCH_USER = 'user'
COUCH_PASS = 'pass'

couch = couchdb.Server(COUCH_HOST)
couch.resource.credentials = (COUCH_USER, COUCH_PASS)

db = couch['raw_entries']

for feed in map(Feed._make, csv.reader(open("feeds.csv", "rb"))):
    d = feedparser.parse(feed.url)
    for item in d.entries:
        j = json.dumps(item, default=to_json)
        db.save(json.loads(j))

score 1 · Accepted Answer

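Maybe there is a bug in Python CouchDB. You could say it is not liberal enough in what it accepts.

But, basically, CouchDB stores JSON. You should use whatever counts as "JSON" in your language. Apparently, for Python, that means dict objects.

You will probably get the best bang for your buck by figuring out how to convert all of your types to plain Python dicts before calling CouchDB. Maybe that is not the most "correct" solution, but I suspect it is the quickest.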
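The conversion described above could be sketched roughly as follows. This is only an illustration under stated assumptions: `to_plain` and `FakeEntry` are hypothetical names (the latter stands in for feedparser's dict subclass), and formatting struct_time values as UTC ISO 8601 strings assumes the parsed dates are in UTC.

```python
import json
import time


class FakeEntry(dict):
    """Hypothetical stand-in for a dict subclass such as FeedParserDict."""


def to_plain(value):
    """Recursively copy value into plain dicts, lists, and scalars."""
    # Check struct_time first: it is a tuple subclass and would otherwise
    # be flattened into a bare list of numbers by the tuple branch below.
    if isinstance(value, time.struct_time):
        # Assumption: the time is UTC, hence the trailing 'Z'.
        return time.strftime('%Y-%m-%dT%H:%M:%SZ', value)
    if isinstance(value, dict):
        return dict((k, to_plain(v)) for k, v in value.items())
    if isinstance(value, (list, tuple)):
        return [to_plain(v) for v in value]
    return value


entry = FakeEntry(
    title='hello',
    published_parsed=time.gmtime(0),
    links=[FakeEntry(href='http://example.com/')],
)

plain = to_plain(entry)
print(json.dumps(plain, sort_keys=True))
```

The resulting `plain` object contains only built-in dicts, lists, and strings, so it could be handed to db.save() in place of the original entry.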

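My Python is rusty. Is it possible that dict(foo) returns something that is not a plain dict? Maybe FeedParserDict subclasses dict and then uses metaprogramming to return itself when you call dict()? Can you confirm that type(dict(item)) is definitely a plain Python dict?

A common trick in JavaScript land is to round-trip through a serializer such as JSON. Something like pickle.loads(pickle.dumps(item)). That pretty much guarantees you have a pure copy of the core data.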
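As a small sketch of the round-trip idea, the snippet below compares a pickle round trip with a JSON round trip. It uses collections.OrderedDict purely as a convenient standard-library dict subclass standing in for FeedParserDict, which is an assumption made for illustration. Since pickle is designed to restore the original class, the JSON variant is probably the one that yields plain built-in types.

```python
from collections import OrderedDict
import json
import pickle

# OrderedDict is used here only as a stand-in for a dict subclass.
item = OrderedDict([('title', 'hello'), ('tags', ['a', 'b'])])

# Pickle restores the original class of the object.
via_pickle = pickle.loads(pickle.dumps(item))

# JSON only knows objects, arrays, strings, numbers, booleans, and null,
# so decoding produces built-in dicts and lists.
via_json = json.loads(json.dumps(item))

print(type(via_pickle).__name__)
print(type(via_json).__name__)
```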
