python - A question about Universal Feed Parser

Question

I'm having trouble getting content from a few of the blog feeds I'm scraping.

I'm not sure what the reason is, but parsing one or two of the blogs with feedparser gives me this error:

import feedparser

results = feedparser.parse(url)

ent = []

for entry in results.entries:
    e = {}
    e['title'] = entry.title
    # Fails with AttributeError when the entry has no 'content'
    e['content'] = entry.content[0].value

object has no attribute 'content'

or

object has no attribute 'link'

This hasn't been the case for the rest of my blogs. Does empty entry content result in this?

score 1 · Accepted Answer

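There is a mapping between the XML tags used in a feed and the attributes available on entries in feedparser. Look at the source of one of the feeds causing the problem and see which tags it uses. You may find that it doesn't contain content for its entries, or that the link is in something like uid rather than link.

You'll then need to write code that handles these minor variations, either with try/except or with hasattr.

If you post a link to one of the problematic feeds, I may be able to offer more advice.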
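As a rough illustration of the hasattr approach, here is a minimal sketch. The helper names extract_entry and collect_entries are hypothetical, and falling back to summary when content is missing is an assumption about what the problem feeds provide, not something confirmed for them:

```python
def extract_entry(entry):
    """Build a dict from a feed entry, tolerating missing attributes."""
    # Prefer 'content' when the entry has it; otherwise fall back to
    # 'summary' (an assumed fallback), and finally to an empty string.
    if hasattr(entry, 'content') and entry.content:
        body = entry.content[0].value
    else:
        body = getattr(entry, 'summary', '')
    return {
        'title': getattr(entry, 'title', ''),
        'link': getattr(entry, 'link', None),  # None when the feed has no link
        'content': body,
    }


def collect_entries(url):
    """Parse the feed at url and return one dict per entry."""
    # Third-party package; imported here so extract_entry has no dependencies.
    import feedparser
    results = feedparser.parse(url)
    return [extract_entry(entry) for entry in results.entries]
```

Calling collect_entries(url) in place of the original loop should give one dict per entry, with an empty content string or a None link wherever the feed omits those tags instead of raising AttributeError.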
