python - Python feedparser not using atom/WordPress namespaces?

Question

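I'm trying to use feedparser (an excellent library) to parse WordPress export files, and the (slight) inconsistencies between WordPress versions are giving me a real headache.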

WordPress 2.x does not include atom:link tags in its XML output (without_atom_tags.xml). When parsed, the namespaced elements are available without a prefix:

>>> feed = feedparser.parse("without_atom_tags.xml")
>>> feed.entries[0].comment_status
u'open'

The XML from WordPress 3.x does include atom:link tags (with_atom_tags.xml), and you have to prefix the namespaced elements:

>>> feed = feedparser.parse("with_atom_tags.xml")
>>> feed.entries[0].wp_comment_status              # <-- Note wp_ prefix
u'open'
>>> feed.entries[0].comment_status
AttributeError: object has no attribute 'comment_status'

Interestingly, if I add xmlns:atom="http://www.w3.org/2005/Atom" to the root RSS element (with_atom_tags_and_namespace.xml), the prefix is not needed.

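I need to parse all of these different formats without modifying the XML. Is feedparser broken, or am I doing something wrong? Can I do this without a bunch of nasty conditional code?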
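If the prefix cannot be made consistent, the conditional logic can at least be kept in one place. The following is a minimal sketch, not a feedparser feature: `get_field` is a hypothetical helper, the `wp_` prefix is taken from the transcript above, and `SimpleNamespace` objects stand in for parsed entries so that the example runs without any export file.

```python
from types import SimpleNamespace

_MISSING = object()  # sentinel, so that None can be a legitimate value


def get_field(entry, name, prefixes=("", "wp_"), default=_MISSING):
    """Return entry.<prefix + name> for the first prefix that exists.

    Hypothetical helper: tries the bare name first, then each prefix.
    Raises AttributeError if nothing matches and no default is given.
    """
    for prefix in prefixes:
        value = getattr(entry, prefix + name, _MISSING)
        if value is not _MISSING:
            return value
    if default is not _MISSING:
        return default
    raise AttributeError("no attribute %r with prefixes %r" % (name, prefixes))


# Stand-ins for entries parsed from the two export flavours (assumed shapes).
entry_2x = SimpleNamespace(comment_status="open")
entry_3x = SimpleNamespace(wp_comment_status="open")

status_2x = get_field(entry_2x, "comment_status")
status_3x = get_field(entry_3x, "comment_status")
```

With real data, the same call would be made on `feed.entries[0]`; this relies only on the attribute access already shown in the transcripts above.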

score 0 · Accepted Answer

Could you add the missing namespaces (atom/wp) directly to the global list of supported namespaces in feedparser.py?
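One way to try this without editing feedparser.py is to patch the table at runtime. The sketch below is not a confirmed fix. It assumes the installed feedparser exposes a class-level `namespaces` dict (namespace URI mapped to prefix) on the private `_FeedParserMixin` class, an internal detail that may differ between versions. The WordPress export URI and the `wp` prefix are illustrative guesses; use the `xmlns:wp` value from the actual export file. `register_namespaces` and `EXTRA_NAMESPACES` are hypothetical names.

```python
# Assumed table: namespace URI -> prefix feedparser should report.
# The WordPress URI is a guess; copy the real one from the export's root element.
EXTRA_NAMESPACES = {
    "http://www.w3.org/2005/Atom": "",
    "http://wordpress.org/export/1.0/": "wp",
}


def register_namespaces(table, extra):
    """Add URI -> prefix pairs that are missing from table.

    Existing entries are left untouched. Returns the list of URIs added.
    """
    added = []
    for uri, prefix in extra.items():
        if uri not in table:
            table[uri] = prefix
            added.append(uri)
    return added


try:
    import feedparser

    # Private attribute; only patched if this version actually exposes it.
    mixin = getattr(feedparser, "_FeedParserMixin", None)
    table = getattr(mixin, "namespaces", None)
    if isinstance(table, dict):
        register_namespaces(table, EXTRA_NAMESPACES)
except ImportError:
    pass  # feedparser not installed; the helper above still works on any dict
```

The patch presumably needs to run before the first `feedparser.parse` call, and whether it makes the three export variants behave the same way has to be checked against the real files.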
