python - Picking out tags in an XML document?

Question

I have what I believe to be a fairly simple question.

I have retrieved a document from gdata, this one: https://gdata.youtube.com/feeds/api/videos/Ej4_G-E1cAM/comments

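I am trying to pick out the text between the

"<author>HERE</author>"

tags, so that I am left with output containing only the usernames. Is Python even the best way to go about this, or should I use another language? I have been googling since 8:00 AM (4 hours) and have yet to find anything for such a seemingly simple task.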

Best regards, - Mitch Powell

score 1 · Accepted Answer

You have an Atom feed there, so I'd use feedparser to handle it:

import feedparser

result = feedparser.parse('https://gdata.youtube.com/feeds/api/videos/Ej4_G-E1cAM/comments')
for entry in result.entries:
    print entry.author

This prints:

FreebieFM
micromicros
FreebieFM
Sarah Grimstone
FreebieFM
# etc.

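Feedparser is an external library, but it is easy to install. If you are limited to the standard library, you can use the ElementTree API instead, but to parse an Atom feed you need to include HTML entities in the parser, and you have to deal with namespaces (not ElementTree's strong point):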

from urllib2 import urlopen
from xml.etree import ElementTree

response = urlopen('https://gdata.youtube.com/feeds/api/videos/Ej4_G-E1cAM/comments')
tree = ElementTree.parse(response)

nsmap = {'a': 'http://www.w3.org/2005/Atom'}
for author in tree.findall('.//a:author/a:name', namespaces=nsmap):
    print author.text

The nsmap dictionary lets ElementTree translate the a: prefix into the correct namespace for those elements.
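
To see the namespace handling in isolation, here is a minimal Python 3 sketch that runs without network access. The inline SAMPLE_FEED document and the author_names helper are assumptions made up for illustration (they are not part of the original feed or answer); only the Atom namespace URI and the findall path come from the code above.

```python
from xml.etree import ElementTree

# Hypothetical stand-in for the downloaded feed: a tiny Atom document
# with two entries, each carrying an <author><name> element.
SAMPLE_FEED = """<feed xmlns="http://www.w3.org/2005/Atom">
  <entry>
    <title>First comment</title>
    <author><name>FreebieFM</name></author>
  </entry>
  <entry>
    <title>Second comment</title>
    <author><name>micromicros</name></author>
  </entry>
</feed>
"""

# Map the arbitrary prefix 'a' to the Atom namespace URI.
ATOM_NS = {'a': 'http://www.w3.org/2005/Atom'}


def author_names(xml_text):
    """Return the text of every author/name element in an Atom document."""
    root = ElementTree.fromstring(xml_text)
    return [name.text for name in root.findall('.//a:author/a:name', ATOM_NS)]


names = author_names(SAMPLE_FEED)
print(names)

# Without the namespace mapping the same elements are not matched,
# because their real tag is '{http://www.w3.org/2005/Atom}author'.
unqualified = ElementTree.fromstring(SAMPLE_FEED).findall('.//author')
print(len(unqualified))
```

The prefix a is arbitrary; what matters is that the dictionary value matches the xmlns URI declared on the feed element, since ElementTree stores every tag under its full namespace.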
