python - How to parse YouTube XML with Python?

Question

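I am trying to parse the XML that YouTube returns for the feed URL in the code below, and I want to display all of the titles. However, when I try to print the title, only blank lines appear. Any suggestions?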

#import library to do http requests:
import urllib2

#import easy to use xml parser called minidom:
from xml.dom.minidom import parseString
#all these imports are standard on most modern python implementations

#download the file:
file = urllib2.urlopen('http://gdata.youtube.com/feeds/api/users/buzzfeed/uploads?v=2&max-results=50')
#convert to string:
data = file.read()
#close file because we don't need it anymore:
file.close()

#parse the xml you downloaded
dom = parseString(data)
entry=dom.getElementsByTagName('entry')
for node in entry:
    video_title=node.getAttribute('title')
    print video_title

score 1 · Accepted Answer

title is not an attribute; it is a child element of entry.

Here is an example of how to extract it:

for node in entry:
    video_title = node.getElementsByTagName('title')[0].firstChild.nodeValue
    print video_title
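To see why the original loop printed blank lines, here is a minimal, self-contained sketch. It assumes a tiny hand-written feed (SAMPLE_FEED is hypothetical, not the real YouTube response) and uses Python 3 syntax, unlike the Python 2 code in the question.

```python
from xml.dom.minidom import parseString

# Hypothetical stand-in for the downloaded feed; the real response is much larger.
SAMPLE_FEED = """<feed xmlns="http://www.w3.org/2005/Atom">
  <title>Uploads by buzzfeed</title>
  <entry>
    <title>First video</title>
  </entry>
  <entry>
    <title>Second video</title>
  </entry>
</feed>"""

dom = parseString(SAMPLE_FEED)
entries = dom.getElementsByTagName('entry')

# getAttribute() returns an empty string when the attribute does not exist,
# which is why printing it produces blank lines.
as_attribute = [node.getAttribute('title') for node in entries]

# The title text lives in a text node under the <title> child element.
as_child = [node.getElementsByTagName('title')[0].firstChild.nodeValue
            for node in entries]

print(as_attribute)
print(as_child)
```

The first list holds only empty strings, while the second holds the two titles from the sample.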

score 0 · Accepted Answer

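lxml can be a bit hard to learn, so here is a very simple Beautiful Soup solution (it's called beautifulsoup for a reason). You can also configure Beautiful Soup to use the lxml parser, so the speed is about the same.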

from bs4 import BeautifulSoup
soup = BeautifulSoup(data) # data is the string read in your code
soup.findAll('title')

This returns a list of title elements. In this case, you can also call soup.findAll('media:title') to return only the media:title elements (the actual video names).
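The difference between title and media:title can also be checked with the standard library alone. The sketch below swaps in minidom instead of Beautiful Soup and assumes a hypothetical one-entry feed (SAMPLE_FEED and the texts helper are illustrative names, and the namespace URIs are assumed to be the usual Atom and Media RSS ones). It uses Python 3 syntax.

```python
from xml.dom.minidom import parseString

# Hypothetical one-entry feed; namespace URIs assumed to be Atom and Media RSS.
SAMPLE_FEED = """<feed xmlns="http://www.w3.org/2005/Atom"
      xmlns:media="http://search.yahoo.com/mrss/">
  <title>Uploads by buzzfeed</title>
  <entry>
    <title>Entry title</title>
    <media:group>
      <media:title>Video name</media:title>
    </media:group>
  </entry>
</feed>"""

dom = parseString(SAMPLE_FEED)


def texts(nodes):
    """Return the text of the first child node of each element."""
    return [n.firstChild.nodeValue for n in nodes]


# minidom matches on the qualified tag name, so 'title' and 'media:title'
# select different elements.
plain_titles = texts(dom.getElementsByTagName('title'))
media_titles = texts(dom.getElementsByTagName('media:title'))

print(plain_titles)
print(media_titles)
```

Searching the whole document for title also picks up the feed-level title, which is one reason to search within each entry or to ask for media:title directly.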

score 0 · Accepted Answer

There is a small bug in your code. You access title as an attribute, although it is a child element of entry. Your code can be fixed as follows:

dom = parseString(data)
for node in dom.getElementsByTagName('entry'):
    print node.getElementsByTagName('title')[0].firstChild.data
