python - Python - use lxml to return value of title.text attrib

Question

I'm trying to figure out how to use lxml to parse the xml from a url to return the value of the title attribute. Does anyone know what I have wrong or what would return the Title value/text? So in the example below I want to return the value of 'Weeds - S05E05 - Van Nuys - HD TV'

XML from URL:

<?xml version="1.0" encoding="UTF-8"?>
<subsonic-response xmlns="http://subsonic.org/restapi" status="ok" version="1.8.0">
<song id="11345" parent="11287" title="Weeds - S05E05 - Van Nuys - HD TV" album="Season 5" artist="Weeds" isDir="false" created="2009-07-06T22:21:16" duration="1638" bitRate="384" size="782304110" suffix="mkv" contentType="video/x-matroska" isVideo="true" path="Weeds/Season 5/Weeds - S05E05 - Van Nuys - HD TV.mkv" transcodedSuffix="flv" transcodedContentType="video/x-flv"/>
</subsonic-response>

My current Python code:

import lxml
from lxml import html
from urllib2 import urlopen

url = 'https://myurl.com'

tree = html.parse(urlopen(url))
songs = tree.findall('{*}song')
for song in songs:
    print song.attrib['title']

With the above code I get no data return, any ideas?

print out of tree =

<lxml.etree._ElementTree object at 0x0000000003348F48>

print out of songs =

[]

score 3 · Accepted Answer

首先，您实际上并没有lxml在代码中使用。您导入lxmlHTML 解析器，否则忽略它，而只使用标准库xml.etree.ElementTree模块。

其次，您搜索data/song但data文档中没有任何元素，因此不会找到匹配项。最后但并非最不重要的一点是，您有一个使用名称空间的文档。您必须在搜索元素时包含这些元素，或使用{*}通配符搜索。

以下为您查找歌曲：

from lxml import etree

tree = etree.parse(URL)  # lxml can load URLs for you
songs = tree.findall('{*}song')
for song in songs:
    print song.attrib['title']

要使用显式命名空间，您必须将{*}通配符替换为完整的命名空间 URL；默认命名.nsmap空间在对象的命名空间字典中可用tree：

namespace = tree.nsmap[None]
songs = tree.findall('{%s}song' % namespace)

score 0 · Accepted Answer

感谢您的帮助，我使用了您的两个组合来使其正常工作。

import xml.etree.ElementTree as ET
from urllib2 import urlopen

url = 'https://myurl.com'
root = ET.parse(urlopen(url)).getroot()
for song in root:
    print song.attrib['title']

score 0 · Accepted Answer

整个问题在于subsonic-response标签具有一个xmlns属性，表明存在一个有效的 xml 命名空间。下面的代码考虑到了这一点，并正确地添加了歌曲标签。

import xml.etree.ElementTree as ET
root = ET.parse('test.xml').getroot()
print root.findall('{http://subsonic.org/restapi}song')

python - Python - use lxml to return value of title.text attrib

3 回答 3

Related

Reference