
I'm trying to collect the values of several identically named tags, except for one specific tag that I want to ignore. Here is the XML:

<?xml version="1.0" encoding="UTF-8"?>
<package xmlns="http://apple.com/itunes/importer" version="film5.1">
    <provider>studiocanal</provider>
    <language>en-GB</language>
    <video>
        <crew>
            <crew_member billing="top">
                <display_name>John Doe</display_name>
                <roles>
                    <role>Director</role>
                    <role>Screenwriter</role>
                </roles>
            </crew_member>
            <crew_member billing="ordered">
                <display_name>Harry Smith</display_name>
                <roles>
                    <role>Screenwriter</role>
                </roles>
            </crew_member>
            <crew_member billing="ordered">
                <display_name>Jane Doe</display_name>
                <roles>
                    <role>Screenwriter</role>
                </roles>
            </crew_member>
            <crew_member billing="ordered">
                <display_name>Mr. Kimbley</display_name>
                <roles>
                    <role>Producer</role>
                </roles>
            </crew_member>
        </crew>
    </video>
</package>

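I need to put the role values into a list while ignoring the second role in the John Doe section; from that section I only want the first value. I can't seem to get it to work. My current code creates and populates the list, but it ends up with 5 roles when I only need 4, skipping the 2nd one. Here is my current code: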

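from lxml import etree

# templateXml holds the XML document shown above
tree = etree.fromstring(templateXml)
crewList2 = []
# Selects every <role>, including John Doe's second one
for element in tree.xpath('//video/crew/crew_member/roles/role'):
    crewList2.append(element)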

2 Answers


I would go with:

crewList2 = []
for element in tree.xpath('//video/crew/crew_member/roles'):
    role = element.xpath('.//role[1]')
    if role:
        crewList2.append(role[0].text)

print(crewList2)

This prints:

['Director', 'Screenwriter', 'Screenwriter', 'Producer']
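
Since the XML in the question declares a default namespace, the same per-`roles` loop may need the namespace registered before the paths match anything. Below is a minimal, self-contained sketch of that idea, not a definitive implementation. The names `SAMPLE_XML`, `NS`, `first_roles` and the prefix `it` are assumptions made for illustration, and the sample is a trimmed copy of the question's XML with only two crew members.

```python
from lxml import etree

# Trimmed copy of the question's XML (hypothetical constant name).
# Bytes are used because lxml rejects str input that carries an
# encoding declaration.
SAMPLE_XML = b"""<?xml version="1.0" encoding="UTF-8"?>
<package xmlns="http://apple.com/itunes/importer" version="film5.1">
    <video>
        <crew>
            <crew_member billing="top">
                <display_name>John Doe</display_name>
                <roles>
                    <role>Director</role>
                    <role>Screenwriter</role>
                </roles>
            </crew_member>
            <crew_member billing="ordered">
                <display_name>Mr. Kimbley</display_name>
                <roles>
                    <role>Producer</role>
                </roles>
            </crew_member>
        </crew>
    </video>
</package>"""

# The prefix "it" is arbitrary; only the URI has to match the document.
NS = {"it": "http://apple.com/itunes/importer"}


def first_roles(xml_bytes):
    """Return the text of the first <role> under each <roles> element."""
    tree = etree.fromstring(xml_bytes)
    result = []
    for roles in tree.xpath(
        "//it:video/it:crew/it:crew_member/it:roles", namespaces=NS
    ):
        # find() returns the first matching child, or None if there is none.
        first = roles.find("it:role", namespaces=NS)
        if first is not None:
            result.append(first.text)
    return result


print(first_roles(SAMPLE_XML))
```

With this trimmed sample the list holds one role per crew member, so John Doe contributes only "Director".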
answered 2013-08-12T14:30:04.397

Use a single XPath expression, namespace registration, and lxml.etree.tostring(..., method="text"):

roles = tree.xpath('//it:video/it:crew/it:crew_member/it:roles/it:role[1]', namespaces={"it": "http://apple.com/itunes/importer"})
crewList2 = [etree.tostring(e, method="text", encoding="unicode").strip() for e in roles]
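
To illustrate why the `[1]` predicate gives one role per crew member rather than one role overall, here is a small sketch. It assumes a compact stand-in for the question's XML, and the names `xml`, `path`, `per_member`, `overall` and `all_roles` are made up for the example. It also uses `/text()` to fetch the strings directly, which is an alternative to `tostring(..., method="text")` and not what the answer above does.

```python
from lxml import etree

NS = {"it": "http://apple.com/itunes/importer"}  # prefix "it" is arbitrary

# Compact stand-in for the question's document (three crew members only).
xml = b"""<package xmlns="http://apple.com/itunes/importer">
  <video><crew>
    <crew_member><roles><role>Director</role><role>Screenwriter</role></roles></crew_member>
    <crew_member><roles><role>Screenwriter</role></roles></crew_member>
    <crew_member><roles><role>Producer</role></roles></crew_member>
  </crew></video>
</package>"""

tree = etree.fromstring(xml)
path = "//it:video/it:crew/it:crew_member/it:roles/it:role"

# role[1] is evaluated relative to each parent <roles>,
# so it keeps the first role of every crew member.
per_member = tree.xpath(path + "[1]/text()", namespaces=NS)

# Wrapping the whole path in parentheses applies [1] to the combined
# node set, which keeps only the very first role in the document.
overall = tree.xpath("(" + path + ")[1]/text()", namespaces=NS)

# Without a predicate every role is returned, duplicates included.
all_roles = tree.xpath(path + "/text()", namespaces=NS)

print(per_member)
print(overall)
print(all_roles)
```

The per-parent behaviour of `role[1]` is what lets a single expression skip John Doe's second role.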
answered 2013-08-12T15:20:37.157