python - 通过 python lxml tree.xpath 解析 xml

Question

我尝试解析一个巨大的文件。示例如下。我试着接受<Name>，但我不能它只有在没有这个字符串的情况下才能工作

<LevelLayout xmlns="http://schemas.datacontract.org/2004/07/ArcherTech.Common.Domain" xmlns:i="http://www.w3.org/2001/XMLSchema-instance">

xml2 = '''<?xml version="1.0" encoding="UTF-8"?>
<PackageLevelLayout>
<LevelLayouts>
    <LevelLayout levelGuid="4a54f032-325e-4988-8621-2cb7b49d8432">
                <LevelLayout xmlns="http://schemas.datacontract.org/2004/07/ArcherTech.Common.Domain" xmlns:i="http://www.w3.org/2001/XMLSchema-instance">
                    <LevelLayoutSectionBase>
                        <LevelLayoutItemBase>
                            <Name>Tracking ID</Name>
                        </LevelLayoutItemBase>
                    </LevelLayoutSectionBase>
                </LevelLayout>
            </LevelLayout>
    </LevelLayouts>
</PackageLevelLayout>'''

from lxml import etree
tree = etree.XML(xml2)
nodes = tree.xpath('/PackageLevelLayout/LevelLayouts/LevelLayout[@levelGuid="4a54f032-325e-4988-8621-2cb7b49d8432"]/LevelLayout/LevelLayoutSectionBase/LevelLayoutItemBase/Name')
print nodes

score 3 · Accepted Answer

您的嵌套LevelLayoutXML 文档使用命名空间。我会使用：

tree.xpath('.//LevelLayout[@levelGuid="4a54f032-325e-4988-8621-2cb7b49d8432"]//*[local-name()="Name"]')

将Name元素与较短的 XPath 表达式匹配（完全忽略命名空间）。

另一种方法是使用前缀到命名空间的映射并在标签上使用它们：

nsmap = {'acd': 'http://schemas.datacontract.org/2004/07/ArcherTech.Common.Domain'}

tree.xpath('/PackageLevelLayout/LevelLayouts/LevelLayout[@levelGuid="4a54f032-325e-4988-8621-2cb7b49d8432"]/acd:LevelLayout/acd:LevelLayoutSectionBase/acd:LevelLayoutItemBase/acd:Name',
    namespaces=nsmap)

score 0 · Accepted Answer

lxml的xpath方法有一个namespaces参数。您可以将字典映射命名空间前缀传递给命名空间。然后你可以引用XPath使用命名空间前缀的 build ：

xml2 = '''<?xml version="1.0" encoding="UTF-8"?>
<PackageLevelLayout>
<LevelLayouts>
    <LevelLayout levelGuid="4a54f032-325e-4988-8621-2cb7b49d8432">
                <LevelLayout xmlns="http://schemas.datacontract.org/2004/07/ArcherTech.Common.Domain" xmlns:i="http://www.w3.org/2001/XMLSchema-instance">
                    <LevelLayoutSectionBase>
                        <LevelLayoutItemBase>
                            <Name>Tracking ID</Name>
                        </LevelLayoutItemBase>
                    </LevelLayoutSectionBase>
                </LevelLayout>
            </LevelLayout>
    </LevelLayouts>
</PackageLevelLayout>'''

namespaces={'ns': 'http://schemas.datacontract.org/2004/07/ArcherTech.Common.Domain',
            'i': 'http://www.w3.org/2001/XMLSchema-instance'}

import lxml.etree as ET
# This is an lxml.etree._Element, not a tree, so don't call it tree
root = ET.XML(xml2)

nodes = root.xpath(
    '''/PackageLevelLayout/LevelLayouts/LevelLayout[@levelGuid="4a54f032-325e-4988-8621-2cb7b49d8432"]
       /ns:LevelLayout/ns:LevelLayoutSectionBase/ns:LevelLayoutItemBase/ns:Name''', namespaces = namespaces)
print nodes

产量

[<Element {http://schemas.datacontract.org/2004/07/ArcherTech.Common.Domain}Name at 0xb74974dc>]

python - 通过 python lxml tree.xpath 解析 xml

2 回答 2

Related

Reference