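I want to find specific tags in my XML document and edit their text or attributes. My XML file contains namespaces (nested namespaces, if I understand correctly). The tool I want to use for this is ElementTree. I managed to read the XML file with iterparse, but I don't know how to save the edited XML, because iterparse has no write method. I need either a way to read the XML file with parse and strip its namespaces and nested namespaces, or a way to save the file that was parsed with iterparse.

For this example, let's edit the text of the Rating tag.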

from xml.etree import ElementTree as ET

it = ET.iterparse(adiPath)
for _, el in it:
    if '}' in el.tag:
        el.tag = el.tag.split('}', 1)[1]  # strip all namespaces
    for at in list(el.attrib):  # strip namespaces of attributes too
        if '}' in at:
            newat = at.split('}', 1)[1]
            el.attrib[newat] = el.attrib[at]
            del el.attrib[at]
root = it.root

# Search for the Rating tag and edit its value
for rating in root.iter('Rating'):
    print(rating.text)  # Prints 18
    rating.text = "999"
    print(rating.text)  # Prints 999

However, in this case the XML file on disk remains unchanged.

Here is the XML file:

<?xml version="1.0" encoding="utf-8"?>
<ADI3 xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:content="urn:cablelabs:md:xsd:content:3.0" xmlns:core="urn:cablelabs:md:xsd:core:3.0" xmlns:offer="urn:cablelabs:md:xsd:offer:3.0" xmlns:terms="urn:cablelabs:md:xsd:terms:3.0" xmlns:title="urn:cablelabs:md:xsd:title:3.0" xmlns:adb="urn:adb:md:xsd:adb:01" xmlns:schemaLocation="urn:adb:md:xsd:adb:01 ADB-EXT-C01.xsd urn:cablelabs:md:xsd:core:3.0 MD-SP-CORE-C01.xsd urn:cablelabs:md:xsd:content:3.0 MD-SP-CONTENT-C01.xsd urn:cablelabs:md:xsd:offer:3.0 MD-SP-OFFER-C01.xsd urn:cablelabs:md:xsd:terms:3.0 MD-SP-TERMS-C01.xsd urn:cablelabs:md:xsd:title:3.0 MD-SP-TITLE-C01.xsd" xmlns:xml="http://www.w3.org/XML/1998/namespace" xmlns="urn:cablelabs:md:xsd:core:3.0">
  <Asset xsi:type="title:TitleType" uriId="ab://cc.com" providerVersionNum="1" internalVersionNum="0" creationDateTime="2020-01-28T08:55:19Z" startDateTime="2019-05-20T00:00:00Z" endDateTime="2028-08-20T23:59:00Z">
    <AlternateId identifierSystem="VOD1.1">ab://cc.com</AlternateId>
    <Ext>
        <adb:ExtensionType>
            <adb:TitleExt>
                <adb:SeriesInfo episodeNumber="6">
                    <adb:series seriesId="GOT" seasonCount="8"></adb:series>
                    <adb:season seasonId="GOTS08" number="8" episodeCount="6"></adb:season>
                </adb:SeriesInfo>
            </adb:TitleExt>
        </adb:ExtensionType>
    </Ext>
    <title:LocalizableTitle xml:lang="pol">
      <title:TitleLong>Game of Thrones VIII</title:TitleLong>
      <title:SummaryLong>Long summary, long summary, long summary...</title:SummaryLong>
      <title:Actor fullName="Peter Dinklage" firstName="Peter" lastName="Dinklage" />
      <title:Actor fullName="Nikolaj Coster-Waldau" firstName="Nikolaj" lastName="Coster-Waldau" />
      <title:Actor fullName="Emilia Clarke" firstName="Emilia" lastName="Clarke" />
      <title:Actor fullName="Lena Headey" firstName="Lena" lastName="Headey" />
      <title:Director fullName="David Nutter" firstName="David" lastname="Nutter" />
    </title:LocalizableTitle>
    <title:Rating ratingSystem="PL">18</title:Rating>
    <title:Audience>General</title:Audience>
    <title:DisplayRunTime>01:15</title:DisplayRunTime>
    <title:Year>2019</title:Year>
    <title:CountryOfOrigin>US</title:CountryOfOrigin>
    <title:Genre>Film fantasy</title:Genre>
    <title:ShowType>Movie</title:ShowType>
  </Asset>
  <Asset xsi:type="offer:CategoryType" uriId="cc.com/XX">
    <AlternateId identifierSystem="VOD1.1">cc.com/XX</AlternateId>
    <offer:CategoryPath>VOD/GOT/Season 8</offer:CategoryPath>
  </Asset>
  <Asset xsi:type="content:MovieType" uriId="GraoTronVIII_0_1080mp4">
    <AlternateId identifierSystem="VOD1.1">GraoTronVIII_0_1080mp4</AlternateId>
    <content:SourceUrl>GOTS08E06.mp4</content:SourceUrl>
    <content:Resolution>1080p</content:Resolution>
    <content:Duration>PT1H15M20S</content:Duration>
    <content:Language>pol</content:Language>
    <content:Language>eng</content:Language>
  </Asset>
  <Asset xsi:type="content:PreviewType" uriId="GraoTronVIII_1_1080mp4">
    <AlternateId identifierSystem="VOD1.1">GraoTronVIII_1_1080mp4</AlternateId>
    <content:SourceUrl>GOTS08E06_trailer.mp4</content:SourceUrl>
    <content:Resolution>1080p</content:Resolution>
    <content:Duration>PT0H01M48S</content:Duration>
    <content:Language>pol</content:Language>
    <content:Language>eng</content:Language>
  </Asset>
  <Asset xsi:type="content:PosterType" uriId="GraoTronVIIIPoster">
    <AlternateId identifierSystem="VOD1.1">GraoTronVIIIPoster</AlternateId>
    <content:SourceUrl>GOTS08E06.jpg</content:SourceUrl>
    <content:X_Resolution>600</content:X_Resolution>
    <content:Y_Resolution>900</content:Y_Resolution>
    <content:Language>pol</content:Language>
  </Asset>
  <Asset xsi:type="offer:ContentGroupType" uriId="abc">
    <AlternateId identifierSystem="VOD1.1">abc</AlternateId>
    <offer:TitleRef uriId="abc" />
    <offer:MovieRef uriId="GraoTronVIII_0_1080mp4" />
  </Asset>
  <Asset xsi:type="offer:ContentGroupType" uriId="abc">
    <AlternateId identifierSystem="VOD1.1">abc</AlternateId>
    <offer:TitleRef uriId="abc" />
    <offer:MovieRef uriId="GraoTronVIII_1_1080mp4" />
  </Asset>
  <Asset xsi:type="offer:ContentGroupType" uriId="abc">
    <AlternateId identifierSystem="VOD1.1">abc</AlternateId>
    <offer:TitleRef uriId="abc" />
    <offer:MovieRef uriId="GraoTronVIIIPoster" />
  </Asset>
</ADI3>

1 Answer

I suggest using namespace wildcards instead of stripping the namespaces. Support for this was added in Python 3.8.

from xml.etree import ElementTree as ET

tree = ET.parse(adiPath)

rating = tree.find(".//{*}Rating")  # Find the Rating element in any namespace
rating.text = "999"

Note that you have to use find() (or findall()). The wildcard does not work with iter().
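
To illustrate the difference, here is a minimal sketch. The XML string is a cut-down stand-in for the file in the question, and the names XML, TITLE_NS, wildcard_hits, qualified_hits and literal_hits are made up for this example. It assumes Python 3.8 or later.

```python
from xml.etree import ElementTree as ET

# Cut-down stand-in for the ADI3 file from the question (illustrative only).
XML = """<ADI3 xmlns="urn:cablelabs:md:xsd:core:3.0"
      xmlns:title="urn:cablelabs:md:xsd:title:3.0">
  <Asset>
    <title:Rating ratingSystem="PL">18</title:Rating>
  </Asset>
</ADI3>"""

root = ET.fromstring(XML)

# findall() understands the {*} namespace wildcard (Python 3.8+).
wildcard_hits = root.findall(".//{*}Rating")

# iter() compares tag names literally, so it needs the fully qualified name.
TITLE_NS = "urn:cablelabs:md:xsd:title:3.0"
qualified_hits = list(root.iter(f"{{{TITLE_NS}}}Rating"))

# Passed to iter(), "{*}Rating" is just a tag name that no element has.
literal_hits = list(root.iter("{*}Rating"))

print(len(wildcard_hits), len(qualified_hits), len(literal_hits))
```

If you want to keep using iter(), passing the qualified name as in qualified_hits is one option.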


The following workaround can be used to preserve the original namespace prefixes when serializing the XML document (see also https://stackoverflow.com/a/42372404/407651 and https://stackoverflow.com/a/54491129/407651).

namespaces = dict([elem for _, elem in ET.iterparse("test1.xml", events=['start-ns'])])
for ns in namespaces:
    ET.register_namespace(ns, namespaces[ns])
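
Putting the pieces together, a rough end-to-end sketch of reading, editing and saving could look like the following. The helper edit_rating, the SAMPLE string and the temporary file names are assumptions made up for this illustration, not part of the question; adapt the paths to your own file.

```python
import os
import tempfile
from xml.etree import ElementTree as ET

# Cut-down stand-in for the ADI3 file from the question (illustrative only).
SAMPLE = """<?xml version="1.0" encoding="utf-8"?>
<ADI3 xmlns="urn:cablelabs:md:xsd:core:3.0"
      xmlns:title="urn:cablelabs:md:xsd:title:3.0">
  <Asset>
    <title:Rating ratingSystem="PL">18</title:Rating>
  </Asset>
</ADI3>"""


def edit_rating(src_path, dst_path, new_text):
    """Hypothetical helper: set the text of the first Rating element and save."""
    # Register every (prefix, uri) pair declared in the source so that
    # write() reuses those prefixes instead of generating ns0, ns1, ...
    # Note that register_namespace() changes global ElementTree state.
    for _, (prefix, uri) in ET.iterparse(src_path, events=["start-ns"]):
        ET.register_namespace(prefix, uri)

    tree = ET.parse(src_path)
    rating = tree.find(".//{*}Rating")  # any namespace, Python 3.8+
    if rating is None:
        raise ValueError("no Rating element found")
    rating.text = new_text

    # The edit only reaches the disk once the tree is written out.
    tree.write(dst_path, encoding="utf-8", xml_declaration=True)


with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "in.xml")
    dst = os.path.join(tmp, "out.xml")
    with open(src, "w", encoding="utf-8") as f:
        f.write(SAMPLE)
    edit_rating(src, dst, "999")
    with open(dst, encoding="utf-8") as f:
        result = f.read()

print(result)
```

Keep in mind that ElementTree only writes declarations for namespaces that are actually used in element or attribute names, so the xmlns attributes in the output may not match the input one for one.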
answered 2020-04-11T09:30:34.847