I have an XML file like this:

example.xml

<root>
    <keyword_group>
        <headword>sell/buy</headword>
    </keyword_group>
</root>

I want to split headword.text on '/' and wrap each piece in a <word> tag. Finally, I need to remove the <headword> tag. The output I expect is:

<root>
    <keyword_group>
        <word>sell</word>
        <word>buy</word>
    </keyword_group>
</root>

My ugly script is:

import lxml.etree as ET

xml = '''\
<root>
    <keyword_group>
        <headword>sell/buy</headword>
    </keyword_group>
</root>
'''

root = ET.fromstring(xml)
headword = root.find('.//headword')
if headword is not None:
    words = headword.text.split('/')
    for word in words:
        ET.SubElement(headword, 'word')
        for wr in headword.iter('word'):
            if not wr.text:
                wr.text = word
    headword.text = ''

print(ET.tostring(root, encoding='unicode'))

But this is too complicated, and it doesn't remove the headword tag.

1 Answer

Using lxml:

import lxml.etree as ET

xml = '''\
<root>
    <keyword_group>
        <headword>sell/buy</headword>
    </keyword_group>
</root>
'''

root = ET.fromstring(xml)
headword = root.find('.//headword')
if headword is not None:
    words = headword.text.split('/')
    # Detach <headword> from its parent, then append one <word> per piece.
    parent = headword.getparent()
    parent.remove(headword)
    for word in words:
        ET.SubElement(parent, 'word').text = word

print(ET.tostring(root, encoding='unicode'))

yields

<root>
    <keyword_group>
        <word>sell</word><word>buy</word></keyword_group>
</root>
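
If the indentation of the expected output matters, or if `<headword>` has siblings and the new elements should stay in its position, the same idea can be extended a little. The following is only a sketch, not a definitive implementation: it swaps in the standard library's `xml.etree.ElementTree` instead of lxml (so there is no `getparent()` and the parent is found by scanning), the helper name `split_headwords` is hypothetical, and it assumes the whitespace that precedes `<headword>` is the indentation you want between the new `<word>` elements.

```python
import xml.etree.ElementTree as ET

xml = '''\
<root>
    <keyword_group>
        <headword>sell/buy</headword>
    </keyword_group>
</root>
'''


def split_headwords(root, sep='/'):
    """Replace every <headword> with one <word> per sep-separated piece.

    Hypothetical helper for illustration; not part of any library.
    """
    # ElementTree has no getparent(), so treat every element as a
    # candidate parent. list() snapshots the tree before it is modified.
    for parent in list(root.iter()):
        for headword in parent.findall('headword'):
            index = list(parent).index(headword)
            pieces = [p.strip() for p in (headword.text or '').split(sep)]
            pieces = [p for p in pieces if p]
            if not pieces:
                continue  # nothing to split; leave this element alone
            # Whitespace that preceded <headword>: the parent's text if it
            # is the first child, otherwise the previous sibling's tail.
            lead = parent.text if index == 0 else parent[index - 1].tail
            parent.remove(headword)
            for offset, piece in enumerate(pieces):
                word = ET.Element('word')
                word.text = piece
                # The last <word> inherits the tail of <headword>; the
                # others reuse the leading indentation (an assumption).
                is_last = offset == len(pieces) - 1
                word.tail = headword.tail if is_last else lead
                parent.insert(index + offset, word)


root = ET.fromstring(xml)
split_headwords(root)
result = ET.tostring(root, encoding='unicode')
print(result)
```

Because each `<word>` is inserted at the index `<headword>` occupied and the tails are carried over, the serialized result keeps one `<word>` per line, as in the expected output from the question.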
answered 2013-01-31T13:17:39.950