I have the following XML file:

<class id="1" name="good/bad">
    <verb>
        <token>like</token>
        <token>feel</token>
    </verb>
    <mess>This is <sugg>not</sugg> text</mess>
    <id type="incorrect">I'm glad to <marker>unsee you</marker>.</id>
    <id type="correct">I'm glad to see you.</id>
</class>

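I need to extract text from specific tags. There are few examples on http://effbot.org, and the documentation is generally poor. Are there good examples elsewhere? Also, how can I treat the text in identically named tags (token) as separate entities? Thanks in advance! The result should look roughly like this: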

(like) feel > not #This is not text

1 Answer

It is not clear to me what you want to do with the contents of the <mess> element.
For the children of the <verb> element, try this:

import xml.etree.ElementTree as ET
the_tree = ET.fromstring('''<class id="1" name="good/bad">
    <verb>
        <token>like</token>
        <token>feel</token>
    </verb>
    <mess>This is <sugg>not</sugg> text</mess>
    <id type="incorrect">I'm glad to <marker>unsee you</marker>.</id>
    <id type="correct">I'm glad to see you.</id>
</class>''')
# Element.getchildren() was removed in Python 3.9; findall() returns the <token> elements directly
elems = the_tree.findall('./verb/token')
verbs = [verb.text for verb in elems]
# -> ['like', 'feel']

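If your file is larger, you may prefer this alternative way of accessing elements. ET.XMLID returns the root element together with a dictionary that maps each id attribute value to its element: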

tree, id_map = ET.XMLID('''<class id="1" name="good/bad">
    <verb>
        <token>like</token>
        <token>feel</token>
    </verb>
    <mess>This is <sugg>not</sugg> text</mess>
    <id type="incorrect">I'm glad to <marker>unsee you</marker>.</id>
    <id type="correct">I'm glad to see you.</id>
</class>''')
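# '1' is the id of the <class> element; find() returns its <verb> child
elems = id_map['1'].find('verb')
# Iterating over an element yields its direct children, here the <token> elements
verbs = [verb.text for verb in elems]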
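
As for the <mess> element, here is a minimal sketch under one possible reading of the question: the text inside <sugg> is wanted on its own, and the full sentence is wanted with the nested tag flattened away. The output layout is only a guess based on the sample result in the question. The variable names (xml_text, tokens, suggestion, full_message, line) are illustrative and not part of any API.

```python
import xml.etree.ElementTree as ET

xml_text = '''<class id="1" name="good/bad">
    <verb>
        <token>like</token>
        <token>feel</token>
    </verb>
    <mess>This is <sugg>not</sugg> text</mess>
    <id type="incorrect">I'm glad to <marker>unsee you</marker>.</id>
    <id type="correct">I'm glad to see you.</id>
</class>'''

root = ET.fromstring(xml_text)

# Each <token> becomes a separate list entry
tokens = [t.text for t in root.findall('./verb/token')]

mess = root.find('mess')
# Text of the nested <sugg> element only
suggestion = mess.findtext('sugg')
# itertext() walks the element and its descendants, so mixed content is flattened
full_message = ''.join(mess.itertext())

# Select an <id> element by its type attribute and flatten its text the same way
incorrect = ''.join(root.find("./id[@type='incorrect']").itertext())

# Assumed layout, guessed from the sample result in the question
line = '({}) {} > {} #{}'.format(tokens[0], ' '.join(tokens[1:]), suggestion, full_message)
print(line)
# -> (like) feel > not #This is not text
```

Reading mess.text alone would return only 'This is ', because the text that follows a child element is stored in that child's tail attribute. That is why itertext() is used here.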
answered 2012-06-18T22:51:01.180