Suppose I have an XML document of the following form:

<root>
  <foos>
    <foo>the quick <bar>brown </bar>fox</foo>
  </foos>
  <!-- Lots more <foo></foo> -->
</root>

How do I extract the full text string "the quick fox" as well as the string "brown"?

import xml.etree.ElementTree as ET

# xmldocument is the path (or file object) of the XML document above
doc = ET.parse(xmldocument).getroot()
foos = doc.find('foos')
for foo in foos:
    print(foo.text)  # This will print 'the quick '

I'm not sure how to get the rest of the text.

2 Answers

You can also try something like this, which iterates over the text of all nested tags automatically:

foos = doc.find('foos')
for foo in foos:
    # itertext() yields the text of foo and of every nested element, in document order
    for text in foo.itertext():
        print(text.strip(), end=' ')
    print()
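
To get "the quick fox" and "brown" as separate strings, as the question asks, one option is to combine each element's `text` with the `tail` of its children. The following is only a minimal sketch: it assumes the XML is held in a string (`xml_text` is a hypothetical name, used here instead of the file from the question), and `own_text` is an illustrative helper, not part of ElementTree.

```python
import xml.etree.ElementTree as ET

# Hypothetical inline copy of the question's document.
xml_text = """
<root>
  <foos>
    <foo>the quick <bar>brown </bar>fox</foo>
  </foos>
</root>
"""

def own_text(elem):
    """Return elem's text, skipping the text inside its child elements."""
    parts = [elem.text or ""]
    for child in elem:
        # The tail is the text that follows a child's closing tag.
        parts.append(child.tail or "")
    return "".join(parts)

root = ET.fromstring(xml_text)
results = []
for foo in root.find("foos"):
    outer = own_text(foo)                           # text outside <bar>
    inner = [bar.text for bar in foo.iter("bar")]   # text inside each <bar>
    full = "".join(foo.itertext())                  # everything, in document order
    results.append((outer, inner, full))
    print(repr(outer), inner, repr(full))
```

Here `outer` holds the text that sits directly in `<foo>`, `inner` holds the text of each nested `<bar>`, and `full` is the concatenation that `itertext()` produces.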
answered 2013-08-30T18:42:36.340

# Selector replaces the XmlXPathSelector class of older Scrapy releases
from scrapy.selector import Selector

xml = """
<root>
    <foos>
        <foo>the quick <bar>brown </bar>fox</foo>
    </foos>
</root>
"""

sel = Selector(text=xml, type='xml')
foos = sel.xpath('//foos')
for one in foos:
    # Collect every text node under each <foo>, including nested tags
    text = one.xpath('./foo//text()').extract()
    text = ''.join(text)
    print(text)
answered 2013-08-30T18:58:10.460