I have an XML document in the following format:
<samples>
<sample count="10" intentref="none">
Remember to
<annotation conceptref="cf1">
<annotation conceptref="cf2">record</annotation>
</annotation>
the
<annotation conceptref="cf3">movie</annotation>
<annotation conceptref="cf4">Taxi driver</annotation>
</sample>
</samples>
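I want to extract all of the text, both the text that is not wrapped in annotation tags and the text inside them, in order to reconstruct the original phrase. So my output would be -> Remember to record the movie Taxi driver
The problem, apparently, is that there seems to be no way to get the token 'the'. Here is my code snippet: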
import xml.etree.ElementTree as ET
samples = ET.fromstring("""
<samples>
<sample count="10" intentref="none">Remember to<annotation conceptref="cf1"><annotation conceptref="cf2">record</annotation></annotation>the<annotation conceptref="cf3">movie</annotation><annotation conceptref="cf4">Taxi driver</annotation></sample>
</samples>
""")
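for sample in samples.iter("sample"):
    # sample.text is the text before the first child; sample.tail is what follows </sample>
    print('***' + sample.text + '***' + sample.tail)
    # iter() walks all descendants, so nested annotations are visited here as well
    for annotation in sample.iter('annotation'):
        print(annotation.text)
        # getchildren() was removed in Python 3.9; iterating the element is equivalent
        for nested_annotation in annotation:
            print(nested_annotation.text)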
I thought handling the nested annotation would do the trick.. but no, this is the result:
***Remember to***
None
record
record
movie
Taxi driver
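
For what it's worth, the missing 'the' appears to be stored as the .tail of the outer annotation element (cf1) rather than as the .text of any element, which would explain why printing only .text never shows it. Below is a minimal sketch, assuming the inline sample string from the snippet above. sample_phrase is a hypothetical helper name, and joining the pieces with a single space is an assumption, since the inline string has no whitespace between the words and the tags.

```python
import xml.etree.ElementTree as ET

samples = ET.fromstring("""
<samples>
<sample count="10" intentref="none">Remember to<annotation conceptref="cf1"><annotation conceptref="cf2">record</annotation></annotation>the<annotation conceptref="cf3">movie</annotation><annotation conceptref="cf4">Taxi driver</annotation></sample>
</samples>
""")


def sample_phrase(sample):
    """Hypothetical helper: rebuild the phrase from all text inside <sample>."""
    # itertext() yields the text and tail strings of every descendant,
    # in document order, so 'the' (a tail) is included.
    parts = (part.strip() for part in sample.itertext())
    # Assumption: pieces are separated by one space; empty pieces are dropped.
    return " ".join(part for part in parts if part)


for sample in samples.iter("sample"):
    outer = sample.find("annotation")  # first direct child, conceptref="cf1"
    print(repr(outer.tail))            # 'the'
    print(sample_phrase(sample))       # Remember to record the movie Taxi driver
```

The strip() call is there so the same helper would also cope with the pretty-printed form of the document, where each piece of text is surrounded by newlines.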