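I am trying to extract certain data points from an XML document and have tried two options:
- Using ElementTree to work with the XML directly
- Using xmltodict to work with it as a dictionary
This is what I have so far.
Code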
# Packages
# --------------------------------------
import xml.etree.ElementTree as ET
# XML Data
# --------------------------------------
message_xml = \
'<ClinicalDocument> \
<code code="34133-9" displayName="Summarization of Episode Note"/> \
<title>Care Summary</title> \
<recordTarget> \
<patientRole> \
<id assigningAuthorityName="LOCAL" extension="L123456"/> \
<id assigningAuthorityName="SSN" extension="788889999"/> \
<id assigningAuthorityName="GLOBAL" extension="G123456"/> \
<addr use="HP"> \
<streetAddressLine>1000 N SOME AVENUE</streetAddressLine> \
<city>BIG CITY</city> \
<state>NA</state> \
<postalCode>12345-1010</postalCode> \
<country>US</country> \
</addr> \
<telecom nullFlavor="NI"/> \
<patient> \
<name use="L"> \
<given>JANE</given> \
<given>JOE</given> \
<family>DOE</family> \
</name> \
</patient> \
</patientRole> \
</recordTarget> \
</ClinicalDocument>'
# Get Tree & Root
# --------------------------------------
tree = ET.ElementTree(ET.fromstring(message_xml))
root = tree.getroot()
# Iterate
# --------------------------------------
for node in root:
    tag = node.tag
    attribute = node.attrib
    # Get ClinicalDocument.code values
    if tag == 'code':
        document_code_code = attribute.get('code')
        document_code_name = attribute.get('displayName')
    else:
        pass
    # Get ClinicalDocument.recordTarget values
    if tag == 'recordTarget':
        for child in node.iter():
            # Multiple <id> tags
            record_target_local = None  # ??
            record_target_ssn = None  # ??
            record_target_global = None  # ??
            # Multiple <given> tags
            record_target_name_first = None  # ??
            record_target_name_middle = None  # ??
            record_target_name_last = None  # ??
    else:
        pass
Expected output
document_code,document_name,id_local,id_ssn,id_global,name_first,name_middle,name_last
34133-9,Summarization of Episode Note,L123456,788889999,G123456,JANE,JOE,DOE
Acceptable output
document_code,document_name,id_type,id,name_first,name_middle,name_last
34133-9,Summarization of Episode Note,LOCAL,L123456,JANE,JOE,DOE
34133-9,Summarization of Episode Note,SSN,788889999,JANE,JOE,DOE
34133-9,Summarization of Episode Note,GLOBAL,G123456,JANE,JOE,DOE
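For the one-row-per-id layout, a rough sketch of what I imagine is below. It is only a guess at an approach, not something I have confirmed is the right way: `extract_id_rows` and `SAMPLE_XML` are made-up names, the XML is a shortened copy of the sample, and it assumes the document has no XML namespaces and that every `<id>` in the document belongs to the patient.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Shortened copy of the sample document (assumption: no namespaces).
SAMPLE_XML = """
<ClinicalDocument>
  <code code="34133-9" displayName="Summarization of Episode Note"/>
  <recordTarget>
    <patientRole>
      <id assigningAuthorityName="LOCAL" extension="L123456"/>
      <id assigningAuthorityName="SSN" extension="788889999"/>
      <id assigningAuthorityName="GLOBAL" extension="G123456"/>
      <patient>
        <name use="L">
          <given>JANE</given>
          <given>JOE</given>
          <family>DOE</family>
        </name>
      </patient>
    </patientRole>
  </recordTarget>
</ClinicalDocument>
"""

HEADER = ["document_code", "document_name", "id_type", "id",
          "name_first", "name_middle", "name_last"]


def extract_id_rows(xml_text):
    """Hypothetical helper: build one row per <id> element."""
    root = ET.fromstring(xml_text)
    code = root.find("code")

    # iter("given") walks all descendants in document order,
    # so the first <given> is treated as the first name.
    given = [g.text for g in root.iter("given")]
    first = given[0] if given else ""
    middle = given[1] if len(given) > 1 else ""
    last = root.findtext(".//family", default="")

    rows = []
    for id_node in root.iter("id"):
        rows.append([
            code.get("code"),
            code.get("displayName"),
            id_node.get("assigningAuthorityName"),
            id_node.get("extension"),
            first, middle, last,
        ])
    return rows


buffer = io.StringIO()
writer = csv.writer(buffer, lineterminator="\n")
writer.writerow(HEADER)
writer.writerows(extract_id_rows(SAMPLE_XML))
print(buffer.getvalue(), end="")
```

In a larger document `root.iter("id")` would also pick up `<id>` elements outside `recordTarget`, so the search would probably need to start from the `patientRole` node instead.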
Questions
- How do I efficiently navigate child nodes that themselves have multiple child nodes?
- How do I handle repeated tags (e.g. <id>, <given>)?
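To make both questions concrete, here is a minimal sketch of the direction I suspect might work for the single-row layout: use path expressions with `find`/`findall` instead of nested loops, and key the repeated `<id>` tags by their `assigningAuthorityName` attribute. `extract_wide_record` and `SAMPLE_XML` are hypothetical names, the XML is a shortened copy of the sample, and the sketch assumes no XML namespaces and that the first and second `<given>` are the first and middle names.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the sample document (assumption: no namespaces).
SAMPLE_XML = """
<ClinicalDocument>
  <code code="34133-9" displayName="Summarization of Episode Note"/>
  <recordTarget>
    <patientRole>
      <id assigningAuthorityName="LOCAL" extension="L123456"/>
      <id assigningAuthorityName="SSN" extension="788889999"/>
      <id assigningAuthorityName="GLOBAL" extension="G123456"/>
      <patient>
        <name use="L">
          <given>JANE</given>
          <given>JOE</given>
          <family>DOE</family>
        </name>
      </patient>
    </patientRole>
  </recordTarget>
</ClinicalDocument>
"""


def extract_wide_record(xml_text):
    """Hypothetical helper: flatten one document into a single dict."""
    root = ET.fromstring(xml_text)
    code = root.find("code")
    # A path reaches the nested node directly, without nested loops.
    role = root.find("recordTarget/patientRole")

    # Repeated <id> tags: map authority name -> extension.
    ids = {
        node.get("assigningAuthorityName"): node.get("extension")
        for node in role.findall("id")
    }

    # Repeated <given> tags: findall returns them in document order.
    given = [node.text for node in role.findall("patient/name/given")]

    return {
        "document_code": code.get("code"),
        "document_name": code.get("displayName"),
        "id_local": ids.get("LOCAL"),
        "id_ssn": ids.get("SSN"),
        "id_global": ids.get("GLOBAL"),
        "name_first": given[0] if given else None,
        "name_middle": given[1] if len(given) > 1 else None,
        "name_last": role.findtext("patient/name/family"),
    }


record = extract_wide_record(SAMPLE_XML)
print(",".join(record))
print(",".join(str(value) for value in record.values()))
```

If the real documents declare a default namespace, I assume the plain tag names in these paths would stop matching and each step would need a namespace prefix.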