Hi, I'm new to Python programming. I have an XML file with the following structure:

<?xml version="1.0" encoding="UTF-8"?>
<LidcReadMessage xsi:schemaLocation="http://www.nih.gov http://troll.rad.med.umich.edu/lidc/LidcReadMessage.xsd"
                  xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
                  xmlns="http://www.nih.gov" uid="1.3.6.1.4.1.14519.5.2.1.6279.6001.1307390687803.0">
    <ResponseHeader>
        <Version>1.8.1</Version>
        <MessageId>-421198203</MessageId>
        <DateRequest>2007-11-01</DateRequest>
        <TimeRequest>12:30:44</TimeRequest>
        <RequestingSite>removed</RequestingSite>
        <ServicingSite>removed</ServicingSite>
        <TaskDescription>Second unblinded read</TaskDescription>
        <CtImageFile>removed</CtImageFile>
        <SeriesInstanceUid>1.3.6.1.4.1.14519.5.2.1.6279.6001.179049373636438705059720603192</SeriesInstanceUid>
        <DateService>2008-08-18</DateService>
        <TimeService>02:05:51</TimeService>
        <ResponseDescription>1 - Reading complete</ResponseDescription>
        <StudyInstanceUID>1.3.6.1.4.1.14519.5.2.1.6279.6001.298806137288633453246975630178</StudyInstanceUID>
    </ResponseHeader>
    <readingSession>
        <annotationVersion>3.12</annotationVersion>
        <servicingRadiologistID>540461523</servicingRadiologistID>
        <unblindedReadNodule>
            <noduleID>Nodule 001</noduleID>
            <characteristics>
                <subtlety>5</subtlety>
                <internalStructure>1</internalStructure>
                <calcification>6</calcification>
                <sphericity>3</sphericity>
                <margin>3</margin>
                <lobulation>3</lobulation>
                <spiculation>4</spiculation>
                <texture>5</texture>
                <malignancy>5</malignancy>
            </characteristics>
            <roi>
                <imageZposition>-125.000000 </imageZposition>
                <imageSOP_UID>1.3.6.1.4.1.14519.5.2.1.6279.6001.110383487652933113465768208719</imageSOP_UID>
                ......

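There are four readingSession elements, each containing multiple unblindedReadNodule elements, and each of those contains roi elements. I need to extract information from all of them.

This is what I'm doing right now: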

import xml.etree.ElementTree as ET
tree = ET.parse('069.xml')
root = tree.getroot()
#lst = []
for readingsession in root.iter('readingSession'):
    for roi in readingsession.findall('roi'):
        id = roi.findtext('imageSOP_UID')
    print(id)

But this is all the output I get:

Process finished with exit code 0

I would appreciate any help.

1 Answer

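The real problem is the namespace. The root element declares xmlns="http://www.nih.gov", so every tag name has to be qualified with that namespace; the unqualified names used in the question match nothing. I tried with and without the namespace, and it worked with this code: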

import xml.etree.ElementTree as ET

import pydicom

NS = '{http://www.nih.gov}'  # default namespace declared on the root element

# SOP Instance UID of the DICOM slice to look up in the annotations
ds = pydicom.dcmread("000071.dcm")
uid = ds.SOPInstanceUID

tree = ET.parse("069.xml")
root = tree.getroot()

for child in root:
    print(child.tag)
    if child.tag == NS + 'readingSession':
        # find() returns only the first matching child element
        read = child.find(NS + 'unblindedReadNodule')
        if read is not None:
            nodule_id = read.find(NS + 'noduleID').text
            roi = read.find(NS + 'roi')
            xml_uid = roi.find(NS + 'imageSOP_UID').text
            if xml_uid == uid:
                print(xml_uid, "=", uid)
                print(roi)

This works perfectly: it reads the UID from a DICOM image of the LIDC/IDRI dataset and then finds the same UID in the XML file to get its region of interest.
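
If the goal is to collect the imageSOP_UID of every roi in every reading session, as the question asks, a namespace map passed to findall() and findtext() may be more convenient than spelling out the full URI each time. The following is only a minimal sketch: the inline SAMPLE document, the "lidc" prefix, and the collect_sop_uids helper are made up for illustration and are not part of the LIDC files or of the code above.

```python
import xml.etree.ElementTree as ET

# Hypothetical miniature of an LIDC annotation file, inlined so the sketch runs on its own.
SAMPLE = """<LidcReadMessage xmlns="http://www.nih.gov">
  <readingSession>
    <unblindedReadNodule>
      <noduleID>Nodule 001</noduleID>
      <roi><imageSOP_UID>1.2.3</imageSOP_UID></roi>
      <roi><imageSOP_UID>1.2.4</imageSOP_UID></roi>
    </unblindedReadNodule>
  </readingSession>
  <readingSession>
    <unblindedReadNodule>
      <noduleID>Nodule 002</noduleID>
      <roi><imageSOP_UID>1.2.5</imageSOP_UID></roi>
    </unblindedReadNodule>
  </readingSession>
</LidcReadMessage>
"""

# "lidc" is an arbitrary local alias; only the URI has to match the document.
NS = {"lidc": "http://www.nih.gov"}


def collect_sop_uids(root):
    """Return (noduleID, imageSOP_UID) pairs for every roi in every reading session."""
    pairs = []
    for session in root.iter("{http://www.nih.gov}readingSession"):
        for nodule in session.findall("lidc:unblindedReadNodule", NS):
            nodule_id = nodule.findtext("lidc:noduleID", default="", namespaces=NS)
            for roi in nodule.findall("lidc:roi", NS):
                pairs.append((nodule_id, roi.findtext("lidc:imageSOP_UID", namespaces=NS)))
    return pairs


root = ET.fromstring(SAMPLE)

# Without the namespace nothing matches, which is why the question's loop printed nothing.
unqualified = list(root.iter("readingSession"))

pairs = collect_sop_uids(root)
print(len(unqualified), pairs)
```

For the real file, root would come from ET.parse('069.xml').getroot() instead of ET.fromstring(SAMPLE).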

answered 2020-04-25T09:18:29.640