python - Parsing a special XML file in Python

Question

I have a special XML file that looks like this:

<alarm-dictionary source="DDD" type="ProxyComponent">

    <alarm code="402" severity="Alarm" name="DDM_Alarm_402">
    <message>Database memory usage low threshold crossed</message>
    <description>dnKinds = database
    type = quality_of_service
    perceived_severity = minor
    probable_cause = thresholdCrossed
    additional_text = Database memory usage low threshold crossed
    </description>
    </alarm>

        ...
</alarm-dictionary>

I know that in Python I can get attributes such as "code" and "severity" from the alarm tag like this:

for alarm_tag in dom.getElementsByTagName('alarm'):
    if alarm_tag.hasAttribute('code'):
        alarmcode = str(alarm_tag.getAttribute('code'))

And I can get the text inside the message tag like this:

for messages_tag in dom.getElementsByTagName('message'):
    messages = ""
    for message_tag in messages_tag.childNodes:
        if message_tag.nodeType in (message_tag.TEXT_NODE, message_tag.CDATA_SECTION_NODE):
            messages += message_tag.data

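But I also want to get the values inside the description tag, such as dnKinds (database), type (quality_of_service), perceived_severity (minor) and probable_cause (thresholdCrossed).

In other words, I also want to parse the text content of a tag in the XML.

Can anyone help me with this? Thanks a lot!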

score 4 · Accepted Answer

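Once you have the text from the description tag, the problem no longer has anything to do with XML parsing. You just need some simple string parsing to turn the type = quality_of_service key/value strings into something more convenient to use in Python, such as a dictionary.

With the slightly simpler parsing that ElementTree offers, it would look like this: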

messages = """
<alarm-dictionary source="DDD" type="ProxyComponent">

    <alarm code="402" severity="Alarm" name="DDM_Alarm_402">
    <message>Database memory usage low threshold crossed</message>
    <description>dnKinds = database
    type = quality_of_service
    perceived_severity = minor
    probable_cause = thresholdCrossed
    additional_text = Database memory usage low threshold crossed
    </description>
    </alarm>

        ...
</alarm-dictionary>
"""

import xml.etree.ElementTree as ET

# Parse XML
tree = ET.fromstring(messages)

# Element.getchildren() was removed in Python 3.9, so select the alarm elements directly
for alarm in tree.findall("alarm"):
    # Get code and severity
    print(alarm.get("code"))
    print(alarm.get("severity"))

    # Grab description text
    descr = alarm.find("description").text

    # Parse "thing=other" into dict like {'thing': 'other'}
    info = {}
    for dl in descr.splitlines():
        if len(dl.strip()) > 0:
            key, _, value = dl.partition("=")
            info[key.strip()] = value.strip()
    print(info)
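
The same idea can be wrapped into reusable functions. What follows is only a minimal sketch, assuming Python 3 and the sample document from the question; the names `parse_description` and `parse_alarms` are hypothetical helpers introduced for illustration and are not part of ElementTree.

```python
import xml.etree.ElementTree as ET

SAMPLE = """
<alarm-dictionary source="DDD" type="ProxyComponent">
    <alarm code="402" severity="Alarm" name="DDM_Alarm_402">
    <message>Database memory usage low threshold crossed</message>
    <description>dnKinds = database
    type = quality_of_service
    perceived_severity = minor
    probable_cause = thresholdCrossed
    additional_text = Database memory usage low threshold crossed
    </description>
    </alarm>
</alarm-dictionary>
"""


def parse_description(text):
    """Turn 'key = value' lines into a dict (hypothetical helper)."""
    info = {}
    for line in (text or "").splitlines():
        key, sep, value = line.partition("=")
        if sep:  # ignore blank lines and lines without "="
            info[key.strip()] = value.strip()
    return info


def parse_alarms(xml_text):
    """Return one dict per <alarm> element (hypothetical helper)."""
    root = ET.fromstring(xml_text)
    alarms = []
    for alarm in root.findall("alarm"):
        record = dict(alarm.attrib)  # code, severity, name
        record["message"] = alarm.findtext("message", default="")
        record["description"] = parse_description(alarm.findtext("description"))
        alarms.append(record)
    return alarms


alarms = parse_alarms(SAMPLE)
print(alarms[0]["code"], alarms[0]["description"]["perceived_severity"])
```

Keeping the attributes, the message and the parsed description together in one record per alarm makes it easy to look up any value later by name.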

score 2 · Accepted Answer

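I'm not entirely sure about Python, but after some quick research:

Seeing as you can already get all the content of the description tag in the XML, could you not split it on line breaks, and then split each line on the equals sign with str.split() to get the name and the value separately?

For example: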

for description_tag in dom.getElementsByTagName('description'):
    description = ""
    for node in description_tag.childNodes:
        if node.nodeType in (node.TEXT_NODE, node.CDATA_SECTION_NODE):
            description += node.data
    # Only the first "name = value" line is handled here
    first_line = description.strip().splitlines()[0]
    tag = first_line.split('=', 1)
    tagName = tag[0].strip()
    tagValue = tag[1].strip()

(I haven't accounted for splitting the text into lines and looping over them.)

But this should put you on the right track :)
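
The splitting into lines and the loop that this answer leaves out could look roughly like the sketch below. It assumes Python 3 and `xml.dom.minidom`, as in the question; `node_text` and `description_pairs` are hypothetical helper names, not minidom functions.

```python
from xml.dom import minidom

XML_TEXT = """<alarm-dictionary source="DDD" type="ProxyComponent">
    <alarm code="402" severity="Alarm" name="DDM_Alarm_402">
    <message>Database memory usage low threshold crossed</message>
    <description>dnKinds = database
    type = quality_of_service
    perceived_severity = minor
    probable_cause = thresholdCrossed
    additional_text = Database memory usage low threshold crossed
    </description>
    </alarm>
</alarm-dictionary>"""


def node_text(element):
    """Concatenate the text and CDATA children of a DOM element."""
    return "".join(
        node.data
        for node in element.childNodes
        if node.nodeType in (node.TEXT_NODE, node.CDATA_SECTION_NODE)
    )


def description_pairs(element):
    """Return (name, value) tuples, one per 'name = value' line."""
    pairs = []
    for line in node_text(element).splitlines():
        if "=" not in line:
            continue  # skip blank lines and anything without "="
        name, value = line.split("=", 1)  # split on the first "=" only
        pairs.append((name.strip(), value.strip()))
    return pairs


dom = minidom.parseString(XML_TEXT)
for description_tag in dom.getElementsByTagName("description"):
    for tag_name, tag_value in description_pairs(description_tag):
        print(tag_name, "->", tag_value)
```

Splitting with a limit of 1 keeps any later equals signs inside the value instead of discarding them.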

score 2 · Accepted Answer

AFAIK there is no library that handles plain text as DOM elements.

However, once you have the message in the message variable, you can do the following:

description = {}
messageParts = message.split("\n")
for part in messageParts:
    # Skip blank lines and lines without a key/value separator
    if "=" not in part:
        continue
    descInfo = part.split("=", 1)
    description[descInfo[0].strip()] = descInfo[1].strip()

You will then have the inner information you need as a key-value map in description.

You should also add error handling on top of my code...
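
One possible shape for that error handling is sketched below, assuming Python 3. The function name `parse_key_values` and its `strict` flag are hypothetical, chosen only for illustration; whether malformed lines should be skipped or rejected depends on how much you trust the input.

```python
def parse_key_values(text, strict=False):
    """Parse 'key = value' lines into a dict.

    With strict=False malformed lines are skipped silently;
    with strict=True they raise ValueError.
    """
    description = {}
    for line_number, part in enumerate(text.split("\n"), start=1):
        if not part.strip():
            continue  # blank lines are always ignored
        key, sep, value = part.partition("=")
        key = key.strip()
        if not sep or not key:
            if strict:
                raise ValueError(
                    "line %d is not 'key = value': %r" % (line_number, part)
                )
            continue
        description[key] = value.strip()
    return description


message = "dnKinds = database\n\n    type = quality_of_service\n    not a pair\n"
print(parse_key_values(message))
```

Calling `parse_key_values(message, strict=True)` on the same text raises `ValueError` for the line that has no equals sign.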
