python - Python 3: Unable to convert XML to dict using xmltodict

Question

I am trying to convert data from an XML file into a Python dict, but I can't get it to work. Here is the code I have written.

import xmltodict
input_xml  = 'data.xml'  # This is the source file

with open(input_xml, encoding='utf-8', errors='ignore') as _file:
    data = _file.read()
    data = xmltodict.parse(data,'ASCII')
    print(data)
    exit()

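When I run this code, I get the following error:
xml.parsers.expat.ExpatError: not well-formed (invalid token): line 239, column 40.
After a lot of trial and error, I realized that my XML has some Hindi characters inside a particular tag, as shown below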

<DECL>!! आप की सेवा में पुनः पधारे !!</DECL>

How can I ignore these unencoded characters before running xmltodict.parse?

score 1 · Accepted Answer

My guess is that the problem is related to the encoding of the file you are reading. Why are you passing 'ASCII' when you parse it?

If you read the same XML from a Python string without specifying ASCII, it parses fine:

import xmltodict
xml = """<DECL>!! आप की सेवा में पुनः पधारे !!</DECL>"""
xmltodict.parse(xml, process_namespaces=True)

The result is:

OrderedDict([('DECL', '!! आप की सेवा में पुनः पधारे !!')])

With a file containing just that single line of input, I can parse it correctly:

import xmltodict
input_xml  = 'tmp.txt'  # This is the source file

with open(input_xml, encoding='utf-8', mode='r') as _file:
    data = _file.read()
    data = xmltodict.parse(data)
    print(data)
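A variation on the same idea, as a minimal sketch: open the file in binary mode and hand the file object to xmltodict.parse, so no encoding is forced from the Python side. The helper name load_xml and the temporary file are made up for illustration, and this assumes the file really is UTF-8 (or declares its encoding in the XML prolog).

```python
import os
import tempfile

import xmltodict


def load_xml(path):
    """Parse an XML file into a dict without forcing an encoding (hypothetical helper)."""
    # Binary mode: the raw bytes reach the parser unchanged.
    with open(path, "rb") as xml_file:
        return xmltodict.parse(xml_file)


# Illustration only: write the sample line to a temporary UTF-8 file.
sample = "<DECL>!! आप की सेवा में पुनः पधारे !!</DECL>"
with tempfile.NamedTemporaryFile("w", encoding="utf-8", suffix=".xml", delete=False) as tmp:
    tmp.write(sample)

result = load_xml(tmp.name)
os.remove(tmp.name)
print(result["DECL"])
```

Reading bytes also avoids errors='ignore', which silently drops anything that fails to decode and can hide the real cause of a parse error.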

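The problem is most likely that you are trying to parse it as 'ASCII'. Hindi (Devanagari) characters cannot be represented in ASCII, so drop that argument, as in the snippet above.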
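To see what an ASCII override does to this text, here is a small sketch that uses the standard-library expat parser directly (xmltodict is built on expat). The helper try_parse is hypothetical, and the assumption is that the encoding given to xmltodict.parse ends up as the parser's encoding in a similar way.

```python
from xml.parsers import expat

SAMPLE = "<DECL>!! आप की सेवा में पुनः पधारे !!</DECL>"


def try_parse(raw, encoding=None):
    """Return the element text, or a description of the parse error (hypothetical helper)."""
    parser = expat.ParserCreate(encoding)
    chunks = []
    # expat may deliver the text in several pieces, so collect and join them.
    parser.CharacterDataHandler = chunks.append
    try:
        parser.Parse(raw, True)
    except expat.ExpatError as err:
        return f"ExpatError: {err}"
    return "".join(chunks)


raw_bytes = SAMPLE.encode("utf-8")
utf8_result = try_parse(raw_bytes, "utf-8")
ascii_result = try_parse(raw_bytes, "ASCII")
print(utf8_result)
print(ascii_result)
```

With UTF-8 the text should come back intact, while the ASCII override is expected to make the parser reject the first non-ASCII byte with an ExpatError, which is consistent with the error in the question.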
