18

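I have an XML file and an XML schema. I want to validate the file against the schema and check whether it conforms to it. I'm using Python, but I'm open to any language if Python has no suitable library for this.

What is my best option here? My main concern is how quickly I can get it up and running.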

4

3 Answers

26

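Definitely lxml.

Define an XMLParser with a predefined schema, load the file with fromstring(), and catch the error lxml raises if the document does not validate: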

from lxml import etree

def validate(xmlparser, xmlfilename):
    try:
        with open(xmlfilename, 'r') as f:
            etree.fromstring(f.read(), xmlparser) 
        return True
    except etree.XMLSyntaxError:
        # A schema-aware parser reports validation failures (and malformed XML) as XMLSyntaxError.
        return False

schema_file = 'schema.xsd'
with open(schema_file, 'r') as f:
    schema_root = etree.XML(f.read())

schema = etree.XMLSchema(schema_root)
xmlparser = etree.XMLParser(schema=schema)

filenames = ['input1.xml', 'input2.xml', 'input3.xml']
for filename in filenames:
    if validate(xmlparser, filename):
        print("%s validates" % filename)
    else:
        print("%s doesn't validate" % filename)
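
If you also want to know why a document fails, one option is to parse it normally and call the schema's validate() method, then read its error_log. The following is only a minimal sketch: the inline schema, the sample documents, and the helper name validation_errors are assumptions made to keep it self-contained.

```python
from lxml import etree

# Hypothetical inline schema: a single <a> element holding an integer.
XSD = b"""<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="a" type="xs:integer"/>
</xs:schema>"""

schema = etree.XMLSchema(etree.XML(XSD))


def validation_errors(xml_bytes):
    """Return a list of schema error messages; an empty list means the document is valid."""
    # Plain parse without a schema; this still raises XMLSyntaxError for malformed XML.
    doc = etree.fromstring(xml_bytes)
    if schema.validate(doc):
        return []
    # error_log holds the problems found by the most recent validate() call.
    return ["line %d: %s" % (entry.line, entry.message) for entry in schema.error_log]


for sample in (b"<a>42</a>", b"<a>no int</a>"):
    print(sample, validation_errors(sample))
```

Unlike the parser-based approach, this keeps well-formedness errors (raised as exceptions) separate from schema violations (returned as messages).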

A note on encoding

If the schema file contains an XML declaration with an encoding (e.g. <?xml version="1.0" encoding="UTF-8"?>), the code above fails with the following error:

Traceback (most recent call last):
  File "<input>", line 2, in <module>
    schema_root = etree.XML(f.read())
  File "src/lxml/etree.pyx", line 3192, in lxml.etree.XML
  File "src/lxml/parser.pxi", line 1872, in lxml.etree._parseMemoryDocument
ValueError: Unicode strings with encoding declaration are not supported. Please use bytes input or XML fragments without declaration.

One solution is to open the files in bytes mode: open(..., 'rb')

[...]
def validate(xmlparser, xmlfilename):
    try:
        with open(xmlfilename, 'rb') as f:
[...]
with open(schema_file, 'rb') as f:
[...]
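
Another way to sidestep the encoding problem, sketched here under assumptions, is to hand the file path to etree.parse() and let lxml read the bytes itself. The helper name validate_file, the inline schema, and the temporary files are hypothetical and exist only to make the example runnable.

```python
import os
import tempfile

from lxml import etree


def validate_file(xsd_path, xml_path):
    """Return True if the file at xml_path validates against the schema at xsd_path."""
    # etree.parse() opens the file itself, so an encoding declaration is not a problem.
    schema = etree.XMLSchema(etree.parse(xsd_path))
    parser = etree.XMLParser(schema=schema)
    try:
        etree.parse(xml_path, parser)
        return True
    except etree.XMLSyntaxError:
        return False


def write_temp(data):
    """Write bytes to a throwaway file and return its path (demo helper)."""
    fd, path = tempfile.mkstemp(suffix=".xml")
    with os.fdopen(fd, "wb") as f:
        f.write(data)
    return path


# Demo data: every file starts with an XML declaration that names an encoding.
DECL = b'<?xml version="1.0" encoding="UTF-8"?>\n'
XSD = DECL + b"""<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="a" type="xs:integer"/>
</xs:schema>"""

paths = [write_temp(XSD), write_temp(DECL + b"<a>42</a>"), write_temp(DECL + b"<a>no int</a>")]
try:
    results = (validate_file(paths[0], paths[1]), validate_file(paths[0], paths[2]))
    print(results)
finally:
    for p in paths:
        os.remove(p)
```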
answered 2013-07-23T20:03:23.837
3

The Python snippet is fine, but an alternative is to use xmllint:

xmllint --schema sample.xsd --noout sample.xml
answered 2017-01-10T15:01:43.473
0
import xmlschema


def get_validation_errors(xml_file, xsd_file):
    schema = xmlschema.XMLSchema(xsd_file)
    validation_error_iterator = schema.iter_errors(xml_file)
    errors = []
    # iter_errors() yields validation errors lazily instead of raising on the first one.
    for idx, validation_error in enumerate(validation_error_iterator, start=1):
        err = str(validation_error)
        errors.append(err)
        print(err)
    return errors

answered 2021-06-23T07:36:22.440