
I have an XML file and its corresponding XSD file. I validate the XML through a StAX parser with a custom error handler attached. In a well-formed XML file, I encounter two types of errors:

1) Incorrect data type inside an element, e.g. a string inside an element that is supposed to contain an integer.

2) Missing element: an element that the XSD requires is not present in the XML.

Using a StAX parser and a custom error handler, I'm able to identify the first type of error. But for the second type, the reader is on a CHARACTERS event when the handler is called, and its text is the content of the immediately following element. I don't know how to find out which element is missing. Also, why is a CHARACTERS event reported, and why is the missing element completely ignored?

Since StAX is forward-only, is there a way to identify both kinds of error using another parser?

import java.io.File;
import java.io.IOException;
import javax.xml.XMLConstants;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;
import javax.xml.transform.stax.StAXSource;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.ErrorHandler;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

public class XMLValidation {

    public static void main(String[] args) {

        XMLValidation xmlValidation = new XMLValidation();
        System.out.println(xmlValidation.validateXMLSchema("PHSHumanSubjectsAndClinicalTrialsInfo-V1.0.xsd", "FullPHSHuman.xml"));
    }

    public boolean validateXMLSchema(String xsdPath, String xmlPath){

        try {
            SchemaFactory factory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
            Schema schema = factory.newSchema(new File(xsdPath));
            StreamSource xmlSource = new StreamSource(xmlPath);
            XMLStreamReader reader = XMLInputFactory.newFactory().createXMLStreamReader(xmlSource);
            Validator validator = schema.newValidator();
            validator.setErrorHandler(new MyErrorHandler(reader));
            validator.validate(new StAXSource(reader));
        } catch (IOException | SAXException | XMLStreamException e) {
            System.out.println("Exception: "+e.getMessage() + " local message " + e.getLocalizedMessage() + " cause " + e.getCause());
            return false;
        }
        return true;
    }
}

class MyErrorHandler implements ErrorHandler {

    private XMLStreamReader reader;

    public MyErrorHandler(XMLStreamReader reader) {
        this.reader = reader;
    }

    @Override
    public void error(SAXParseException e) throws SAXException {
        System.out.println("error");
        warning(e);
    }

    @Override
    public void fatalError(SAXParseException e) throws SAXException {
        System.out.println("fatal error");
        warning(e);
    }

    @Override
    public void warning(SAXParseException e) throws SAXException {
        if (reader.getEventType() == XMLStreamConstants.START_ELEMENT
                || reader.getEventType() == XMLStreamConstants.END_ELEMENT) {
            //The first type of error is detected here.
            System.out.println(reader.getLocalName());
            System.out.println(reader.getNamespaceURI());

        }

        if(reader.getEventType() == XMLStreamConstants.CHARACTERS) {
            int start = reader.getTextStart();
            int length = reader.getTextLength();
            System.out.println(new String(reader.getTextCharacters(), start, length));
        }
    }
}

Below is a snippet of the well-formed XML file:

<?xml version="1.0" encoding="UTF-8"?>
<PHSHumanSubjectsAndClinicalTrialsInfo:PHSHumanSubjectsAndClinicalTrialsInfo xmlns:PHSHumanSubjectsAndClinicalTrialsInfo="http://apply.grants.gov/forms/PHSHumanSubjectsAndClinicalTrialsInfo-V1.0" PHSHumanSubjectsAndClinicalTrialsInfo:FormVersion="1.0"
>
<!--    <PHSHumanSubjectsAndClinicalTrialsInfo:HumanSubjectsIndicator
    >Y: </PHSHumanSubjectsAndClinicalTrialsInfo:HumanSubjectsIndicator
    >-->
    <PHSHumanSubjectsAndClinicalTrialsInfo:HumanSubjectsIndicator1
    >Y: Yes</PHSHumanSubjectsAndClinicalTrialsInfo:HumanSubjectsIndicator1
    >
    <PHSHumanSubjectsAndClinicalTrialsInfo:HumanSubjectsIndicator2
    >Y: Yes</PHSHumanSubjectsAndClinicalTrialsInfo:HumanSubjectsIndicator2
    >

Here the HumanSubjectsIndicator element is commented out to provoke the second scenario. In this case, the reader is positioned on a CHARACTERS event when MyErrorHandler is called, and reader.getTextCharacters() returns 'Y: Yes'. This value belongs to the HumanSubjectsIndicator1 element (I found this using the getLocation() method).
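A way to locate the problem without depending on which event the reader happens to be positioned on is to read the line, column and message from the SAXParseException passed to the handler. The sketch below is a minimal, self-contained illustration under stated assumptions: the schema (a hypothetical root with a, b, c in sequence), the class name MissingElementDemo and the helper CollectingErrorHandler are all made up for the example, and it uses a StreamSource over a string instead of the StAXSource from the question for simplicity. With the JDK's built-in validator, the error for the omitted b is expected to be reported at the element that follows the gap (line 3 here); whatever hint the processor gives about the expected element is in e.getMessage(), not in the reader state.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.ErrorHandler;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

public class MissingElementDemo {

    // Records every problem the validator reports; location and message come
    // from the exception itself, not from the position of any stream reader.
    static class CollectingErrorHandler implements ErrorHandler {
        final List<SAXParseException> errors = new ArrayList<>();

        @Override public void warning(SAXParseException e) { errors.add(e); }
        @Override public void error(SAXParseException e) { errors.add(e); }
        @Override public void fatalError(SAXParseException e) throws SAXException {
            errors.add(e);
            throw e; // not well-formed: validation cannot continue
        }
    }

    // Hypothetical schema: <root> must contain <a>, <b>, <c> in that order.
    static final String XSD =
        "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
      + "<xs:element name='root'><xs:complexType><xs:sequence>"
      + "<xs:element name='a' type='xs:int'/>"
      + "<xs:element name='b' type='xs:int'/>"
      + "<xs:element name='c' type='xs:int'/>"
      + "</xs:sequence></xs:complexType></xs:element>"
      + "</xs:schema>";

    // Instance with the required <b> left out; <c> is on line 3.
    static final String XML = "<root>\n<a>1</a>\n<c>3</c>\n</root>";

    static List<SAXParseException> validate(String xsd, String xml) throws Exception {
        SchemaFactory factory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
        Schema schema = factory.newSchema(new StreamSource(new StringReader(xsd)));
        Validator validator = schema.newValidator();
        CollectingErrorHandler handler = new CollectingErrorHandler();
        validator.setErrorHandler(handler);
        validator.validate(new StreamSource(new StringReader(xml)));
        return handler.errors;
    }

    public static void main(String[] args) throws Exception {
        for (SAXParseException e : validate(XSD, XML)) {
            System.out.printf("line %d, column %d: %s%n",
                    e.getLineNumber(), e.getColumnNumber(), e.getMessage());
        }
    }
}
```

The handler deliberately does not throw from error(), so all validity errors in the document are collected rather than stopping at the first one.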

Is there a way to get the exact local name of the missing element? If not with StAX, then with some other parser?

Thanks.


Answer


When a required element is missing, the Saxon XSD validator gives you a message like this:

Validation error on line 12 column 17 of books.xml:
  FORG0001: In content of element <ITEM>: The content model does not allow element <PRICE>
  to appear immediately after element <PUB-DATE>. It must be preceded by <LANGUAGE>. 
  See http://www.w3.org/TR/xmlschema-1/#cvc-complex-type clause 2.4

You could try pattern-matching the error message to extract the name of the missing element.
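A minimal sketch of that pattern-matching, assuming the exact wording "It must be preceded by <NAME>." from the Saxon output quoted above. Other Saxon versions or error codes may phrase it differently, so treat the regex as a starting point. In a real program the string would presumably come from the exception passed to your ErrorHandler (an assumption about how Saxon is wired in); the class name SaxonMessageParser is hypothetical.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SaxonMessageParser {

    // Assumes the wording of the Saxon message quoted above ("It must be preceded by <NAME>").
    // [^>]+ also accepts prefixed names such as prefix:LocalName.
    private static final Pattern MISSING = Pattern.compile("must be preceded by <([^>]+)>");

    /** Returns the element name Saxon reports as missing, if the message contains one. */
    static Optional<String> missingElement(String message) {
        Matcher m = MISSING.matcher(message);
        return m.find() ? Optional.of(m.group(1)) : Optional.empty();
    }

    public static void main(String[] args) {
        String message = "FORG0001: In content of element <ITEM>: The content model does not allow element <PRICE>\n"
                + "  to appear immediately after element <PUB-DATE>. It must be preceded by <LANGUAGE>.";
        System.out.println(missingElement(message).orElse("(no missing element reported)")); // LANGUAGE
    }
}
```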

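The reason most schema processors don't give you this information is the way they work internally. Typically, a schema processor constructs a finite state machine (FSM) that indicates, for each element in the input, which elements are allowed to appear next. If the next element is not one of the permitted elements, the FSM does not immediately tell you why. Saxon does some extra analysis to try to improve the diagnostics: if the input contains a disallowed transition from A to C, it searches the FSM for an allowed transition from A to B and an allowed transition from B to C, and then constructs an error message saying that B is missing.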
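To make the idea concrete, here is a toy sketch of a content-model FSM and the extra "A to B to C" search described above. It is an illustration of the technique under simplifying assumptions, not Saxon's implementation: the automaton is built by hand instead of being compiled from an XSD, the ITEM model (PUB-DATE, LANGUAGE, PRICE, all required) is a hypothetical simplification of the example message, and the search only looks for a single missing element.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class ContentModelFsm {

    // state -> (element name -> next state). Built by hand here; a schema
    // processor would compile this automaton from the XSD content model.
    private final Map<Integer, Map<String, Integer>> transitions = new HashMap<>();
    private final Set<Integer> accepting = new HashSet<>();
    private final int start;

    ContentModelFsm(int start) {
        this.start = start;
    }

    ContentModelFsm on(int from, String element, int to) {
        transitions.computeIfAbsent(from, k -> new HashMap<>()).put(element, to);
        return this;
    }

    ContentModelFsm accept(int state) {
        accepting.add(state);
        return this;
    }

    private Map<String, Integer> out(int state) {
        return transitions.getOrDefault(state, Collections.emptyMap());
    }

    private static String where(String previous) {
        return previous == null ? "at the start" : "after <" + previous + ">";
    }

    /** Returns null if the child element names are valid, otherwise a diagnostic. */
    String check(List<String> children) {
        int state = start;
        String previous = null;
        for (String child : children) {
            Integer next = out(state).get(child);
            if (next == null) {
                return diagnose(state, previous, child);
            }
            state = next;
            previous = child;
        }
        return accepting.contains(state) ? null : "Content is incomplete " + where(previous) + ".";
    }

    // The bare FSM only knows there is no transition for 'child' from 'state'.
    // Extra step: look for some B with state --B--> t and t --child--> u,
    // i.e. a single element whose insertion would make the input valid so far.
    private String diagnose(int state, String previous, String child) {
        String msg = "Element <" + child + "> is not allowed " + where(previous) + ".";
        for (Map.Entry<String, Integer> b : out(state).entrySet()) {
            if (out(b.getValue()).containsKey(child)) {
                return msg + " It must be preceded by <" + b.getKey() + ">.";
            }
        }
        return msg; // no single missing element explains the error
    }

    // Hypothetical, simplified <ITEM> model: PUB-DATE, LANGUAGE, PRICE (all required).
    static ContentModelFsm itemExample() {
        return new ContentModelFsm(0)
                .on(0, "PUB-DATE", 1)
                .on(1, "LANGUAGE", 2)
                .on(2, "PRICE", 3)
                .accept(3);
    }

    public static void main(String[] args) {
        ContentModelFsm item = itemExample();
        System.out.println(item.check(Arrays.asList("PUB-DATE", "LANGUAGE", "PRICE"))); // null
        System.out.println(item.check(Arrays.asList("PUB-DATE", "PRICE")));
    }
}
```

The second call reports that PRICE must be preceded by LANGUAGE. If two or more consecutive elements are missing (for example, input that starts directly with PRICE), the one-step search finds nothing and only the plain "not allowed" message remains, which mirrors why such diagnostics are best-effort.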

Answered 2017-12-04T12:02:33.623