I have an XML file and its corresponding XSD file. I validate the XML through a StAX parser with a custom error handler attached. I run into two kinds of validation errors in an otherwise well-formed XML file:
1) Incorrect data type inside an element, e.g. a string inside an element that is supposed to hold an integer.
2) Missing element: an element that the XSD requires is not present in the XML.
Using the StAX parser and the custom error handler, I can identify the first type of error. For the second type, however, the handler is invoked while the reader is on a CHARACTERS event, and the text is the value of the element that immediately follows the missing one. I don't know how to work out which element is missing. Also, why is a CHARACTERS event triggered while the missing element itself is never reported?
Since the StAX parser is forward-only, is there a way to identify both kinds of error using another parser?
import java.io.File;
import java.io.IOException;
import javax.xml.XMLConstants;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;
import javax.xml.transform.stax.StAXSource;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.*;
import org.xml.sax.ErrorHandler;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
public class XMLValidation {

    public static void main(String[] args) {
        XMLValidation xmlValidation = new XMLValidation();
        System.out.println(xmlValidation.validateXMLSchema("PHSHumanSubjectsAndClinicalTrialsInfo-V1.0.xsd", "FullPHSHuman.xml"));
    }

    public boolean validateXMLSchema(String xsdPath, String xmlPath) {
        try {
            SchemaFactory factory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
            Schema schema = factory.newSchema(new File(xsdPath));
            StreamSource xmlSource = new StreamSource(xmlPath);
            XMLStreamReader reader = XMLInputFactory.newFactory().createXMLStreamReader(xmlSource);
            Validator validator = schema.newValidator();
            // The handler keeps the reader so it can inspect the current event when an error is reported.
            validator.setErrorHandler(new MyErrorHandler(reader));
            validator.validate(new StAXSource(reader));
        } catch (IOException | SAXException | XMLStreamException e) {
            System.out.println("Exception: " + e.getMessage() + " local message " + e.getLocalizedMessage() + " cause " + e.getCause());
            return false;
        }
        return true;
    }
}

class MyErrorHandler implements ErrorHandler {

    private final XMLStreamReader reader;

    public MyErrorHandler(XMLStreamReader reader) {
        this.reader = reader;
    }

    @Override
    public void error(SAXParseException e) throws SAXException {
        System.out.println("error");
        warning(e);
    }

    @Override
    public void fatalError(SAXParseException e) throws SAXException {
        System.out.println("fatal error");
        warning(e);
    }

    @Override
    public void warning(SAXParseException e) throws SAXException {
        int eventType = reader.getEventType();
        if (eventType == XMLStreamConstants.START_ELEMENT || eventType == XMLStreamConstants.END_ELEMENT) {
            // The first type of error (wrong data type) is detected here.
            System.out.println(reader.getLocalName());
            System.out.println(reader.getNamespaceURI());
        }
        if (eventType == XMLStreamConstants.CHARACTERS) {
            // The second type of error (missing element) ends up here, with the text of the next element.
            int start = reader.getTextStart();
            int length = reader.getTextLength();
            System.out.println(new String(reader.getTextCharacters(), start, length));
        }
    }
}
Below is the snippet of the well-formed XML file:
<?xml version="1.0" encoding="UTF-8"?>
<PHSHumanSubjectsAndClinicalTrialsInfo:PHSHumanSubjectsAndClinicalTrialsInfo xmlns:PHSHumanSubjectsAndClinicalTrialsInfo="http://apply.grants.gov/forms/PHSHumanSubjectsAndClinicalTrialsInfo-V1.0" PHSHumanSubjectsAndClinicalTrialsInfo:FormVersion="1.0"
>
<!-- <PHSHumanSubjectsAndClinicalTrialsInfo:HumanSubjectsIndicator
>Y: </PHSHumanSubjectsAndClinicalTrialsInfo:HumanSubjectsIndicator
>-->
<PHSHumanSubjectsAndClinicalTrialsInfo:HumanSubjectsIndicator1
>Y: Yes</PHSHumanSubjectsAndClinicalTrialsInfo:HumanSubjectsIndicator1
>
<PHSHumanSubjectsAndClinicalTrialsInfo:HumanSubjectsIndicator2
>Y: Yes</PHSHumanSubjectsAndClinicalTrialsInfo:HumanSubjectsIndicator2
>
Here the HumanSubjectsIndicator element is commented out to provoke the second scenario. In this case the reader is on a CHARACTERS event when 'MyErrorHandler' is called. The value 'Y: Yes' is obtained from reader.getTextCharacters(). This value belongs to the HumanSubjectsIndicator1 element (I found this using the getLocation() method).
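To make the second scenario easier to reproduce without the grants.gov schema, here is a reduced sketch. The class name MissingElementRepro, the inline schema, and the element names root, first and second are all made up for illustration; they are not part of my real files. Instead of inspecting the reader, this version only collects the SAXParseException messages the validator reports, so it shows what the validator says when the required first element is absent. As far as I can tell the message wording depends on the JDK's validator implementation, so I would not treat it as a structured way to get the element name.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.XMLConstants;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamReader;
import javax.xml.transform.stax.StAXSource;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.ErrorHandler;
import org.xml.sax.SAXParseException;

public class MissingElementRepro {

    // Hypothetical schema: <root> must contain <first> followed by <second>.
    static final String XSD =
          "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
        + "<xs:element name='root'><xs:complexType><xs:sequence>"
        + "<xs:element name='first' type='xs:string'/>"
        + "<xs:element name='second' type='xs:string'/>"
        + "</xs:sequence></xs:complexType></xs:element>"
        + "</xs:schema>";

    // Validates the given XML text against XSD and returns every reported message.
    public static List<String> validate(String xml) throws Exception {
        SchemaFactory factory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
        Schema schema = factory.newSchema(new StreamSource(new StringReader(XSD)));
        XMLStreamReader reader =
            XMLInputFactory.newFactory().createXMLStreamReader(new StringReader(xml));

        List<String> messages = new ArrayList<>();
        Validator validator = schema.newValidator();
        validator.setErrorHandler(new ErrorHandler() {
            @Override
            public void warning(SAXParseException e) {
                messages.add(e.getMessage());
            }

            @Override
            public void error(SAXParseException e) {
                // Recoverable validation errors are collected so validation can continue.
                messages.add(e.getMessage());
            }

            @Override
            public void fatalError(SAXParseException e) throws SAXParseException {
                throw e;
            }
        });
        try {
            validator.validate(new StAXSource(reader));
        } finally {
            reader.close();
        }
        return messages;
    }

    public static void main(String[] args) throws Exception {
        // <first> is left out on purpose, mirroring the commented-out element above.
        for (String message : validate("<root><second>Y: Yes</second></root>")) {
            System.out.println(message);
        }
    }
}
```

Running main prints whatever the validator reports for the document with the missing element; a document that contains both first and second produces no messages.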
Is there a way to get the exact local name of the missing element? If not with StAX, then with another parser?
Thanks.