This answer really helped me:
JAXB - unmarshal XML exception
In my case, I'm parsing results from the Sysinternals Autoruns tool run with the XML switch (-x). Either because the results were being written to a file share or because of a bug in the newer version, the XML would be malformed near the end. Since this Autoruns capture is critical for malware investigations, I really wanted the data. Plus, I could tell from the file size that the results were nearly complete.
The solution in the linked question works really well when you have a document with many sub-elements, as the OP there suggested. The Autoruns XML output in particular is very simple: it consists of many "item" elements, each made up of many simple elements with text content (i.e. String properties in the classes generated by XJC). So if a few items are missed at the end, no big deal... unless of course it's something related to malware. :)
Here's my code:
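import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
// JAXB 2.x package names; JAXB 3+ uses jakarta.xml.bind instead.
import javax.xml.bind.JAXBContext;
import javax.xml.bind.JAXBElement;
import javax.xml.bind.JAXBException;
import javax.xml.bind.Unmarshaller;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.events.StartElement;
import javax.xml.stream.events.XMLEvent;

public class Loader {

    // Problems collected during the most recent recovery attempt.
    private List<Exception> exceptions = new ArrayList<>();

    public synchronized List<Exception> getExceptions() {
        return new ArrayList<>(exceptions);
    }

    protected synchronized void setExceptions(List<Exception> exceptions) {
        this.exceptions = exceptions;
    }

    public synchronized Autoruns load(File file, boolean attemptRecovery)
            throws LoaderException {
        Unmarshaller unmarshaller;
        try {
            JAXBContext context = JAXBContext.newInstance(Autoruns.class);
            unmarshaller = context.createUnmarshaller();
        } catch (JAXBException ex) {
            throw new LoaderException("Could not create unmarshaller.", ex);
        }
        // First try to unmarshal the whole document in one go.
        try {
            return (Autoruns) unmarshaller.unmarshal(file);
        } catch (JAXBException ex) {
            if (!attemptRecovery) {
                throw new LoaderException(ex.getMessage(), ex);
            }
        }
        // Recovery: walk the file with StAX and unmarshal one item at a time.
        exceptions.clear();
        Autoruns autoruns = new Autoruns();
        XMLInputFactory inputFactory = XMLInputFactory.newInstance();
        // try-with-resources closes the stream even when parsing fails
        try (InputStream in = new FileInputStream(file)) {
            XMLEventReader eventReader = inputFactory.createXMLEventReader(in);
            while (eventReader.hasNext()) {
                XMLEvent event = eventReader.peek();
                if (event.isStartElement()) {
                    StartElement start = event.asStartElement();
                    if (start.getName().getLocalPart().equals("item")) {
                        // note the try should allow processing of elements
                        // after this item in the event it is malformed
                        try {
                            JAXBElement<Autoruns.Item> jax_b =
                                    unmarshaller.unmarshal(eventReader,
                                            Autoruns.Item.class);
                            autoruns.getItem().add(jax_b.getValue());
                            // The reader is already past this item's end tag,
                            // so don't skip another event below.
                            continue;
                        } catch (JAXBException ex) {
                            exceptions.add(ex);
                        }
                    }
                }
                // nextEvent() (unlike Iterator's next()) declares
                // XMLStreamException, which the catch below records.
                eventReader.nextEvent();
            }
        } catch (XMLStreamException | IOException ex) {
            exceptions.add(ex);
        }
        return autoruns;
    }

    public static Autoruns load(Path path) throws JAXBException {
        return load(path.toFile());
    }

    public static Autoruns load(File file) throws JAXBException {
        JAXBContext context = JAXBContext.newInstance(Autoruns.class);
        Unmarshaller unmarshaller = context.createUnmarshaller();
        return (Autoruns) unmarshaller.unmarshal(file);
    }

    public static class LoaderException extends Exception {

        public LoaderException(String message) {
            super(message);
        }

        public LoaderException(String message, Throwable cause) {
            super(message, cause);
        }
    }
}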
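If you want to try the recovery idea without JAXB or the XJC-generated classes, here is a minimal sketch that uses only the StAX API from the JDK. It is not the Loader above: it swaps the per-item JAXB unmarshal for hand-rolled reading of each item's child elements into a Map. The class name TruncatedXmlDemo, the method readItems, and the sample element values are made up for illustration. The principle is the same, though: keep every item whose end tag was reached, and record the parse error where the input is cut off.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.events.XMLEvent;

// Hypothetical demo class; not part of the Loader above.
final class TruncatedXmlDemo {

    // Returns every complete <item>; a parse error, if any, is added to errors.
    static List<Map<String, String>> readItems(String xml, List<Exception> errors) {
        List<Map<String, String>> items = new ArrayList<>();
        try {
            XMLEventReader reader = XMLInputFactory.newInstance()
                    .createXMLEventReader(new StringReader(xml));
            Map<String, String> current = null; // item being read, or null
            String field = null;                // child element being read, or null
            StringBuilder text = new StringBuilder();
            while (reader.hasNext()) {
                // nextEvent() declares XMLStreamException for malformed input.
                XMLEvent event = reader.nextEvent();
                if (event.isStartElement()) {
                    String name = event.asStartElement().getName().getLocalPart();
                    if (name.equals("item")) {
                        current = new LinkedHashMap<>();
                    } else if (current != null) {
                        field = name;
                        text.setLength(0);
                    }
                } else if (event.isCharacters() && field != null) {
                    text.append(event.asCharacters().getData());
                } else if (event.isEndElement()) {
                    String name = event.asEndElement().getName().getLocalPart();
                    if (name.equals("item") && current != null) {
                        items.add(current); // only items with an end tag are kept
                        current = null;
                    } else if (current != null && name.equals(field)) {
                        current.put(field, text.toString());
                        field = null;
                    }
                }
            }
        } catch (XMLStreamException ex) {
            errors.add(ex);
        }
        return items;
    }

    public static void main(String[] args) {
        // Sample input cut off in the middle of the third item.
        String truncated = "<autoruns>"
                + "<item><location>Logon</location><itemname>alpha</itemname></item>"
                + "<item><location>Services</location><itemname>beta</itemname></item>"
                + "<item><location>Drivers</location><itemna";
        List<Exception> errors = new ArrayList<>();
        List<Map<String, String>> items = readItems(truncated, errors);
        System.out.println(items.size() + " complete items, "
                + errors.size() + " errors");
    }
}
```

As with the Loader, the partially written last item is dropped, so it is worth checking the recorded exceptions before trusting that nothing important was at the end of the file.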