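Good morning,
I have to parse a huge XML file (2 GB) in Java. It has many tags, but I only need to write the content of two of them, <title> and <subtext>, to a common file each time, so I am using a SAX parser.
So far I have managed to write 1M95 (about 1.95 million) texts to the output files, at which point this exception occurs: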
org.xml.sax.SAXParseException; systemId: filePath; lineNumber: x; columnNumber: y; JAXP00010004 : La taille cumulée des entités est "50 000 001" et dépasse la limite de "50 000 000" définie par "FEATURE_SECURE_PROCESSING".
at com.sun.org.apache.xerces.internal.util.ErrorHandlerWrapper.createSAXParseException(ErrorHandlerWrapper.java:203)
at com.sun.org.apache.xerces.internal.util.ErrorHandlerWrapper.fatalError(ErrorHandlerWrapper.java:177)
at com.sun.org.apache.xerces.internal.impl.XMLErrorReporter.reportError(XMLErrorReporter.java:400)
at com.sun.org.apache.xerces.internal.impl.XMLErrorReporter.reportError(XMLErrorReporter.java:327)
at com.sun.org.apache.xerces.internal.impl.XMLScanner.reportFatalError(XMLScanner.java:1465)
at com.sun.org.apache.xerces.internal.impl.XMLScanner.checkEntityLimit(XMLScanner.java:1544)
at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl.handleCharacter(XMLDocumentFragmentScannerImpl.java:1940)
at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl.scanEntityReference(XMLDocumentFragmentScannerImpl.java:1866)
at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl$FragmentContentDriver.next(XMLDocumentFragmentScannerImpl.java:3058)
at com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl.next(XMLDocumentScannerImpl.java:606)
at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl.scanDocument(XMLDocumentFragmentScannerImpl.java:504)
at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(XML11Configuration.java:848)
at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(XML11Configuration.java:777)
at com.sun.org.apache.xerces.internal.parsers.XMLParser.parse(XMLParser.java:141)
at com.sun.org.apache.xerces.internal.parsers.AbstractSAXParser.parse(AbstractSAXParser.java:1213)
at com.sun.org.apache.xerces.internal.jaxp.SAXParserImpl$JAXPSAXParser.parse(SAXParserImpl.java:643)
at com.sun.org.apache.xerces.internal.jaxp.SAXParserImpl.parse(SAXParserImpl.java:327)
at javax.xml.parsers.SAXParser.parse(SAXParser.java:328)
at Parsing.main(Class.java:38)
The exception message translates to:
The cumulative size of the entities is "50 000 001" which exceeds the boundary of "50 000 000" defined by "FEATURE_SECURE_PROCESSING".
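For context, the limit named in this message appears to correspond to the JDK's `jdk.xml.totalEntitySizeLimit` processing limit, and the stack trace suggests that every entity reference (even a plain `&amp;`) counts toward it. Below is a minimal sketch of how that limit could be lifted, assuming the JDK's built-in parser is in use and the input is trusted. The class name `EntityLimitDemo` and the helper `countChars` are hypothetical and only there for illustration; the value `"0"` is assumed to mean "no limit".

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;

import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;

import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo class, not part of the code in this post.
class EntityLimitDemo {

    // Parses the given XML string and returns the number of characters
    // reported through the characters() callback.
    static int countChars(String xml) throws Exception {
        // Assumption: "0" tells the JDK's built-in parser not to enforce
        // the cumulative entity size limit. The property is set before the
        // parser is created so that the parser can pick it up.
        System.setProperty("jdk.xml.totalEntitySizeLimit", "0");

        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        final int[] count = {0};
        parser.parse(
                new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)),
                new DefaultHandler() {
                    @Override
                    public void characters(char[] ch, int start, int length) {
                        count[0] += length;
                    }
                });
        return count[0];
    }

    public static void main(String[] args) throws Exception {
        // Small document containing one predefined entity reference.
        System.out.println(countChars("<r><text>a &amp; b</text></r>"));
    }
}
```

The same setting could presumably be passed on the command line as `-Djdk.xml.totalEntitySizeLimit=0` instead of being set in code. Removing the limit only seems reasonable when the file comes from a trusted source, since the limit exists to guard against entity-expansion attacks.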
Here is the code I wrote:
    import java.io.File;
    import java.io.IOException;

    import javax.xml.parsers.ParserConfigurationException;
    import javax.xml.parsers.SAXParser;
    import javax.xml.parsers.SAXParserFactory;

    import org.xml.sax.Attributes;
    import org.xml.sax.SAXException;
    import org.xml.sax.helpers.DefaultHandler;

    public class Parsing {

        public static void main(String[] args) throws ParserConfigurationException, SAXException, IOException {
            try {
                File inputFile = new File(System.getProperty("user.dir") + "/input.xml");
                SAXParserFactory factory = SAXParserFactory.newInstance();
                SAXParser saxParser = factory.newSAXParser();
                UserHandler userhandler = new UserHandler();
                saxParser.parse(inputFile, userhandler);
            } catch (Exception e) {
                e.printStackTrace();
            }
        }

        public static void doThingOne(String text, String title) throws IOException {
            // Write the text and the title to a file
        }

        public static void doThingTwo(String text, String title) throws IOException {
            // Write the text and the title to another file
        }

        // Static so that main() can instantiate it without an enclosing Parsing instance.
        static class UserHandler extends DefaultHandler {
            boolean bText = false;
            boolean bTitle = false;
            StringBuilder tagTextBuffer;
            StringBuilder tagTitleBuffer;
            String text = null;
            String title = null;

            @Override
            public void startElement(String uri, String localName, String qName, Attributes attributes) throws SAXException {
                if (qName.equals("title")) {
                    tagTitleBuffer = new StringBuilder();
                    bTitle = true;
                } else if (qName.equals("text")) {
                    tagTextBuffer = new StringBuilder();
                    bText = true;
                }
            }

            @Override
            public void endElement(String uri, String localName, String qName) throws SAXException {
                if (qName.equals("title")) {
                    bTitle = false;
                    // Read the title from its own buffer, not from the text buffer.
                    title = tagTitleBuffer.toString();
                } else if (qName.equals("text")) {
                    text = tagTextBuffer.toString();
                    bText = false;
                    // Compare string contents with equals(), not with ==.
                    if (text != null && "One".equals(title)) {
                        try {
                            Parsing.doThingOne(text, title);
                        } catch (IOException e) {
                            e.printStackTrace();
                        }
                    } else if (text != null) {
                        try {
                            Parsing.doThingTwo(text, title);
                        } catch (IOException e) {
                            e.printStackTrace();
                        }
                    }
                }
            }

            @Override
            public void characters(char[] ch, int start, int length) throws SAXException {
                // characters() may be called several times per element, so append each chunk.
                if (bTitle) {
                    tagTitleBuffer.append(ch, start, length);
                } else if (bText) {
                    tagTextBuffer.append(ch, start, length);
                }
            }
        }
    }
Thank you for your time.