3

I am using the following simple StAX code to iterate over all the tags in an XML file. input.xml is larger than 100 MB.

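import java.io.FileInputStream;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamReader;

XMLInputFactory xif = XMLInputFactory.newInstance();
FileInputStream in = new FileInputStream("input.xml");
// Reuse the factory created above instead of instantiating a second one
XMLStreamReader xsr = xif.createXMLStreamReader(in);

// Advance one event at a time and print the name of every start and end tag
while (xsr.hasNext()) {
    xsr.next();
    if (xsr.isStartElement() || xsr.isEndElement()) {
        System.out.println(xsr.getLocalName());
    }
}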

I get this error:

Exception in thread "main" java.lang.OutOfMemoryError: Java heap space

Please tell me how to solve this. I read that StAX handles huge XML files well, but I am getting the same error as with DOM parsers.

4

3 Answers

1

Use the -Xmx parameter to increase the VM's maximum heap size.

java -Xmx512m ....
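To confirm that the flag actually took effect, the running JVM can report its own heap limit. The following is a minimal sketch; the class name `HeapCheck` and the helper `maxHeapMegabytes` are made up for illustration and are not part of the question's code.

```java
public class HeapCheck {
    // Returns the maximum amount of heap the JVM will attempt to use, in megabytes.
    static long maxHeapMegabytes() {
        return Runtime.getRuntime().maxMemory() / (1024L * 1024L);
    }

    public static void main(String[] args) {
        // Run with e.g. "java -Xmx512m HeapCheck" and compare the printed value
        System.out.println("Max heap: " + maxHeapMegabytes() + " MB");
    }
}
```

The reported value is usually somewhat below the number passed to -Xmx, because the JVM reserves part of the heap for its own bookkeeping.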
answered 2011-06-28T06:24:36.670
1

Define the heap size when starting the JVM:

-Xms    initial java heap size
-Xmx    maximum java heap size
-Xmn    the size of the heap for the young generation

Example:

bin/java.exe -Xmn100M -Xms500M -Xmx500M
answered 2011-06-28T06:26:51.960
0

From Wikipedia: Traditionally, XML APIs are either:

tree based - the entire document is read into memory as a tree structure for random 
access by the calling application
event based - the application registers to receive events as entities are encountered 
within the source document.

StAX was designed as a median between these two opposites. In the StAX metaphor,
the programmatic entry point is a cursor that represents a point within the 
document. The application moves the cursor forward - 'pulling' the information from 
the parser as it needs. This is different from an event based API - such as SAX - 
which 'pushes' data to the application - requiring the application to maintain state 
between events as necessary to keep track of location within the document.

So for 100 MB or more, I would prefer SAX over StAX where that is possible.
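For comparison, here is a rough sketch of the same tag listing done with SAX, where the parser pushes events to a handler. The class name `SaxTagLister` and the method `forEachTag` are hypothetical names chosen for this example; only standard JDK classes are used.

```java
import java.io.FileInputStream;
import java.io.InputStream;
import java.util.function.Consumer;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

public class SaxTagLister {
    // Passes the local name of every start and end tag to the sink, in document order.
    static void forEachTag(InputStream in, Consumer<String> sink) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        // Namespace awareness is needed for localName to be populated
        factory.setNamespaceAware(true);
        factory.newSAXParser().parse(in, new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                sink.accept(localName);
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                sink.accept(localName);
            }
        });
    }

    public static void main(String[] args) throws Exception {
        // "input.xml" is the file name used in the question
        try (InputStream in = new FileInputStream("input.xml")) {
            forEachTag(in, System.out::println);
        }
    }
}
```

The handler keeps no state between events, so memory use should not grow with the number of elements.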

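However, I tried the code from the question on a 64-bit JVM with a 2.6 GB file and had no problems. So I think the issue is not the file size but possibly the data.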
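The answer does not say which property of the data might matter. As one assumption to test, a single very large text node could be a candidate, and a small probe can report the largest text event the parser delivers. This is only a diagnostic sketch; the class name `StaxTextProbe` and the method `largestTextEvent` are invented for illustration, and a probe like this would need to be run on a sample of the file or with a larger heap if the full file still fails.

```java
import java.io.FileInputStream;
import java.io.InputStream;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamReader;

public class StaxTextProbe {
    // Returns the length of the largest single CHARACTERS or CDATA event in the stream.
    static int largestTextEvent(InputStream in) throws Exception {
        XMLInputFactory xif = XMLInputFactory.newInstance();
        XMLStreamReader xsr = xif.createXMLStreamReader(in);
        int largest = 0;
        try {
            while (xsr.hasNext()) {
                int event = xsr.next();
                if (event == XMLStreamConstants.CHARACTERS || event == XMLStreamConstants.CDATA) {
                    largest = Math.max(largest, xsr.getTextLength());
                }
            }
        } finally {
            // Closing the reader frees parser resources but not the underlying stream
            xsr.close();
        }
        return largest;
    }

    public static void main(String[] args) throws Exception {
        // "input.xml" is the file name used in the question
        try (InputStream in = new FileInputStream("input.xml")) {
            System.out.println("Largest text event: " + largestTextEvent(in) + " chars");
        }
    }
}
```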

answered 2011-06-28T06:50:18.460