
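My code needs to download a large (500 MB) XML file through a GZIPInputStream and process it, performing some operations on each object. These operations take time to complete, and I have a lot of objects to process. I'm using Commons HttpClient 3.1 and StAX.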

public void download(String url) throws HttpException, IOException,
        XMLStreamException, FactoryConfigurationError {

    GetMethod getMethod = new GetMethod(url);
    try {
        httpClient.executeMethod(getMethod);
        Header contentEncoding = getMethod.getResponseHeader("Content-Encoding");
        if (contentEncoding != null) {
            String acceptEncodingValue = contentEncoding.getValue();
            if (acceptEncodingValue.indexOf("gzip") != -1) {
                processStream(new GZIPInputStream(getMethod.getResponseBodyAsStream()));
                return;
            }
        }

        processStream(getMethod.getResponseBodyAsStream());
        return;
    } finally {
        getMethod.releaseConnection();
    }
}

protected void processStream(InputStream inputStream) throws XMLStreamException, FactoryConfigurationError {
    XMLStreamReader xmlStreamReader = XMLInputFactory.newFactory().createXMLStreamReader(inputStream);
    // parses xml with Stax
    // executes some long operations for each object
}

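When I run the code, it works until, after two or three hours, I get a SocketException: Connection reset. It looks like the server has closed the connection; is that right? Is there a way to avoid this error without changing anything on the server side? If not, how can I handle it so that I don't have to rerun my application from the beginning?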

com.ctc.wstx.exc.WstxIOException: Connection reset
    at com.ctc.wstx.sr.StreamScanner.throwFromIOE(StreamScanner.java:708)
    at com.ctc.wstx.sr.BasicStreamReader.next(BasicStreamReader.java:1086)
    .................
Caused by: java.net.SocketException: Connection reset
    at java.net.SocketInputStream.read(SocketInputStream.java:168)
    at java.io.BufferedInputStream.fill(BufferedInputStream.java:218)
    at java.io.BufferedInputStream.read1(BufferedInputStream.java:258)
    at java.io.BufferedInputStream.read(BufferedInputStream.java:317)
    at org.apache.commons.httpclient.ChunkedInputStream.read(ChunkedInputStream.java:182)
    at java.io.FilterInputStream.read(FilterInputStream.java:116)
    at org.apache.commons.httpclient.AutoCloseInputStream.read(AutoCloseInputStream.java:108)
    at java.util.zip.InflaterInputStream.fill(InflaterInputStream.java:221)
    at java.util.zip.InflaterInputStream.read(InflaterInputStream.java:141)
    at java.util.zip.GZIPInputStream.read(GZIPInputStream.java:92)
    at java.io.FilterInputStream.read(FilterInputStream.java:90)
    at com.ctc.wstx.io.UTF8Reader.loadMore(UTF8Reader.java:365)
    at com.ctc.wstx.io.UTF8Reader.read(UTF8Reader.java:110)
    at com.ctc.wstx.io.ReaderSource.readInto(ReaderSource.java:84)
    at com.ctc.wstx.io.BranchingReaderSource.readInto(BranchingReaderSource.java:57)
    at com.ctc.wstx.sr.StreamScanner.loadMore(StreamScanner.java:992)
    at com.ctc.wstx.sr.StreamScanner.loadMore(StreamScanner.java:1034)
    at com.ctc.wstx.sr.StreamScanner.getNextChar(StreamScanner.java:794)
    at com.ctc.wstx.sr.BasicStreamReader.parseNormalizedAttrValue(BasicStreamReader.java:1900)
    at com.ctc.wstx.sr.BasicStreamReader.handleNsAttrs(BasicStreamReader.java:3037)
    at com.ctc.wstx.sr.BasicStreamReader.handleStartElem(BasicStreamReader.java:2936)
    at com.ctc.wstx.sr.BasicStreamReader.nextFromTree(BasicStreamReader.java:2848)
    at com.ctc.wstx.sr.BasicStreamReader.next(BasicStreamReader.java:1019)

2 Answers


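One suggestion is to cache the file locally and then process it.

I.e., your handler simply reads the stream and writes it to a temporary file on disk. It then closes the stream and processes the data from the temporary file.

This is probably a good approach anyway: even if you could keep the connection alive, network interruptions, degraded QoS and the like can make retrieving the file unreliable. You may also be preventing the server from updating the file for the whole duration of your processing, which is a bit antisocial.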
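A minimal sketch of that idea, using only the JDK (Java 7+ `java.nio.file`). The class `LocalCache`, the `StreamProcessor` callback and the temp-file prefix are hypothetical names invented for illustration; `StreamProcessor` stands in for the question's `processStream`.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

public class LocalCache {

    /** Hypothetical callback standing in for the slow StAX processing. */
    public interface StreamProcessor {
        void process(InputStream in) throws Exception;
    }

    /** Copies the whole remote stream to a temp file, closes it, and returns the file's path. */
    public static Path spool(InputStream remote) throws IOException {
        Path tmp = Files.createTempFile("download-", ".xml");
        try (InputStream in = remote) {
            // createTempFile already created the file, so overwrite it.
            Files.copy(in, tmp, StandardCopyOption.REPLACE_EXISTING);
        } catch (IOException e) {
            // A partial file is useless here; remove it and report the failure.
            Files.deleteIfExists(tmp);
            throw e;
        }
        return tmp;
    }

    /** Downloads first, then runs the slow processing against the local copy only. */
    public static void spoolThenProcess(InputStream remote, StreamProcessor processor)
            throws Exception {
        Path tmp = spool(remote); // the network stream is closed once this returns
        try (InputStream local = Files.newInputStream(tmp)) {
            processor.process(local);
        } finally {
            Files.deleteIfExists(tmp);
        }
    }
}
```

In the question's `download` method, this would mean calling `spool(...)` on the (possibly gzip-wrapped) response stream inside the `try`, letting `releaseConnection()` run, and only then parsing the temp file. Note that spooling the decompressed stream needs disk space for the full uncompressed XML.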

answered 2011-12-15T14:31:28.357

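If you can't copy the XML to your local machine, check whether the connection is timing out. The XML processing may be taking too long, so the connection gets reset by one of the intermediate servers.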
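One rough way to check is to look at the root cause of the failure: a client-side read timeout surfaces as `java.net.SocketTimeoutException`, whereas a connection dropped by the peer or an intermediary surfaces as `java.net.SocketException` (as in the question's trace). The sketch below is JDK-only; the class `FailureKind` and its labels are hypothetical names for illustration.

```java
import java.net.SocketException;
import java.net.SocketTimeoutException;

public class FailureKind {

    /** Walks the cause chain to the innermost exception. */
    public static Throwable rootCause(Throwable t) {
        Throwable cur = t;
        while (cur.getCause() != null && cur.getCause() != cur) {
            cur = cur.getCause();
        }
        return cur;
    }

    /** Returns "TIMEOUT", "RESET" or "OTHER" depending on the root cause type. */
    public static String classify(Throwable t) {
        Throwable root = rootCause(t);
        if (root instanceof SocketTimeoutException) {
            return "TIMEOUT"; // our own read timeout expired
        }
        if (root instanceof SocketException) {
            return "RESET";   // connection closed or reset by the other side
        }
        return "OTHER";
    }
}
```

Calling `FailureKind.classify(e)` in a `catch` block around the parsing loop would show which of the two cases you are hitting; the parser's wrapper exception (here `WstxIOException`) is unwrapped through its cause chain.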

answered 2011-12-16T21:14:54.583