java - How do I save a large file from a web service using Java?

Question

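I have to call a REST web service that returns a large amount of data as XML. The data is about 490 MB. Every time I try to call the service I run out of memory. All I want to do is write this data to a file.

Is there a way to read and write the data in small chunks to avoid running out of memory?

Here is what I have tried: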

import java.io.FileOutputStream;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.URL;

public class GetWs {

    private static String url = "http://somewebservice";

    public static void main(String[] args) {
        InputStream in;
        OutputStream out;
        try {
            out = new FileOutputStream("testoutfile.txt");
            in = new URL(url).openStream();
            int b;
            // Copy one byte at a time until read() signals end of stream with -1
            do {
                b = in.read();
                if (b != -1) {
                    out.write(b);
                    out.flush();
                }
            } while (b != -1);
            in.close();
            out.close();
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}

score 2 · Accepted Answer

Try compressing the data and streaming it to a file output stream, preferably using NIO.
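The answer gives no code, so here is a minimal sketch of what it might look like, assuming Java 7+ NIO (`java.nio.file.Files`) and gzip as the compression format. The class `StreamToFile` and its method names are hypothetical and only for illustration.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.URL;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;
import java.util.zip.GZIPOutputStream;

// Hypothetical helper class; the names are illustrative, not from the answer.
class StreamToFile {

    // Streams the input to the target file in small chunks, so the whole
    // payload is never held in memory. Returns the number of bytes written.
    static long save(InputStream in, Path target) throws IOException {
        return Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING);
    }

    // Same idea, but gzip-compresses the bytes on their way to disk.
    static void saveCompressed(InputStream in, Path target) throws IOException {
        try (OutputStream out = new GZIPOutputStream(Files.newOutputStream(target))) {
            byte[] buffer = new byte[8192];
            int n;
            while ((n = in.read(buffer)) != -1) {
                out.write(buffer, 0, n);
            }
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: StreamToFile <url> <output file>");
            return;
        }
        // The URL and the output path are supplied by the caller.
        try (InputStream in = new URL(args[0]).openStream()) {
            System.out.println(save(in, Paths.get(args[1])) + " bytes written");
        }
    }
}
```

Either method only ever keeps one small buffer in memory, which is what matters for a payload of roughly 490 MB.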

If you have to parse and validate the XML, try a StAX parser.
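As a rough illustration of the StAX suggestion, the sketch below pulls parse events one at a time from a stream using the JDK's `javax.xml.stream` API, so the document is never loaded as a whole. It only counts elements and does not validate anything; the class `StaxElementCounter` and the element names are assumptions made for the example.

```java
import java.io.InputStream;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical example class; not part of the original answer.
class StaxElementCounter {

    // Counts start tags with the given local name, reading the XML
    // event by event instead of building a tree in memory.
    static int count(InputStream xml, String elementName) throws XMLStreamException {
        XMLStreamReader reader = XMLInputFactory.newInstance().createXMLStreamReader(xml);
        int count = 0;
        try {
            while (reader.hasNext()) {
                if (reader.next() == XMLStreamConstants.START_ELEMENT
                        && elementName.equals(reader.getLocalName())) {
                    count++;
                }
            }
        } finally {
            reader.close();
        }
        return count;
    }
}
```

The input stream could be the one returned by `URL.openStream()`, so parsing can happen while the response is still arriving.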

score 2 · Accepted Answer

If you really just want to download the content of that URL to a file, try Google Guava, which has some very nice helper methods:

URL url = ...
File file = ...
ByteStreams.copy(
    Resources.newInputStreamSupplier(url),
    Files.newOutputStreamSupplier(file));

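This saves you from writing yet another copy loop with proper exception handling. You don't even need to close any streams; ByteStreams.copy() does that for you.

If you want to store the data as UTF-16, use the following: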

Charset charsetFromServer = ...; // See notes below.

CharStreams.copy(
    Resources.newReaderSupplier(url, charsetFromServer),
    Files.newWriterSupplier(file, Charsets.UTF_16));

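There are several ways to set charsetFromServer:

If you can trust the server to always use the same character set, set it manually using Charset.forName(String) or one of the constants in Guava's Charsets class. However, be very, very sure that the server will never use any other encoding, or this will break.
A more sophisticated approach is to determine the character encoding the server uses at runtime by looking at the Content-Type header. I suggest looking at how Apache's HttpClient does it, or simply using HttpClient to begin with and letting it do the work, as in ContentType.getOrDefault(response.getEntity()).getCharset().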
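To make the Content-Type approach concrete without pulling in HttpClient, here is a simplified sketch that extracts the `charset` parameter from a header value using only the JDK. It is not a full implementation of the header grammar, and the class `ContentTypeCharset` and its method are hypothetical names chosen for this example.

```java
import java.nio.charset.Charset;
import java.util.Locale;

// Hypothetical helper; a simplified parser, not a complete header implementation.
class ContentTypeCharset {

    // Returns the charset named in a Content-Type value such as
    // "text/xml; charset=UTF-8", or the fallback if none is usable.
    static Charset fromContentType(String contentType, Charset fallback) {
        if (contentType == null) {
            return fallback;
        }
        // Parameters follow the media type, separated by ';'.
        String[] parts = contentType.split(";");
        for (int i = 1; i < parts.length; i++) {
            String param = parts[i].trim();
            if (param.toLowerCase(Locale.ROOT).startsWith("charset=")) {
                String name = param.substring("charset=".length()).trim();
                // The value may be wrapped in double quotes.
                if (name.length() >= 2 && name.startsWith("\"") && name.endsWith("\"")) {
                    name = name.substring(1, name.length() - 1);
                }
                try {
                    return Charset.forName(name);
                } catch (IllegalArgumentException e) {
                    // Illegal or unsupported charset name.
                    return fallback;
                }
            }
        }
        return fallback;
    }
}
```

With a plain `URLConnection`, the header value is available from `getContentType()`, so the result could be passed on as charsetFromServer.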

score 1 · Accepted Answer

If you really are just working with an input stream, simply use

byte[] buff = new byte[5000];
int num;
// read() returns the number of bytes read, or -1 at the end of the stream
while ((num = inputStream.read(buff)) != -1) {
   outputStream.write(buff, 0, num);
}

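Originally I wrote that you would need to add some code to detect when the end of the file is reached (depending on the input stream implementation). Edit: no, you don't; I also fixed some of the code.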
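For completeness, a buffered loop like this can be wrapped in try-with-resources so that both streams are closed even if the transfer fails partway. This is only a sketch under the assumption of Java 7+; the class `UrlDownloader` and the buffer size are illustrative choices, not part of the answer.

```java
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.URL;

// Hypothetical wrapper class; names are illustrative only.
class UrlDownloader {

    // Copies the content behind the URL to the target file in 8 KB chunks
    // and returns the total number of bytes written.
    static long download(URL source, File target) throws IOException {
        try (InputStream in = source.openStream();
             OutputStream out = new FileOutputStream(target)) {
            byte[] buffer = new byte[8192];
            long total = 0;
            int n;
            while ((n = in.read(buffer)) != -1) {
                out.write(buffer, 0, n);
                total += n;
            }
            return total;
        }
    }
}
```

Compared with the attempt in the question, this reads many bytes per call and does not flush after every byte, which tends to matter a lot for large downloads.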
