java - Why is the downloaded file corrupted?

Question

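I have been trying to download a PDF file from the following URL: http://pdfobject.com/markup/examples/full-browser-window.html

Josh M suggested the following solution, which works on his computer. However, I cannot get it to work. The code below does save a file to the destination, but the downloaded file is only 984 bytes (it should be about 18 KB), so the file is corrupted. I cannot think of any reason why this would happen.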

import java.io.ByteArrayOutputStream;
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;
import java.net.URLConnection;
import java.nio.file.Files;
import java.nio.file.StandardOpenOption;

public final class FileDownloader {

    private FileDownloader(){}

    public static void main(String args[]) throws IOException{
        download("http://pdfobject.com/markup/examples/full-browser-window.html", new File("C:\\Users\\Owner\\Desktop\\temporary\\myFile.pdf"));
        download2("http://pdfobject.com/markup/examples/full-browser-window.html", new File("C:\\Users\\Owner\\Desktop\\temporary\\myFile2.pdf"));
    }

    public static void download(final String url, final File destination) throws IOException {
        final URLConnection connection = new URL(url).openConnection();
        connection.setConnectTimeout(60000);
        connection.setReadTimeout(60000);
        connection.addRequestProperty("User-Agent", "Mozilla/5.0");
        final ByteArrayOutputStream baos = new ByteArrayOutputStream();
        final byte[] buffer = new byte[2048];
        int read;
        final InputStream input = connection.getInputStream();
        while((read = input.read(buffer)) > -1)
            baos.write(buffer, 0, read);
        baos.flush();
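        // CREATE and TRUNCATE_EXISTING let this work whether or not the file already exists.
        Files.write(destination.toPath(), baos.toByteArray(), StandardOpenOption.CREATE, StandardOpenOption.TRUNCATE_EXISTING, StandardOpenOption.WRITE);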
        input.close();
    }

    public static void download2(final String url, final File destination) throws IOException {
        final URLConnection connection = new URL(url).openConnection();
        connection.setConnectTimeout(60000);
        connection.setReadTimeout(60000);
        connection.addRequestProperty("User-Agent", "Mozilla/5.0");
        final FileOutputStream output = new FileOutputStream(destination, false);
        final byte[] buffer = new byte[2048];
        int read;
        final InputStream input = connection.getInputStream();
        while((read = input.read(buffer)) > -1)
            output.write(buffer, 0, read);
        output.flush();
        output.close();
        input.close();
    }
}

score 3 · Accepted Answer

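You are downloading a .html URL whose page only references the PDF as an embedded object. Unlike a browser, Java does not process that page, so what you save is the HTML, not the PDF. Take a look inside the file. For your convenience, here it is: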

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
<title>Embedding a PDF using static HTML markup: Full-browser window (100% width/height)</title>

<!-- This example created for PDFObject.com by Philip Hutchison (www.pipwerks.com) -->

<style type="text/css">
<!--

html {
   height: 100%;
}

body {
   margin: 0;
   padding: 0;
   height: 100%;
}

p {
   padding: 1em;
}

object {
   display: block;
}

-->
</style>

</head>

<body>

<object data="/pdf/sample.pdf#toolbar=1&amp;navpanes=0&amp;scrollbar=1&amp;page=1&amp;view=FitH" 
        type="application/pdf" 
        width="100%" 
        height="100%">

<p>It appears you don't have a PDF plugin for this browser. No biggie... you can <a href="/pdf/sample.pdf">click here to download the PDF file.</a></p>

</object>

</body>
</html>
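
The `data` attribute of the `<object>` element above points at `/pdf/sample.pdf`, so the download has to target that path rather than the HTML page. Below is a minimal sketch of one way to do that, assuming the site still serves the file at that location (not verified here). The class `PdfDownloadSketch` and its method names are hypothetical and not part of the question's code. It resolves the relative path against the page URL, drops the `#...` viewer fragment, and after saving checks that the file starts with the `%PDF-` header so that an HTML page saved by mistake is reported instead of silently kept.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URI;
import java.net.URLConnection;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical helper class; the names are illustrative only.
final class PdfDownloadSketch {

    // Every PDF file begins with these five ASCII bytes.
    private static final byte[] PDF_MAGIC = "%PDF-".getBytes(StandardCharsets.US_ASCII);

    private PdfDownloadSketch() {}

    // Resolves the value of <object data="..."> against the page URL.
    // The "#toolbar=..." fragment only configures the viewer, so it is dropped.
    static URI resolvePdfUri(String pageUrl, String dataAttribute) {
        int hash = dataAttribute.indexOf('#');
        String path = hash >= 0 ? dataAttribute.substring(0, hash) : dataAttribute;
        return URI.create(pageUrl).resolve(path);
    }

    // Returns true when the given leading bytes start with the PDF header.
    static boolean looksLikePdf(byte[] head) {
        if (head.length < PDF_MAGIC.length) {
            return false;
        }
        for (int i = 0; i < PDF_MAGIC.length; i++) {
            if (head[i] != PDF_MAGIC[i]) {
                return false;
            }
        }
        return true;
    }

    // Streams the response to the destination file, then checks the header.
    static void download(URI source, Path destination) throws IOException {
        URLConnection connection = source.toURL().openConnection();
        connection.setConnectTimeout(60000);
        connection.setReadTimeout(60000);
        connection.addRequestProperty("User-Agent", "Mozilla/5.0");
        try (InputStream input = connection.getInputStream()) {
            Files.copy(input, destination, StandardCopyOption.REPLACE_EXISTING);
        }
        byte[] head = new byte[PDF_MAGIC.length];
        int count;
        try (InputStream saved = Files.newInputStream(destination)) {
            count = saved.read(head);
        }
        if (count < head.length || !looksLikePdf(head)) {
            throw new IOException("Saved file does not start with %PDF-: " + destination);
        }
    }
}
```

With the page from the question, `resolvePdfUri("http://pdfobject.com/markup/examples/full-browser-window.html", "/pdf/sample.pdf#toolbar=1")` yields `http://pdfobject.com/pdf/sample.pdf`, which can then be passed to `download` together with the destination path. Extracting the `data` attribute from the HTML automatically would need an HTML parser and is left out of this sketch.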
