
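I'm trying to download the high-resolution product image from this page:

http://www.hookerfurniture.com/index.cfm/furniture/furniture-catalog.show-product/American-furniture/3005-75310/spindle-back-side-chair---ebony.cfm

Clicking "Download High-Resolution Photo" on the page works fine, but when I copy the image URL and open it in another tab, I get "3005_75310.jpg does not exist".

So I looked at the request headers sent by the first request and set them on my Java URL object, but the file that gets created is empty. Does anyone know why?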

public static void saveImage(String imageUrl, String destinationFile) {
    URL url;
    try {
        url = new URL(imageUrl);
        URLConnection uc = url.openConnection();

        uc.setRequestProperty("Accept",
                "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8");
        uc.setRequestProperty("Accept-Charset",
                "ISO-8859-1,utf-8;q=0.7,*;q=0.3");
        uc.setRequestProperty("Accept-Encoding", "gzip,deflate,sdch");
        uc.setRequestProperty("Accept-Language", "en-US,en;q=0.8");
        uc.setRequestProperty("Connection", "keep-alive");

        uc.setRequestProperty(
                "Referer",
                "http://www.hookerfurniture.com/index.cfm/furniture/furniture-catalog.show-product/American-furniture/3005-75310/spindle-back-side-chair---ebony.cfm");

        InputStream is = url.openStream();
        OutputStream os = new FileOutputStream(destinationFile);

        byte[] b = new byte[2048];
        int length;

        while ((length = is.read(b)) != -1) {
            os.write(b, 0, length);
        }

        is.close();
        os.close();
    } catch (MalformedURLException e) {
        e.printStackTrace();
    } catch (FileNotFoundException e) {
        e.printStackTrace();
    } catch (IOException e) {
        e.printStackTrace();
    }

}

2 Answers


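The Referer your request provides isn't the one the site's developers expect; they check it as a way of preventing the kind of scraping you're doing. An example of a request that works: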

$ wget \
  --referer=http://www.hookerfurniture.com/index.cfm/furniture/furniture-catalog.show-product/American-furniture/3005-75310/spindle-back-side-chair---ebony.cfm \
  http://www.hookerfurniture.com/index.cfm/furniture/furniture-catalog.photo-download/photo/3005_75310.jpg


Length: unspecified [image/jpeg]
Saving to: `3005_75310.jpg'

    [  <=>                                                                                ] 346,125      949K/s   in 0.4s

2013-01-29 13:24:02 (949 KB/s) - `3005_75310.jpg' saved [346125]
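
A rough Java equivalent of the wget request above, offered as a sketch rather than a tested fix for this particular site. One thing worth noting about the code in the question: it sets its headers on `uc`, but then reads from `url.openStream()`, which opens a separate connection that carries none of those headers. Reading from `uc.getInputStream()` instead sends the Referer along. The class and method names (`RefererDownload`, `openWithReferer`, `copy`, `saveImage`) are made up for illustration.

```java
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.URL;
import java.net.URLConnection;

class RefererDownload {

    // Prepares a connection that will send the given Referer.
    // Nothing is sent over the network until the stream is requested.
    static URLConnection openWithReferer(String imageUrl, String referer) throws IOException {
        URLConnection uc = new URL(imageUrl).openConnection();
        uc.setRequestProperty("Referer", referer);
        return uc;
    }

    // Copies every byte from in to out and returns the number of bytes copied.
    static long copy(InputStream in, OutputStream out) throws IOException {
        byte[] buffer = new byte[8192];
        long total = 0;
        int n;
        while ((n = in.read(buffer)) != -1) {
            out.write(buffer, 0, n);
            total += n;
        }
        return total;
    }

    // Downloads imageUrl to destinationFile, reading from the same
    // connection the Referer header was set on.
    static long saveImage(String imageUrl, String referer, String destinationFile) throws IOException {
        URLConnection uc = openWithReferer(imageUrl, referer);
        try (InputStream in = uc.getInputStream();
             OutputStream out = new FileOutputStream(destinationFile)) {
            return copy(in, out);
        }
    }
}
```

To mirror the wget call, `saveImage` would be given the `furniture-catalog.photo-download/photo/3005_75310.jpg` URL as `imageUrl` and the product page URL as `referer`. `Accept-Encoding` is deliberately left unset here, since advertising gzip would mean decompressing the response by hand.
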
answered 2013-01-29T18:26:52.240

For what it's worth, it looks like the only header that matters is the "Referer" header.

This fails:

curl "http://www.hookerfurniture.com/index.cfm/furniture/furniture-catalog.photo-download/photo/3005_75310.jpg" > /test/3005_75310.jpg

This works:

curl -H "Referer: http://www.hookerfurniture.com/index.cfm/furniture/furniture-catalog.show-product/American-furniture/3005-75310/spindle-back-side-chair---ebony.cfm" "http://www.hookerfurniture.com/index.cfm/furniture/furniture-catalog.photo-download/photo/3005_75310.jpg" > /test/3005_75310.jpg

For pulling the image data in Java, I've had the most success with the readFully() method of DataInputStream.
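
A minimal sketch of what readFully() does, assuming the number of bytes is known up front. It either fills the whole array or throws an EOFException if the stream ends early. For an HTTP download the length would typically come from `URLConnection.getContentLength()`, which returns -1 when the server does not report one (the wget output above shows "Length: unspecified" for this image), and in that case a plain read loop until -1 is still needed. `ReadFullyDemo` and `readExactly` are hypothetical names.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;

class ReadFullyDemo {

    // Reads exactly `length` bytes from `in`.
    // Throws java.io.EOFException if the stream ends before that many bytes arrive.
    static byte[] readExactly(InputStream in, int length) throws IOException {
        byte[] data = new byte[length];
        DataInputStream dis = new DataInputStream(in);
        dis.readFully(data);
        return data;
    }

    public static void main(String[] args) throws IOException {
        // An in-memory stream stands in for the network stream here.
        byte[] source = {1, 2, 3, 4, 5};
        byte[] copy = readExactly(new ByteArrayInputStream(source), source.length);
        System.out.println(copy.length); // prints 5
    }
}
```
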

answered 2013-01-29T18:28:00.997