
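I am using the Jsoup Java HTML parser to fetch images from a particular URL. Some of the image requests fail with HTTP status code 502, and those images are not saved to my machine. Here is a snapshot of the code I am using: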

String url = "http://www.jabong.com";
// connect(...).get() already returns a parsed Document with its base URI set
Document doc = Jsoup.connect(url).get();
Elements images = doc.select("img");

for (Element element : images) {
        String imgSrc = element.attr("abs:src");
        log.info(imgSrc);
        // Compare string content, not references: != "" is true even for an empty src
        if (!imgSrc.isEmpty()) {
            saveFromUrl(imgSrc, dirPath+"/" + nameCounter + ".jpg");
            try {
                Thread.sleep(3000);
            } catch (InterruptedException e) {
                log.error("error in sleeping");
            }
            nameCounter++;
        }
}

The saveFromUrl function looks like this:

public static void saveFromUrl(String imageUrl, String destinationFile) {
    // try-with-resources closes both streams even if the copy fails midway
    try (InputStream is = new URL(imageUrl).openStream();
         OutputStream os = new FileOutputStream(destinationFile)) {

        byte[] buffer = new byte[2048];
        int length;

        // Copy the response body to the file in 2 KB chunks
        while ((length = is.read(buffer)) != -1) {
            os.write(buffer, 0, length);
        }
    } catch (IOException e) {
        log.error("Error in saving file from url:" + imageUrl);
        //e.printStackTrace();
    }
}

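I searched the internet for status code 502, but all it says is that the error is caused by a bad gateway, and I don't understand what that means. My guess is that the error occurs because I am sending GET requests for the images in a loop. Perhaps the web server cannot handle that much load, so it rejects a request for an image while the previous one has not yet been sent. So I tried sleeping after fetching each image, but no luck :( Please offer some suggestions.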


2 Answers


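Your problem sounds like an HTTP communication issue, so you would probably be better off using a library to handle the communication side of things. Take a look at Apache Commons HttpClient.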

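Some notes on your code sample. You are not using a URLConnection object, so it is unclear how the code behaves with respect to web/proxy servers, closing resources cleanly, and so on. The HttpClient library mentioned above will help in this respect.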
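To illustrate the point about working with a connection object, here is a minimal sketch that uses the JDK's own HttpURLConnection to look at the status code before reading the body, so that a 502 shows up as a value rather than as a generic IOException. The class name CheckedDownload, the helper isSuccess, the 10-second timeouts, and the User-Agent value are all assumptions made for illustration; they do not come from the question's code, and whether a browser-like User-Agent changes the server's behaviour is untested here.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

class CheckedDownload {

    /** True for any 2xx status code. */
    static boolean isSuccess(int status) {
        return status >= 200 && status < 300;
    }

    /**
     * Requests the URL and returns the HTTP status code.
     * The body is written to target only when the status is 2xx.
     */
    static int download(String urlString, Path target) throws IOException {
        HttpURLConnection conn =
                (HttpURLConnection) new URL(urlString).openConnection();
        conn.setConnectTimeout(10_000);   // assumed value, tune as needed
        conn.setReadTimeout(10_000);      // assumed value, tune as needed
        // Assumption: some servers treat the default Java user agent differently
        conn.setRequestProperty("User-Agent", "Mozilla/5.0");
        try {
            int status = conn.getResponseCode();
            if (isSuccess(status)) {
                try (InputStream in = conn.getInputStream()) {
                    Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING);
                }
            }
            return status;
        } finally {
            conn.disconnect();
        }
    }
}
```

A caller could log the returned status next to the image URL, which makes it easier to tell whether the same images fail every time or only intermittently.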

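There also appear to be some examples of doing what you want using J2ME libraries. It is not something I have used personally, but it may help you as well.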

answered 2012-04-13T13:39:31.607

Here is a complete code sample that works for me...

import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.Authenticator;
import java.net.HttpURLConnection;
import java.net.InetSocketAddress;
import java.net.MalformedURLException;
import java.net.Proxy;
import java.net.SocketAddress;
import java.net.URL;

public class DownloadImage {

    public static void main(String[] args) {

        // URLs for Images we wish to download
        String[] urls = {
                "http://cdn.sstatic.net/stackoverflow/img/apple-touch-icon.png",
                "http://www.google.co.uk/images/srpr/logo3w.png",
                "http://i.microsoft.com/global/en-us/homepage/PublishingImages/sprites/microsoft_gray.png"
                };

        for(int i = 0; i < urls.length; i++) {
            downloadFromUrl(urls[i]);
        }

    }

    /*
    Extract the file name from the URL
    */
    private static String getOutputFileName(URL url) {

        String[] urlParts = url.getPath().split("/");

        return "c:/temp/" + urlParts[urlParts.length-1];
    }

    /*
    Assumes there is no Proxy server involved.
    */
    private static void downloadFromUrl(String urlString) {

        InputStream is = null;
        FileOutputStream fos = null; 

        try {
            URL url = new URL(urlString);

            System.out.println("Reading..." + url);

            // No proxy is defined in this example, so open a direct connection
            HttpURLConnection conn = (HttpURLConnection)url.openConnection();

            is = conn.getInputStream(); 

            String filename = getOutputFileName(url);

            fos = new FileOutputStream(filename);

            byte[] readData = new byte[1024];

            int i = is.read(readData);

            while(i != -1) {
                fos.write(readData, 0, i);
                i = is.read(readData);
            }

            System.out.println("Created file: " + filename);
        }
        catch (MalformedURLException e) {
            e.printStackTrace();
        }
        catch (IOException e) {
            e.printStackTrace();
        }
        finally {
            if(is != null) {
                try {
                    is.close();
                } catch (IOException e) {
                    System.out.println("Big problems if InputStream cannot be closed");
                }
            }           
            if(fos != null) {
                try {
                    fos.close();
                } catch (IOException e) {
                    System.out.println("Big problems if FileOutputStream cannot be closed");
                }
            }
        }

        System.out.println("Completed");
    }
}

You should see the following output on the console...

Reading...http://cdn.sstatic.net/stackoverflow/img/apple-touch-icon.png
Created file: c:/temp/apple-touch-icon.png
Completed
Reading...http://www.google.co.uk/images/srpr/logo3w.png
Created file: c:/temp/logo3w.png
Completed
Reading...http://i.microsoft.com/global/en-us/homepage/PublishingImages/sprites/microsoft_gray.png
Created file: c:/temp/microsoft_gray.png
Completed

This is a working example that does not involve a proxy server.
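Since the question reports that only some images come back with 502, one option worth trying is to repeat a failed request a few times with a growing pause, rather than sleeping a fixed 3 seconds after every image. The sketch below is only an illustration under that assumption: the class RetryingFetch, the Request callback, and the choice to treat 502, 503 and 504 as transient are all hypothetical and not part of the example above. If the server returns 502 consistently for a given image, retrying will not help.

```java
import java.io.IOException;

class RetryingFetch {

    /** Hypothetical callback: performs one request and returns its HTTP status code. */
    interface Request {
        int execute() throws IOException;
    }

    /** Assumption: gateway-type errors (502, 503, 504) may succeed on a later attempt. */
    static boolean isTransient(int status) {
        return status == 502 || status == 503 || status == 504;
    }

    /**
     * Runs the request up to maxAttempts times, doubling the pause after each
     * transient failure. Returns the last status code seen, or -1 if
     * maxAttempts is less than 1 and no request was made.
     */
    static int withRetry(Request request, int maxAttempts, long initialDelayMillis)
            throws IOException, InterruptedException {
        long delay = initialDelayMillis;
        int status = -1;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            status = request.execute();
            if (!isTransient(status)) {
                return status;            // success or a non-retryable error
            }
            if (attempt < maxAttempts) {
                Thread.sleep(delay);      // wait before the next attempt
                delay *= 2;               // simple exponential backoff
            }
        }
        return status;
    }
}
```

The callback would wrap a single download attempt that returns the status from HttpURLConnection.getResponseCode(); the number of attempts and the initial delay are values to experiment with.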

Only if you need to authenticate with a proxy server will you need an additional class, based on this Oracle technical note

import java.net.Authenticator;
import java.net.PasswordAuthentication;

public class ProxyAuthenticator extends Authenticator {

    private String userName, password;

    public ProxyAuthenticator(String userName, String password) {
        this.userName = userName;
        this.password = password;
    }

    protected PasswordAuthentication getPasswordAuthentication() {
        return new PasswordAuthentication(userName, password.toCharArray());
    }
}

To use this new class, you would use the following code in place of the call to openConnection() shown above

...
try {
    URL url = new URL(urlString);

    System.out.println("Reading..." + url);

    Authenticator.setDefault(new ProxyAuthenticator("username", "password"));

    SocketAddress addr = new InetSocketAddress("proxy.server.com", 80);
    Proxy proxy = new Proxy(Proxy.Type.HTTP, addr);

    HttpURLConnection conn = (HttpURLConnection)url.openConnection(proxy);

    ...
answered 2012-04-18T10:28:08.177