java - How to download a protected web page using Java

Question

We have an assignment to design a class that can download the source code of any web page. But when I test my code against the page http://anidb.net/perl-bin/animedb.pl?show=main, nothing works.

Standard code like this fails:

import java.net.*;
import java.io.*;

public class URLReader {
    public static void main(String[] args) throws Exception {
        URL link = new URL("http://www.anidb.net/");
        BufferedReader in = new BufferedReader(
        new InputStreamReader(link.openStream()));

        String inputLine;
        while ((inputLine = in.readLine()) != null)
            System.out.println(inputLine);
        in.close();
    }
}

This is the result I get:

&#352;wq>&#178;"¦§5&#180;_&#239;__&#199;U&#186;=&#244;&#217;&#246;?k&#352;}~“bd`?l“&amp;#207;&#231;z&#162;&#199;&#234;&#245;>_"?j&#215;‰R“y}K&#184;\&#204;c_DL&#217;&#170;&#207;_
    –&amp;#243;Mm_&#188;_0”•&amp;#246;°&#203;C_a&#237;&#189;s&#238;¤&#236;&#193;S ‚&gt;dC0&#236;s_–y&#185;&#241;±&#207;&#221;&#220;A&#248;%&#200;_&#228;&#214;&#225;__&#230;©A@,4x„&amp;#352;¶_&#235;&#201;&#402;?

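I have tried everything I could think of, including cookies and headers, but nothing seems to work. If you have any hints for me, I would appreciate it.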

score 5 · Accepted Answer

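When you write an HTTP client yourself, you have to account for gzip encoding as well as chunked transfer. It is better to use a library to download web pages.

Try something like this: http://code.google.com/p/google-http-java-client/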
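
If you stay with the standard library instead, the gzip part can be sketched roughly as follows. HttpURLConnection decodes chunked transfer on its own, but it does not decompress gzip for you, so the sketch checks the Content-Encoding response header and wraps the stream when the server declares gzip. The class name PageFetcher and its methods are hypothetical, and UTF-8 is an assumption about the page's charset.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;

// Hypothetical helper for illustration; not part of any library.
class PageFetcher {

    // Wraps the raw body in a GZIPInputStream when the server declares gzip.
    static InputStream unwrap(InputStream raw, String contentEncoding) throws IOException {
        if (contentEncoding != null && contentEncoding.trim().equalsIgnoreCase("gzip")) {
            return new GZIPInputStream(raw);
        }
        return raw;
    }

    // Reads the stream to the end and decodes it as UTF-8 (an assumption).
    static String readAll(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[4096];
        int n;
        while ((n = in.read(buf)) != -1) {
            out.write(buf, 0, n); // copy only the bytes actually read
        }
        return new String(out.toByteArray(), StandardCharsets.UTF_8);
    }

    // Requests the page, advertising gzip support, and returns the decoded body.
    static String fetch(String address) throws IOException {
        HttpURLConnection con = (HttpURLConnection) new URL(address).openConnection();
        con.setRequestProperty("Accept-Encoding", "gzip");
        try (InputStream body = unwrap(con.getInputStream(), con.getContentEncoding())) {
            return readAll(body);
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(fetch(args.length > 0 ? args[0] : "http://www.anidb.net/"));
    }
}
```

This only helps when the server labels its responses correctly; a server that sends gzip without the header would still produce unreadable output.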

score 2 · Accepted Answer

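The website you mention in your question does not seem to honor the "Accept-Encoding" request header, nor does it set the "Content-Encoding" response header properly, which I believe is incorrect behavior.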

In any case, you can use java.util.zip.GZIPInputStream to read the response as plain text:

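import java.io.*;
import java.net.*;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;

public class GzipURLReader {
    public static void main(String[] args) throws Exception {
        URL link = new URL("http://www.anidb.net/");
        HttpURLConnection con = (HttpURLConnection) link.openConnection();

        // The server sends gzip-compressed bytes, so decompress before decoding.
        // UTF-8 is an assumption; use the charset the server declares if it differs.
        try (Reader in = new InputStreamReader(
                new GZIPInputStream(con.getInputStream()), StandardCharsets.UTF_8)) {
            char[] buf = new char[1024];
            StringBuilder content = new StringBuilder();
            int n;
            // Append only the characters actually read on each pass.
            while ((n = in.read(buf)) != -1) {
                content.append(buf, 0, n);
            }
            System.out.println(content);
        }
    }
}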
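
The snippet above assumes the body is always gzip-compressed; GZIPInputStream throws an exception when it is not. Because this server reportedly does not label its responses reliably, one possible hedge is to peek at the first two bytes and look for the gzip magic number (0x1f 0x8b) before deciding. This is only a sketch: the class GzipSniffer and its methods are hypothetical names, and UTF-8 is again an assumption.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical helper for illustration; not part of any library.
class GzipSniffer {

    // Returns a decompressing stream if the data starts with the gzip magic
    // number, otherwise returns the data unchanged.
    static InputStream maybeGunzip(InputStream raw) throws IOException {
        BufferedInputStream in = new BufferedInputStream(raw);
        in.mark(2);            // remember the start so the peeked bytes can be re-read
        int b1 = in.read();
        int b2 = in.read();
        in.reset();            // rewind to the marked position
        boolean gzip = (b1 == 0x1f && b2 == 0x8b);
        return gzip ? new GZIPInputStream(in) : in;
    }

    // Reads the whole body and decodes it as UTF-8 (an assumption).
    static String decode(InputStream raw) throws IOException {
        try (InputStream in = maybeGunzip(raw)) {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[4096];
            int n;
            while ((n = in.read(buf)) != -1) {
                out.write(buf, 0, n);
            }
            return new String(out.toByteArray(), StandardCharsets.UTF_8);
        }
    }

    // In-memory demonstration: the same text, once compressed and once plain.
    public static void main(String[] args) throws IOException {
        byte[] text = "hello".getBytes(StandardCharsets.UTF_8);
        ByteArrayOutputStream compressed = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(compressed)) {
            gz.write(text);
        }
        System.out.println(decode(new ByteArrayInputStream(compressed.toByteArray())));
        System.out.println(decode(new ByteArrayInputStream(text)));
    }
}
```

With a connection like the one above, this could be called as GzipSniffer.decode(con.getInputStream()), so the same code path works whether or not the server compresses the page.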
