During testing, I found that one particular website returns an HTTP 406 error code ("Not Acceptable") when I try to retrieve it. The URL is http://thelastword.msnbc.msn.com/_news/2012/06/07/12109716-awesome-internets-thursday-edition

Here is my code (I am doing my best to make it look like a normal browser request):

    sourceURL = new URL("http://thelastword.msnbc.msn.com/_news/2012/06/07/12109716-awesome-internets-thursday-edition");

    final HttpURLConnection connection = (HttpURLConnection) sourceURL.openConnection();
    connection.setDoInput(true);
    connection.setDoOutput(true);
    connection.setRequestMethod("GET");
    connection.setRequestProperty("Accept", "text/html,application/xhtml+xml,application/rss+xml");
    connection.setRequestProperty("Accept-Charset", "ISO-8859-1,utf-8");
    connection.setRequestProperty("Accept-Language", "en-US,en");
    connection.setRequestProperty("Accept-Encoding", "gzip");
    connection
        .setRequestProperty("User-Agent",
                "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/535.21 (KHTML, like Gecko) Chrome/19.0.1042.0 Safari/535.21");
    connection.setRequestProperty("Host",
            sourceURL.getHost() + (sourceURL.getPort() != -1 ? ":" + sourceURL.getPort() : ""));

    System.out.println("Response code: "+connection.getResponseCode());

Why does this web server return this error? The web server is apparently Apache 2.2.16.

Edit: It seems to work when I comment out this line:

    connection.setRequestProperty("Accept", "text/html,application/xhtml+xml,application/rss+xml");

But why?
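One possible explanation, which I have not confirmed for this server: a 406 means the server could not produce a representation that matches the `Accept` header, and the list above has no `*/*` fallback the way a real browser's header does. The sketch below is a simplified matcher for illustration only; the class name `AcceptCheck`, the `browserLike` header value, and the guess that the server wants to send `application/xml` are all assumptions. It also ignores the rule that a more specific range overrides a wildcard.

```java
import java.util.Locale;

// Hypothetical helper, not part of any library: a simplified Accept-header matcher.
class AcceptCheck {

    /** Returns true if mediaType (e.g. "text/html") matches some range in acceptHeader. */
    static boolean accepts(String acceptHeader, String mediaType) {
        String[] wanted = mediaType.trim().toLowerCase(Locale.ROOT).split("/", 2);
        if (wanted.length != 2) {
            return false;
        }
        for (String part : acceptHeader.split(",")) {
            String[] pieces = part.trim().toLowerCase(Locale.ROOT).split(";");
            String[] range = pieces[0].trim().split("/", 2);
            if (range.length != 2) {
                continue; // skip anything that is not of the form type/subtype
            }
            // A quality value of 0 marks the range as explicitly not acceptable.
            boolean rejected = false;
            for (int i = 1; i < pieces.length; i++) {
                String param = pieces[i].trim();
                if (param.startsWith("q=")) {
                    try {
                        rejected = Double.parseDouble(param.substring(2).trim()) == 0.0;
                    } catch (NumberFormatException e) {
                        rejected = false; // malformed q value: treat as acceptable
                    }
                }
            }
            if (rejected) {
                continue;
            }
            boolean typeOk = range[0].equals("*") || range[0].equals(wanted[0]);
            boolean subtypeOk = range[1].equals("*") || range[1].equals(wanted[1]);
            if (typeOk && subtypeOk) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // The header from the question: no wildcard fallback.
        String original = "text/html,application/xhtml+xml,application/rss+xml";
        // Assumed browser-style header with a */* fallback.
        String browserLike = "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8";

        // "application/xml" is only a guess at what the server wants to send.
        System.out.println(accepts(original, "application/xml"));    // prints false
        System.out.println(accepts(browserLike, "application/xml")); // prints true
    }
}
```

If that is what is happening, appending `*/*;q=0.8` to the `Accept` value in my original code should behave the same as commenting the line out, since without an explicit `Accept` the connection falls back to its own default header.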
