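While testing, I found that one particular website returns HTTP 406 ("Not Acceptable") when I try to retrieve it. The URL is http://thelastword.msnbc.msn.com/_news/2012/06/07/12109716-awesome-internets-thursday-edition.
Here is my code (I'm doing my best to make it look like a normal browser request):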
// Needs java.net.URL and java.net.HttpURLConnection; the enclosing method must handle IOException.
final URL sourceURL = new URL("http://thelastword.msnbc.msn.com/_news/2012/06/07/12109716-awesome-internets-thursday-edition");
final HttpURLConnection connection = (HttpURLConnection) sourceURL.openConnection();
connection.setDoInput(true);
connection.setDoOutput(true); // not needed for a GET: no request body is sent
connection.setRequestMethod("GET");
connection.setRequestProperty("Accept", "text/html,application/xhtml+xml,application/rss+xml");
connection.setRequestProperty("Accept-Charset", "ISO-8859-1,utf-8");
connection.setRequestProperty("Accept-Language", "en-US,en");
connection.setRequestProperty("Accept-Encoding", "gzip"); // HttpURLConnection will not un-gzip the body for you
connection.setRequestProperty("User-Agent",
        "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/535.21 (KHTML, like Gecko) Chrome/19.0.1042.0 Safari/535.21");
// The JDK's HttpURLConnection treats "Host" as a restricted header and ignores this call
// unless -Dsun.net.http.allowRestrictedHeaders=true is set; it sends the correct Host itself.
connection.setRequestProperty("Host",
        sourceURL.getHost() + (sourceURL.getPort() != -1 ? ":" + sourceURL.getPort() : ""));
System.out.println("Response code: " + connection.getResponseCode());
Why would this web server return this error? Apparently the server is Apache 2.2.16.
Edit: It seems to work when I comment out this line:
connection.setRequestProperty("Accept", "text/html,application/xhtml+xml,application/rss+xml");
But why?
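A 406 means the server uses proactive (server-driven) content negotiation and has no representation whose media type is covered by any range in the request's `Accept` header with a non-zero quality. When the custom `Accept` line is removed, the JDK's `HttpURLConnection` sends its built-in default, `text/html, image/gif, image/jpeg, *; q=.2, */*; q=.2`, whose `*/*` wildcard matches every type; real browsers likewise end their `Accept` header with a wildcard such as `*/*;q=0.8`. The sketch below is a simplified illustration of that matching (media ranges and q-values only, ignoring charset, language and encoding). The class `AcceptMatcher` is hypothetical, and the served type `application/xml` is an assumption chosen for illustration: the question does not say which type this server actually returns.

```java
import java.util.Locale;

/**
 * Hypothetical, simplified model of Accept-header matching:
 * only media ranges and q-values are considered.
 */
public class AcceptMatcher {

    /** Returns the q-value the Accept header gives mediaType (e.g. "text/html"), or 0.0 if no range matches. */
    static double qualityFor(String acceptHeader, String mediaType) {
        String[] target = mediaType.toLowerCase(Locale.ROOT).split("/", 2);
        double best = 0.0;
        int bestSpecificity = -1;
        for (String range : acceptHeader.split(",")) {
            String[] parts = range.trim().split(";");
            String type = parts[0].trim().toLowerCase(Locale.ROOT);
            double q = 1.0; // default quality when no q parameter is given
            for (int i = 1; i < parts.length; i++) {
                String param = parts[i].trim();
                if (param.startsWith("q=")) {
                    q = Double.parseDouble(param.substring(2));
                }
            }
            // The JDK default header contains a bare "*"; treat it like "*/*".
            if (type.equals("*")) {
                type = "*/*";
            }
            String[] r = type.split("/", 2);
            if (r.length != 2) {
                continue;
            }
            int specificity;
            if (r[0].equals(target[0]) && r[1].equals(target[1])) {
                specificity = 2; // exact type/subtype
            } else if (r[0].equals(target[0]) && r[1].equals("*")) {
                specificity = 1; // type/*
            } else if (r[0].equals("*") && r[1].equals("*")) {
                specificity = 0; // */*
            } else {
                continue; // this range does not cover the media type
            }
            // The most specific matching range determines the quality.
            if (specificity > bestSpecificity) {
                bestSpecificity = specificity;
                best = q;
            }
        }
        return best;
    }

    static boolean isAcceptable(String acceptHeader, String mediaType) {
        return qualityFor(acceptHeader, mediaType) > 0.0;
    }

    public static void main(String[] args) {
        String custom = "text/html,application/xhtml+xml,application/rss+xml";
        String jdkDefault = "text/html, image/gif, image/jpeg, *; q=.2, */*; q=.2";
        String served = "application/xml"; // hypothetical type offered by the server
        System.out.println("custom header accepts " + served + ": " + isAcceptable(custom, served));
        System.out.println("JDK default accepts " + served + ": " + isAcceptable(jdkDefault, served));
    }
}
```

Running `main` prints `false` for the custom header and `true` for the JDK default: with no wildcard in the list, any type outside the three listed ones is unacceptable, which is exactly the situation in which a negotiating server answers 406. Appending a wildcard such as `*/*;q=0.8` to the custom `Accept` value would be one way to keep the browser-like header without ruling out other types.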