我正在尝试使用 Java 查找网页中所有损坏的链接。这是代码:
private static boolean isLive(String link){
HttpURLConnection urlconn = null;
int res = -1;
String msg = null;
try{
URL url = new URL(link);
urlconn = (HttpURLConnection)url.openConnection();
urlconn.setConnectTimeout(10000);
urlconn.setRequestMethod("GET");
urlconn.connect();
String redirlink = urlconn.getHeaderField("Location");
System.out.println(urlconn.getHeaderFields());
if(redirlink != null && !url.toExternalForm().equals(redirlink))
return isLive(redirlink);
else
return urlconn.getResponseCode()==HttpURLConnection.HTTP_OK;
}catch(Exception e){
System.out.println(e.getMessage());
return false;
}finally{
if(urlconn != null)
urlconn.disconnect();
}
}
public static void main(String[] s){
String link = "http://www.somefakesite.net";
System.out.println(isLive(link));
}
来自http://nscraps.com/Java/146-program-code-broken-link-checker.htm的代码。
此代码为所有网页(包括损坏的网页)提供 HTTP 200 状态。例如 http://www.somefakesite.net/给出以下标头字段:
{null=[HTTP/1.1 200 OK], Date=[Sun, 15 May 2011 18:51:29 GMT], Transfer-Encoding=[chunked], Keep-Alive=[timeout=4, max=100], Connection =[Keep-Alive], Content-Type=[text/html], Server=[Apache/2.2.15 (Win32) PHP/5.2.12], X-Powered-By=[PHP/5.2.9-1] }
即使这样的网站不存在,如何将其归类为断开的链接?