java - Unable to get response code! Any pointers?

Question

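I am trying to crawl 300,000 URLs. However, somewhere in the middle, the code hangs while trying to retrieve the response code from a URL. I am not sure what is going wrong, because the connection is being established and the problem only appears after that. I have modified the code to set the read timeout and the request property, as was suggested. Even so, the code still cannot get the response code! Any suggestions/pointers would be greatly appreciated. Also, is there a way to ping a website for a certain period of time and, if it does not respond, move on to the next one?

Here is my modified code snippet: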

URL url = null;

try
{
    Thread.sleep(8000);
}
catch (InterruptedException e1)
{
    e1.printStackTrace();
}

try
{
    // urlToBeCrawled comes from the database
    url = new URL(urlToBeCrawled);
}
catch (MalformedURLException e)
{
    e.printStackTrace();
    // The code is in a loop, so the use of continue. I apologize for putting code in the catch block.
    continue;
}

HttpURLConnection huc = null;
try
{
    huc = (HttpURLConnection) url.openConnection();
}
catch (IOException e)
{
    e.printStackTrace();
}

try
{
    // Added the request property
    huc.addRequestProperty("User-Agent", "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0)");
    huc.setRequestMethod("HEAD");
}
catch (ProtocolException e)
{
    e.printStackTrace();
}

huc.setConnectTimeout(1000);
try
{
    huc.connect();
}
catch (IOException e)
{
    e.printStackTrace();
    continue;
}

int responseCode = 0;
try
{
    // Sets the read timeout
    huc.setReadTimeout(15000);
    // Code hangs here for some URL which is random in each run
    responseCode = huc.getResponseCode();
}
catch (IOException e)
{
    huc.disconnect();
    e.printStackTrace();
    continue;
}

if (responseCode != 200)
{
    huc.disconnect();
    continue;
}

score 0 · Accepted Answer

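You are setting the read and connect timeouts on the HttpURLConnection after calling url.openConnection(), which opens the connection, so they do not take effect. For this I would probably use the Jetty HttpClient rather than the Java URL classes.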
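
If you stay with HttpURLConnection, a minimal sketch along these lines may help rule out ordering problems. According to the URLConnection documentation, the network connection is actually made by connect() (or implicitly by calls such as getResponseCode()), so the sketch sets both timeouts right after openConnection() and before anything touches the network. The class name HeadCheck, the method headStatus, and the convention of returning -1 on failure are assumptions made for illustration, not part of the question's code.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URL;
import java.net.URLConnection;

// Hypothetical helper: HEAD request with both timeouts set up front.
final class HeadCheck {

    /**
     * Returns the HTTP status code, or -1 (an assumed convention) on any
     * I/O failure, including a malformed URL or a connect/read timeout.
     */
    static int headStatus(String address, int connectTimeoutMs, int readTimeoutMs) {
        HttpURLConnection huc = null;
        try {
            URLConnection conn = new URL(address).openConnection();
            if (!(conn instanceof HttpURLConnection)) {
                return -1; // not an http/https URL
            }
            huc = (HttpURLConnection) conn;
            // Configure everything before the request is sent.
            huc.setConnectTimeout(connectTimeoutMs);
            huc.setReadTimeout(readTimeoutMs);
            huc.setRequestMethod("HEAD");
            huc.setRequestProperty("User-Agent",
                    "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0)");
            // Connects implicitly; throws SocketTimeoutException on timeout.
            return huc.getResponseCode();
        } catch (IOException e) {
            return -1;
        } finally {
            if (huc != null) {
                huc.disconnect();
            }
        }
    }
}
```

Inside the crawl loop this would be called as, for example, headStatus(urlToBeCrawled, 1000, 15000), skipping to the next URL whenever the result is not 200.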

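To answer your second point: yes. Just try to open a connection with a raw socket to port 80 (or whatever other port is specified in the URL) on the remote host, which you can extract from the URL using url.getHost(). For this I would use Netty rather than Java sockets.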
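
As a rough illustration of that check, here is a sketch that uses the plain java.net.Socket class rather than Netty, purely to keep the example dependency-free. The names PortProbe and isReachable are hypothetical. Note that the timeout only bounds the TCP connect; the DNS lookup done by InetSocketAddress is not covered by it.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.net.URL;

// Hypothetical helper: "ping" a site by attempting a TCP connect with a deadline.
final class PortProbe {

    /** Returns true if a TCP connection to the URL's host and port succeeds within timeoutMs. */
    static boolean isReachable(URL url, int timeoutMs) {
        // Use the explicit port if the URL has one, else the protocol default (80 for http).
        int port = url.getPort() != -1 ? url.getPort() : url.getDefaultPort();
        if (port == -1) {
            return false; // protocol without a known default port
        }
        try (Socket socket = new Socket()) {
            // Name resolution happens here and is NOT bounded by timeoutMs.
            InetSocketAddress target = new InetSocketAddress(url.getHost(), port);
            socket.connect(target, timeoutMs); // SocketTimeoutException if too slow
            return true;
        } catch (IOException e) {
            return false; // refused, unreachable, unknown host, or timed out
        }
    }
}
```

A successful connect only shows that something is listening on the port; the server may still be slow to answer the HTTP request itself, so a read timeout is needed as well.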

score 0 · Accepted Answer

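It hangs because the response code is never received in the byte stream. You will need to look at an HTTP debugger and see what is actually being received (if anything). It does, however, appear to open a TCP connection to the server. The server may not like your user agent (which may not be set to what you think it is) or the HEAD request method, or it may be a bandwidth-limited server. You can use the Socket class to open a connection and read the bytes manually to see what you are or are not receiving.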
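
A minimal sketch of that manual approach, assuming plain HTTP (no TLS) and a server that answers a HEAD request; RawHead and statusLine are hypothetical names. It writes the request by hand and returns the first line the server sends back, with a socket read timeout so a silent server cannot block forever.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: send a HEAD request over a raw socket and read the status line.
final class RawHead {

    /**
     * Returns the raw status line (for example "HTTP/1.1 200 OK"), or null if the
     * server closed the connection without sending anything.
     * Throws SocketTimeoutException if nothing arrives within timeoutMs.
     */
    static String statusLine(String host, int port, String path, int timeoutMs)
            throws IOException {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMs);
            socket.setSoTimeout(timeoutMs); // bounds each blocking read

            String hostHeader = (port == 80) ? host : host + ":" + port;
            String request = "HEAD " + path + " HTTP/1.1\r\n"
                    + "Host: " + hostHeader + "\r\n"
                    + "Connection: close\r\n"
                    + "\r\n";
            OutputStream out = socket.getOutputStream();
            out.write(request.getBytes(StandardCharsets.US_ASCII));
            out.flush();

            BufferedReader in = new BufferedReader(
                    new InputStreamReader(socket.getInputStream(), StandardCharsets.ISO_8859_1));
            return in.readLine(); // first line of the response, if any
        }
    }
}
```

Printing the returned line (or the exception) for a URL that hangs shows whether the server sends nothing at all or something that HttpURLConnection cannot interpret.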

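As a side note, depending on what you want to do, using just Socket is actually not a bad approach. It sounds like you are writing an HTTP server checker, in which case you will get more power by using Socket directly, since you will be able to devise better, more optimized techniques (you are doing a lot of low-level network I/O, after all).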
