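I have been working on a crawler that has to make more than 1,000 requests to one particular server. Until now it ran fine, but now the async tasks no longer complete. Here is my sample code.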


  private static CloseableHttpClient httpclient = HttpClients.createDefault();

  private String getContent(TaskUrl taskUrl, String hostname, int port,
      Map<String, String> basicHeaders) {
    String uri = taskUrl.getUrl();

    HttpHost proxyHost = new HttpHost(hostname, port);
    RequestConfig.Builder reqconfigconbuilder = RequestConfig.custom();
    // If the proxies are slow to fetch data, the timeout can be raised to 7 seconds; anything longer might have a negative impact.
    reqconfigconbuilder.setConnectionRequestTimeout(5000);
    reqconfigconbuilder.setConnectTimeout(5000);
    reqconfigconbuilder.setSocketTimeout(5000);
    reqconfigconbuilder = reqconfigconbuilder.setProxy(proxyHost);
    RequestConfig config = reqconfigconbuilder.build();
    HttpGet httpget = new HttpGet(uri);
    if (basicHeaders != null) {
      for (Map.Entry<String, String> entry : basicHeaders.entrySet()) {
        httpget.addHeader(entry.getKey(), entry.getValue());
      }
    }
    List<String> userAgentList = CommonConstant.getCustomUserAgent();
    int in = StringUtils.getRandomIntegerBetweenRange(0, userAgentList.size() - 1);
    httpget.addHeader("User-Agent", userAgentList.get(in));
    httpget.setConfig(config);
    logger.debug("Now executing ");
    try (CloseableHttpResponse response = httpclient.execute(httpget)) {
      logger.info("Status code for url : {} {} with port : {} with host : {}", uri,
          response.getStatusLine().getStatusCode(), port, hostname);
      if (response.getStatusLine().getStatusCode() == 200) {
        return EntityUtils.toString(response.getEntity());
      } else if (response.getStatusLine().getStatusCode() == 404) {
        return "404";
      }
      return null;
    } catch (IOException e) {
      logger.error("Error for url : {} {}", uri, e.getMessage());
      return null;
    }
  }

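It works fine for the first 150 to 200 URLs. After a while, though, the log stalls right after "Now executing" and nothing else happens. On one occasion the process only resumed after an hour. Can anyone help me with this? I don't understand why it behaves this way; it should not stop before the tasks are finished. Any help would be appreciated.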
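
To narrow down where the stall happens, one thing that could be tried is putting a hard wall-clock deadline around each call, so that a single stuck request cannot hold up the whole run. Below is a minimal sketch of that idea using only java.util.concurrent. The class name HardDeadline and the helper callWithDeadline are hypothetical names made up for illustration; they are not part of HttpClient or of the code above. The sketch does not explain why the request hangs; it only bounds how long the caller waits.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class HardDeadline {

  // Hypothetical helper: runs the task on the pool and stops waiting after
  // timeoutMillis, returning the fallback value instead of blocking forever.
  static <T> T callWithDeadline(ExecutorService pool, Callable<T> task,
      long timeoutMillis, T fallback) {
    Future<T> future = pool.submit(task);
    try {
      return future.get(timeoutMillis, TimeUnit.MILLISECONDS);
    } catch (TimeoutException e) {
      // Deadline passed: request cancellation, which interrupts the worker thread.
      future.cancel(true);
      return fallback;
    } catch (InterruptedException e) {
      // The calling thread was interrupted: restore the flag and give up.
      Thread.currentThread().interrupt();
      future.cancel(true);
      return fallback;
    } catch (ExecutionException e) {
      // The task itself threw an exception.
      return fallback;
    }
  }

  public static void main(String[] args) {
    ExecutorService pool = Executors.newCachedThreadPool();
    try {
      // A task that finishes well within its deadline.
      String fast = callWithDeadline(pool, () -> "ok", 1000L, "timed out");
      // A task that stands in for a stuck request.
      String slow = callWithDeadline(pool, () -> {
        Thread.sleep(5000);
        return "late";
      }, 200L, "timed out");
      System.out.println(fast);
      System.out.println(slow);
    } finally {
      pool.shutdownNow();
    }
  }
}
```

In the crawler, the Callable would wrap the call to getContent. Note that interrupting a thread generally does not unblock a blocking socket read, so the timeout branch would probably also need to call httpget.abort() on the request in flight; otherwise the worker thread and its pooled connection may stay occupied even though the caller has moved on.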
