1

我正在尝试使用 Akka HTTP 和 Akka Streams 来运行刮板。我从一堆索引页面开始,从中解析链接,然后获取每个链接并解析该页面,以返回一堆单独的链接。所以,像这样:

fetch-top-level-page -> list-of-links-to-child-pages -> fetch-child-page -> list-of-links-in-child-page

我的问题是我什至无法获取单个页面。我尝试获取的每个顶级 URL 都会导致一个死信,并且没有任何东西可以让它更进一步。

在这个示例代码中,我要做的就是发送HttpRequest到池中以转换为HttpResponse,并通过将内容打印到屏幕来证明它有效。

implicit val system = ActorSystem("scraper")
implicit val ec = system.dispatcher
implicit val settings = system.settings
implicit val materializer = ActorMaterializer()    

val requests = List(HttpRequest(...), HttpRequest(...))

val poolClientFlow = Http().superPool[Promise[HttpResponse]](settings = ConnectionPoolSettings(system).withMaxConnections(10))

  Source(requests)
    .map (req => { println("-", req); req}) // this part runs fine
    .via(poolClientFlow)
    .map(resp => {println("|", resp); resp}) // this never runs
    .toMat(Sink.foreach { p => println(p) })(Keep.both)
    .run()

这是我得到的:

(-,(HttpRequest(...),Future(<not completed>)))
(-,(HttpRequest(...),Future(<not completed>)))
[INFO] [03/07/2018 15:10:03.681] [scraper-akka.actor.default-dispatcher-5] [akka://scraper/user/pool-master] Message [akka.http.impl.engine.client.PoolMasterActor$SendRequest] without sender to Actor[akka://scraper/user/pool-master#1333123700] was not delivered. [1] dead letters encountered. This logging can be turned off or adjusted with configuration settings 'akka.log-dead-letters' and 'akka.log-dead-letters-during-shutdown'.
[INFO] [03/07/2018 15:10:03.683] [scraper-akka.actor.default-dispatcher-5] [akka://scraper/user/pool-master] Message [akka.http.impl.engine.client.PoolMasterActor$SendRequest] without sender to Actor[akka://scraper/user/pool-master#1333123700] was not delivered. [2] dead letters encountered. This logging can be turned off or adjusted with configuration settings 'akka.log-dead-letters' and 'akka.log-dead-letters-during-shutdown'.

我是 Akka 的新手,显然犯了一些基本错误,因为这似乎是 Akka、Akka Streams 和 Akka HTTP 构建的确切用例。

有任何想法吗?

4

1 回答 1

1

无法使用 Akka HTTP 10.1.0-RC2 和 Akka Streams 2.5.11 重现。以下作品:

val requests = List((HttpRequest(uri = "http://akka.io"), Promise[HttpResponse]()),
                    (HttpRequest(uri = "http://www.yahoo.com"), Promise[HttpResponse]()))

val poolClientFlow =
  Http().superPool[Promise[HttpResponse]](settings = ConnectionPoolSettings(system).withMaxConnections(10)

Source(requests)
  .map { req => println("-", req); req }
  .via(poolClientFlow)
  .map { resp => println("|", resp); resp }
  .toMat(Sink.foreach(println))(Keep.both)
  .run()

// The following is printed:

// (-,(HttpRequest(HttpMethod(GET),http://akka.io,List(),HttpEntity.Strict(none/none,ByteString()),HttpProtocol(HTTP/1.1)),Future()))
// (-,(HttpRequest(HttpMethod(GET),http://www.yahoo.com,List(),HttpEntity.Strict(none/none,ByteString()),HttpProtocol(HTTP/1.1)),Future()))
// (|,(Success(HttpResponse(301 Moved Permanently,List(Date: Thu, 08 Mar 2018 14:33:46 GMT, Connection: keep-alive, Cache-Control: max-age=3600, Expires: Thu, 08 Mar 2018 15:33:46 GMT, Location: https://akka.io/, Server: cloudflare, CF-RAY: 3f8604d386979cf6-AMS),HttpEntity.Chunked(application/octet-stream),HttpProtocol(HTTP/1.1))),Future()))
// (Success(HttpResponse(301 Moved Permanently,List(Date: Thu, 08 Mar 2018 14:33:46 GMT, Connection: keep-alive, Cache-Control: max-age=3600, Expires: Thu, 08 Mar 2018 15:33:46 GMT, Location: https://akka.io/, Server: cloudflare, CF-RAY: 3f8604d386979cf6-AMS),HttpEntity.Chunked(application/octet-stream),HttpProtocol(HTTP/1.1))),Future())
// (|,(Success(HttpResponse(301 Moved Permanently,List(Date: Thu, 08 Mar 2018 14:33:46 GMT, Connection: keep-alive, Via: http/1.1 media-router-fp6.prod.media.ir2.yahoo.com (ApacheTrafficServer [c s f ]), Server: ATS, Cache-Control: no-store, no-cache, Content-Language: en, X-Frame-Options: SAMEORIGIN, Location: https://www.yahoo.com/),HttpEntity.Strict(text/html,ByteString(114, 101, 100, 105, 114, 101, 99, 116)),HttpProtocol(HTTP/1.1))),Future()))
// (Success(HttpResponse(301 Moved Permanently,List(Date: Thu, 08 Mar 2018 14:33:46 GMT, Connection: keep-alive, Via: http/1.1 media-router-fp6.prod.media.ir2.yahoo.com (ApacheTrafficServer [c s f ]), Server: ATS, Cache-Control: no-store, no-cache, Content-Language: en, X-Frame-Options: SAMEORIGIN, Location: https://www.yahoo.com/),HttpEntity.Strict(text/html,ByteString(114, 101, 100, 105, 114, 101, 99, 116)),HttpProtocol(HTTP/1.1))),Future())
// [WARN] [03/08/2018 14:46:31.003] [scraper-akka.actor.default-dispatcher-4] [scraper/Pool(shared->http://akka.io:80)] [0 (WaitingForResponseEntitySubscription)] Response entity was not subscribed after 1 second. Make sure to read the response entity body or call `discardBytes()` on it. GET / Empty -> 301 Moved Permanently Chunked

可能更好的方法是这样的(注意对 的调用discardEntityBytes()):

Source(requests)
  .map { req => println("-", req); req }
  .via(poolClientFlow)
  .map { resp => println("|", resp); resp }
  .toMat(Sink.foreach({
    case ((util.Success(resp), p)) =>
      resp.discardEntityBytes()
      p.success(resp)
    case ((util.Failure(e), p)) => p.failure(e)
  }))(Keep.both)
  .run()

// The following is printed:

// (-,(HttpRequest(HttpMethod(GET),http://akka.io,List(),HttpEntity.Strict(none/none,ByteString()),HttpProtocol(HTTP/1.1)),Future()))
// (-,(HttpRequest(HttpMethod(GET),http://www.yahoo.com,List(),HttpEntity.Strict(none/none,ByteString()),HttpProtocol(HTTP/1.1)),Future()))
// (|,(Success(HttpResponse(301 Moved Permanently,List(Date: Thu, 08 Mar 2018 14:38:32 GMT, Connection: keep-alive, Via: http/1.1 media-router-fp21.prod.media.ir2.yahoo.com (ApacheTrafficServer [c s f ]), Server: ATS, Cache-Control: no-store, no-cache, Content-Language: en, X-Frame-Options: SAMEORIGIN, Location: https://www.yahoo.com/),HttpEntity.Strict(text/html,ByteString(114, 101, 100, 105, 114, 101, 99, 116)),HttpProtocol(HTTP/1.1))),Future()))
// (|,(Success(HttpResponse(301 Moved Permanently,List(Date: Thu, 08 Mar 2018 14:38:32 GMT, Connection: keep-alive, Cache-Control: max-age=3600, Expires: Thu, 08 Mar 2018 15:38:32 GMT, Location: https://akka.io/, Server: cloudflare, CF-RAY: 3f860bca84a32ba6-AMS),HttpEntity.Chunked(application/octet-stream),HttpProtocol(HTTP/1.1))),Future()))
于 2018-03-08T14:48:49.823 回答