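I have a gRPC service with a bidirectional streaming method.

  • Client: Python grpcio 1.41.1.
  • Server: akka-grpc 2.1.0.

The client is a slow consumer (the server may produce messages at a higher rate than the client reads them).

Sometimes (after a random delay following the method call), the client logs a message like this: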

E1122 13:42:55.763763501  108048 flow_control.cc:240]        Incoming frame of size 317205 exceeds local window size of 0.
The (un-acked, future) window size would be 1708209 which is not exceeded.
This would usually cause a disconnection, but allowing it due tobroken HTTP2 implementations in the wild.
See (for example) https://github.com/netty/netty/issues/6520.
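
One way to read this message is that the receiver tracks two windows: the one the peer has already acknowledged, and a larger "future" one that includes an increase announced in SETTINGS but not yet acked. The toy model below is only a sketch of that reading, not gRPC's actual implementation; the class and method names are hypothetical, and consuming window on receipt is left out for brevity. The numbers are taken from the log line above.

```python
from dataclasses import dataclass


@dataclass
class ReceiveWindow:
    """Toy model of a receiver's HTTP/2 flow-control window (hypothetical names)."""

    acked: int        # window the peer is known to have accepted
    pending: int = 0  # increase sent in SETTINGS but not yet acknowledged

    def announce_increase(self, delta: int) -> None:
        # The receiver sends SETTINGS with a larger initial window size.
        self.pending += delta

    def on_settings_ack(self) -> None:
        # The peer acknowledged SETTINGS, so the increase becomes enforceable.
        self.acked += self.pending
        self.pending = 0

    def classify(self, frame_size: int) -> str:
        if frame_size <= self.acked:
            return "ok"
        if frame_size <= self.acked + self.pending:
            # Exceeds the acked window but fits the un-acked, future window:
            # the case the log message says it tolerates.
            return "tolerated"
        return "violation"


window = ReceiveWindow(acked=0)
window.announce_increase(1708209)        # future window size from the log
before_ack = window.classify(317205)     # frame size from the log
window.on_settings_ack()
after_ack = window.classify(317205)
print(before_ack, after_ack)
```

In this model the frame from the log is classified as tolerated before the SETTINGS ack and as ordinary afterwards, which would match a peer that starts using the larger window before acknowledging it.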

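Sometimes the call as a whole succeeds after this log message, but sometimes the message is followed by an exception (on the client side):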

Exception in thread Thread-2:
Traceback (most recent call last):
  File "/usr/lib/python3.8/threading.py", line 932, in _bootstrap_inner
    self.run()
  File "/usr/lib/python3.8/threading.py", line 870, in run
    self._target(*self._args, **self._kwargs)
  File "[...]/client.py", line 107, in fetch
    for response in responses:
  File "[...]/venv/lib/python3.8/site-packages/grpc/_channel.py", line 426, in __next__
    return self._next()
  File "[...]/venv/lib/python3.8/site-packages/grpc/_channel.py", line 826, in _next
    raise self
grpc._channel._MultiThreadedRendezvous: <_MultiThreadedRendezvous of RPC that terminated with:
    status = StatusCode.UNKNOWN
    details = "Stream removed"
    debug_error_string = "{"created":"@1637649068.837642637","description":"Error received from peer ipv4:***.***.***.***:****","file":"src/core/lib/surface/call.cc","file_line":1069,"grpc_message":"Stream removed","grpc_status":2}"

The corresponding server-side log entry:

2021-11-23 09:58:52.426 ERROR akka.actor.ActorSystemImpl - Unhandled error: [Stream with ID [1] was closed by peer with code INTERNAL_ERROR(0x02)].
akka.http.scaladsl.model.http2.PeerClosedStreamException: Stream with ID [1] was closed by peer with code INTERNAL_ERROR(0x02)

Some research:

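  • Disabling BDP probing by setting grpc.http2.bdp_probe = 0 seems to fix the problem, but I think this is just a side effect of the lower overall throughput.
  • There is a similar issue on GitHub, but it looks like it is about unary calls. In that case, the server starts using the increased initial window size immediately after receiving the client's SETTINGS frame and before sending the SETTINGS ack (if I understand correctly).
    I see something similar in my case: sometimes the client sends new SETTINGS in the middle of the call and the server does not acknowledge them (actually, Wireshark cannot parse some of the HTTP/2 packets and shows them as TCP, so these acks may simply have been filtered out by Wireshark): captured network packets (settings). This situation does not always lead to the error; sometimes such calls succeed.
  • Exploring the client-side gRPC trace logs (GRPC_VERBOSITY=DEBUG, GRPC_TRACE=flowctl) did not give me any insight.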
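
For reference, the settings mentioned above could be applied on the Python client roughly as follows. This is a minimal sketch under assumptions: the use of an insecure channel and the `make_channel` helper are hypothetical, since the original client code is not shown. The tracing variables are set before `grpc` is imported so that the library sees them when it initializes.

```python
import os

# Tracing variables from the list above; set before grpc is imported.
os.environ.setdefault("GRPC_VERBOSITY", "DEBUG")
os.environ.setdefault("GRPC_TRACE", "flowctl")

# Channel argument that disables BDP probing.
CHANNEL_OPTIONS = [("grpc.http2.bdp_probe", 0)]


def make_channel(target):
    """Create a channel with BDP probing disabled (hypothetical helper).

    An insecure channel is an assumption; a TLS setup would use
    grpc.secure_channel with the same options.
    """
    import grpc  # imported lazily so the variables above are already set

    return grpc.insecure_channel(target, options=CHANNEL_OPTIONS)
```

The channel would then be created with something like `make_channel("host:port")` before building the stub.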

I would greatly appreciate any ideas on how to solve or diagnose the problem.
