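I have a gRPC service with a bidirectional streaming method.
- Client: python grpcio 1.41.1.
- Server: akka-grpc 2.1.0.
The client is a slow consumer (the server can produce messages at a higher rate than the client reads them).
Sometimes (after a random delay following the method call), the client logs a message like this: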
E1122 13:42:55.763763501 108048 flow_control.cc:240] Incoming frame of size 317205 exceeds local window size of 0.
The (un-acked, future) window size would be 1708209 which is not exceeded.
This would usually cause a disconnection, but allowing it due tobroken HTTP2 implementations in the wild.
See (for example) https://github.com/netty/netty/issues/6520.
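For context, my reading of that log message is that the client compares the incoming frame against two receive windows: one based on the SETTINGS the peer has already acknowledged, and one based on SETTINGS that were sent but not yet acknowledged. Below is a toy model of that check. It reflects my assumption about the logic, not gRPC's actual code, and the class and field names are made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class RecvWindow:
    """Toy model of a receiver-side HTTP/2 flow-control check (hypothetical, not gRPC code)."""

    acked_window: int  # window implied by SETTINGS the peer has acknowledged
    sent_window: int   # window implied by SETTINGS sent but not yet acknowledged

    def classify(self, frame_size: int) -> str:
        # Frame fits in the window both sides have agreed on: nothing to report.
        if frame_size <= self.acked_window:
            return "ok"
        # Frame only fits in the not-yet-acknowledged window: log and allow it.
        if frame_size <= self.sent_window:
            return "tolerated"
        # Frame exceeds even the future window: a real flow-control violation.
        return "violation"


# Numbers taken from the log line above.
window = RecvWindow(acked_window=0, sent_window=1708209)
print(window.classify(317205))  # tolerated
```

If this model is right, the message alone only says that the server used the larger window before the client considered it acknowledged; it does not by itself explain the later stream reset.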
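Sometimes the call as a whole still succeeds after this log message, but sometimes the message is followed by an exception (client side):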
Exception in thread Thread-2:
Traceback (most recent call last):
  File "/usr/lib/python3.8/threading.py", line 932, in _bootstrap_inner
    self.run()
  File "/usr/lib/python3.8/threading.py", line 870, in run
    self._target(*self._args, **self._kwargs)
  File "[...]/client.py", line 107, in fetch
    for response in responses:
  File "[...]/venv/lib/python3.8/site-packages/grpc/_channel.py", line 426, in __next__
    return self._next()
  File "[...]/venv/lib/python3.8/site-packages/grpc/_channel.py", line 826, in _next
    raise self
grpc._channel._MultiThreadedRendezvous: <_MultiThreadedRendezvous of RPC that terminated with:
status = StatusCode.UNKNOWN
details = "Stream removed"
debug_error_string = "{"created":"@1637649068.837642637","description":"Error received from peer ipv4:***.***.***.***:****","file":"src/core/lib/surface/call.cc","file_line":1069,"grpc_message":"Stream removed","grpc_status":2}"
The corresponding server-side log entry:
2021-11-23 09:58:52.426 ERROR akka.actor.ActorSystemImpl - Unhandled error: [Stream with ID [1] was closed by peer with code INTERNAL_ERROR(0x02)].
akka.http.scaladsl.model.http2.PeerClosedStreamException: Stream with ID [1] was closed by peer with code INTERNAL_ERROR(0x02)
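Some research:
- Disabling BDP probing by setting grpc.http2.bdp_probe = 0 seems to fix the problem, but I think this is only a side effect of the lower overall throughput.
- There is a similar issue on GitHub, but it appears to be about unary calls. In that case, the server starts using the increased initial window size immediately after receiving the client's SETTINGS frame and before sending the SETTINGS ack (if I understand correctly). I see something similar in my case: sometimes the client sends new SETTINGS in the middle of the call and the server does not acknowledge them. (Wireshark actually fails to parse some HTTP/2 packets and shows them as plain TCP, so these acks may simply be hidden by Wireshark.) This situation does not always lead to an error; sometimes such calls succeed.
- Exploring the client gRPC trace logs (GRPC_VERBOSITY=DEBUG, GRPC_TRACE=flowctl) did not give me any insight.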
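For reference, this is roughly how the two settings above can be applied on the Python client. It is a minimal sketch under some assumptions: the helper names and the target address are hypothetical, an insecure channel is used only for brevity, and the environment variables are set before grpc is imported on the assumption that the core library reads them at initialization.

```python
import os

# Assumption: gRPC core reads these when it initializes, so set them
# before `import grpc` runs anywhere in the process.
os.environ.setdefault("GRPC_VERBOSITY", "DEBUG")
os.environ.setdefault("GRPC_TRACE", "flowctl")


def build_channel_options(disable_bdp_probe: bool = True):
    """Return channel arguments as (name, value) tuples (hypothetical helper)."""
    options = []
    if disable_bdp_probe:
        # Turns off BDP probing, which otherwise grows the HTTP/2 window dynamically.
        options.append(("grpc.http2.bdp_probe", 0))
    return options


def open_channel(target: str):
    """Create a client channel with the options above (hypothetical helper)."""
    import grpc  # imported lazily so the tracing variables are already in place

    # Assumption: plaintext channel; use grpc.secure_channel if the server needs TLS.
    return grpc.insecure_channel(target, options=build_channel_options())
```

A call such as open_channel("localhost:8080") (the address is a placeholder) would then be used to create the stub for the streaming method.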
I would greatly appreciate any ideas on how to fix or diagnose this problem.