gpu - gRPC - Accumulate requests from Multiple clients

Question

Let's assume I have multiple clients sending requests to a server (a gRPC service). I would like my server to collect, let's say, 8 requests, process them all at once, and only then send the results back to the clients. I'm not sure how to do this with gRPC functionality, whether it's even possible, or whether I need something else.

Context: my use case is serving a neural network that runs on a GPU. In this case, it's much more efficient to batch the inputs of multiple requests, run one inference, and send the results back than to run one inference per input.

score 1 · Accepted Answer

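There are at least 3 options. Here they are in order of increasing complexity:

1. Clients call the server with their data. The server responds with a batch number. Clients then use the batch number to make a "Done yet?" RPC against the server. This is the simplest approach, but it relies on polling and is more wasteful.
2. Clients call the server with their data. The server responds with a stream of messages that update the client on the state of its batch: working, working, working, done [results]. The advantage is that the "callback" that is explicit in #3 below is implicit here. The disadvantage is the redundancy of streaming if you don't much care about intermediate state.
3. Clients call the server with their data and a callback address. The server (acting as a gRPC client) uses the callback to make an RPC on the client (running as a gRPC server). This is the most complex option and probably unnecessary given #1 and #2.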
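
As a rough illustration of option #1, here is a minimal in-process sketch in Python. It deliberately does not use the gRPC library: `BatchServer`, `submit`, and `poll` are hypothetical names standing in for the service and its two RPCs, and the `infer` function is a placeholder for the batched GPU inference. It only shows the bookkeeping (batch number, slot within the batch, "Done yet?" check), not a production server.

```python
import threading


class BatchServer:
    """Hypothetical stand-in for the gRPC service in option #1."""

    def __init__(self, batch_size, infer):
        self.batch_size = batch_size
        self.infer = infer      # placeholder: list of inputs -> list of outputs
        self.lock = threading.Lock()
        self.pending = []       # inputs of the batch that is currently filling
        self.batch_id = 0       # number of the batch that is currently filling
        self.results = {}       # batch number -> list of outputs

    def submit(self, data):
        """First RPC: queue the input and return (batch number, slot)."""
        with self.lock:
            ticket = (self.batch_id, len(self.pending))
            self.pending.append(data)
            if len(self.pending) == self.batch_size:
                # Batch is full: run a single inference over all inputs.
                # (Done under the lock for simplicity; this blocks other
                # callers while the inference runs.)
                self.results[self.batch_id] = self.infer(self.pending)
                self.pending = []
                self.batch_id += 1
            return ticket

    def poll(self, ticket):
        """Second RPC ("Done yet?"): return (done, result)."""
        batch_id, slot = ticket
        with self.lock:
            if batch_id not in self.results:
                return False, None
            return True, self.results[batch_id][slot]


# Usage with a batch size of 2 and a toy "model" that doubles its inputs.
server = BatchServer(batch_size=2, infer=lambda xs: [x * 2 for x in xs])
t1 = server.submit(10)
print(server.poll(t1))  # (False, None): the batch is not full yet
t2 = server.submit(20)
print(server.poll(t1))  # (True, 20)
print(server.poll(t2))  # (True, 40)
```

In a real service you would probably also want a timeout that flushes a partially filled batch, so a client is not left waiting when fewer than `batch_size` requests arrive, and you would likely run the inference on a worker thread rather than inside `submit`.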
