Suppose I have multiple clients sending requests to a server (a gRPC service). I would like the server to collect a fixed number of requests (say 8), process them together as one batch, and only then send each result back to the client that made the corresponding request. I'm not sure how to do this with gRPC's own functionality, whether it is possible at all, or whether I need something else on top of it.
Context: my use case is serving a neural network that runs on a GPU. There, it is much more efficient to batch the inputs of several requests, run a single inference, and send the results back than to run one inference per input.
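
To make the idea concrete, here is a minimal sketch of the batching behaviour I am after, written with plain `asyncio` and no gRPC at all. Everything in it is hypothetical and only for illustration: `Batcher`, `submit`, `run_model`, `MAX_BATCH` and `MAX_WAIT_S` are names I made up, and `run_model` is a stand-in for the real batched GPU inference. I also assumed a short timeout so that a partial batch is not held forever when fewer than 8 requests arrive.

```python
import asyncio

MAX_BATCH = 8       # hypothetical batch size (the "8 requests" above)
MAX_WAIT_S = 0.05   # hypothetical timeout so a partial batch is not stuck forever


def run_model(inputs):
    """Stand-in for one batched GPU inference call (hypothetical)."""
    return [x * 2 for x in inputs]


class Batcher:
    def __init__(self, max_batch=MAX_BATCH, max_wait_s=MAX_WAIT_S):
        self.queue = asyncio.Queue()
        self.max_batch = max_batch
        self.max_wait_s = max_wait_s
        self.batch_sizes = []  # recorded only so the batching can be inspected

    async def submit(self, x):
        """Called once per request; returns when that request's result is ready."""
        fut = asyncio.get_running_loop().create_future()
        await self.queue.put((x, fut))
        return await fut

    async def worker(self):
        loop = asyncio.get_running_loop()
        while True:
            # Block until at least one request arrives.
            items = [await self.queue.get()]
            deadline = loop.time() + self.max_wait_s
            # Keep collecting until the batch is full or the deadline passes.
            while len(items) < self.max_batch:
                timeout = deadline - loop.time()
                if timeout <= 0:
                    break
                try:
                    items.append(await asyncio.wait_for(self.queue.get(), timeout))
                except asyncio.TimeoutError:
                    break
            # One inference for the whole batch, then fan the results back out.
            outputs = run_model([x for x, _ in items])
            self.batch_sizes.append(len(items))
            for (_, fut), y in zip(items, outputs):
                if not fut.done():
                    fut.set_result(y)


async def demo(n=8):
    batcher = Batcher()
    task = asyncio.create_task(batcher.worker())
    # Each coroutine here plays the role of one concurrent client request.
    results = await asyncio.gather(*(batcher.submit(i) for i in range(n)))
    task.cancel()
    return results, batcher.batch_sizes


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

My assumption is that, with an asyncio-based gRPC server (for example `grpc.aio`), each request handler could simply `await batcher.submit(...)` and return the result, while a single background `worker` task does the batched inference. I have not verified that this is the recommended way to do it with gRPC.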