- tensorflow-gpu 1.10.0
- tensorflow-serving 1.10.0
I have deployed a TensorFlow Serving server that serves multiple models. The client code, client.py, looks like this; in it I call the Predict RPC:
# client.py
from grpc.beta import implementations
import tensorflow as tf
from tensorflow_serving.apis import predict_pb2
from tensorflow_serving.apis import prediction_service_pb2

# A single channel, stub, and request shared by every predict() call.
channel = implementations.insecure_channel(host, port)
stub = prediction_service_pb2.beta_create_PredictionService_stub(channel)
request = predict_pb2.PredictRequest()

def predict(data, shape, model_name, signature_name="predict"):
    request.model_spec.name = model_name
    request.model_spec.signature_name = signature_name
    request.inputs['image'].CopyFrom(
        tf.contrib.util.make_tensor_proto(data, shape=shape))
    result = stub.Predict(request, 10.0)  # 10 second timeout
    return result.outputs['prediction'].float_val[0]
I have about 100 clients with the same configuration. This is the sample code that calls the predict function:
from client import predict

while True:
    print(predict(data, shape, model_name))
    # time.sleep(...) for a while
At first, after starting the client code, I receive responses correctly. But after a few hours the clients crash with the error:

_Rendezvous of RPC that terminated with (StatusCode.UNAVAILABLE, Socket closed)
I tried modifying my client code to:
def predict(data, shape, model_name, signature_name="predict"):
    # Create a fresh channel, stub, and request on every call.
    channel = implementations.insecure_channel(host, port)
    stub = prediction_service_pb2.beta_create_PredictionService_stub(channel)
    request = predict_pb2.PredictRequest()
    request.model_spec.name = model_name
    request.model_spec.signature_name = signature_name
    request.inputs['image'].CopyFrom(
        tf.contrib.util.make_tensor_proto(data, shape=shape))
    result = stub.Predict(request, 10.0)
    return result.outputs['prediction'].float_val[0]
This means that on every call to predict a new connection to the TensorFlow Serving server is established. But this version fails in exactly the same way.
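One idea I have not tried yet is to keep the single shared channel, but rebuild it and retry whenever an RPC fails. A rough sketch of what I mean, placed inside client.py (untested; the predict_with_retry name, the retry count, and catching grpc.RpcError are my own assumptions):

import grpc  # the _Rendezvous error shown above is a grpc.RpcError

def predict_with_retry(data, shape, model_name,
                       signature_name="predict", retries=2):
    # Call predict(); if the RPC fails, recreate the module-level
    # channel/stub and try again, re-raising on the last attempt.
    global channel, stub
    for attempt in range(retries + 1):
        try:
            return predict(data, shape, model_name, signature_name)
        except grpc.RpcError:
            if attempt == retries:
                raise
            channel = implementations.insecure_channel(host, port)
            stub = prediction_service_pb2.beta_create_PredictionService_stub(
                channel)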
So how should I handle this situation?
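Or is the right fix to set gRPC keepalive options on the channel so an idle connection is not silently closed? Something like the sketch below, which assumes I can switch from the beta API to the plain grpc API and the generated prediction_service_pb2_grpc module (I have not verified either assumption for my versions, and the keepalive values are placeholders):

import grpc
from tensorflow_serving.apis import prediction_service_pb2_grpc

# Keepalive pings so an idle connection is not dropped while no calls are made.
options = [
    ('grpc.keepalive_time_ms', 60000),         # ping every 60 s when idle
    ('grpc.keepalive_timeout_ms', 10000),      # wait 10 s for the ping ack
    ('grpc.keepalive_permit_without_calls', 1),
]
channel = grpc.insecure_channel('%s:%d' % (host, port), options=options)
stub = prediction_service_pb2_grpc.PredictionServiceStub(channel)
# predict() would stay the same; the timeout becomes a keyword argument:
# result = stub.Predict(request, timeout=10.0)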