我正在使用带有 Google ML 引擎的 Tensorflow 进行预测。
为了创建进行预测,我们需要创建一个模型,对其进行训练并将其导出为 .pb 格式以及使用 SaveModel 的其他图元数据。我使用 sklearn (sckiit) 进行算法集成。所以最终模型是 tf 和 sklean 变量的组合
我在将 tf.placeholder / tf.varaible 值获取到 tf.Session() 中定义的变量时遇到了一个小问题。我的示例代码可以在这里找到:
#Note : run the file from root of driver / directory as when we script try to save model in add_meta_graph_and_variables , it will give "ensorflow.python.framework.errors_impl.NotFoundError: Failed to create a NewWriteableFile" error if the path to folder is greater than 255 charac. in windows
#Note : run the file from root of driver / directory as when we script try to save model in add_meta_graph_and_variables , it will give "ensorflow.python.framework.errors_impl.NotFoundError: Failed to create a NewWriteableFile" error if the path to folder is greater than 255 charac. in windows
#import data using pandas
#test with suicide random data set
#hide warnings
import os
os.environ['TF_CPP_MIN_LOG_LEVEL']='2'
import os.path
import tensorflow as tf
import time
import pandas
from sklearn.cross_validation import train_test_split
#time in Unix timestamp
ts = str(int(time.time()))
# Basic model parameters as external flags.
flags = tf.app.flags
FLAGS = flags.FLAGS
flags.DEFINE_string('input_dir', 'input', 'Input Directory.')
flags.DEFINE_string('output_dir', 'output', 'Output Directory.')
def train_and_predict():
export_dir = os.path.join(FLAGS.output_dir, 'svmdl_'+ts)
builder = tf.saved_model.builder.SavedModelBuilder(export_dir)
prediction_graph = tf.Graph()
with prediction_graph.as_default():
#input data - read from CSV
csv_file = os.path.join(FLAGS.input_dir, 'suicide_random.csv');
csv_data = pandas.read_csv(csv_file)
#split data
Y, X = csv_data['suicide_pr'], csv_data[['q1', 'q2', 'q3']].fillna(0)
#X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.5, random_state=35)
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.2)
#features columns count
features_col_count = len(X.columns)
#algorithm
from sklearn import tree
algorithm = tree.DecisionTreeClassifier()
#variables define
# input_feedback = tf.placeholder(tf.float32, shape=[None, features_col_count])
#input_feedback = tf.placeholder_with_default([0,0,0],shape=[None, features_col_count])
#input_feedback = tf.placeholder_with_default([[0,0,0]],shape=[None, features_col_count])
#input_feedback = tf.Variable([[0,0,0]])
input_feedback = tf.Variable([[0,0,0]], name="input_feedback", validate_shape=False)
with tf.Session(graph=prediction_graph) as sess:
# Add the variable initializer Op.
tf.global_variables_initializer().run()
#train model
algorithm.fit(X_train, Y_train)
#assign values for input data [convert tensors to actual data types that can be use under sklearn operations]
#_input_feedback = input_feedback # error - need to get placeholder value instead tensor object
#_input_feedback = [[0,0,0]] # error - need to get placeholder value instead tensor object
_input_feedback = input_feedback.eval() # get tensor variable assigned value
#prediction
prediction = algorithm.predict( _input_feedback)
prediction_probability = algorithm.predict_proba(_input_feedback) # give probability measure for each label ->category / class
#convert to TF variables - output variables need to be compatible with google ML engine
tf_prediction = tf.Variable(prediction , name="tf_prediction", validate_shape=False)
tf_prediction_probability = tf.Variable(prediction_probability, name="tf_prediction_probability", validate_shape=False)
tf_input_feedback = tf.Variable(_input_feedback, name="tf_input_feedback", validate_shape=False)
#initialize newly defined tensors
tf.global_variables_initializer().run()
print("Input Feedback : \n {0}".format(_input_feedback))
print("Prediction values: \n {0}".format(prediction))
print("Prediction probability: \n {0}".format(prediction_probability))
inputs_info = { 'input_feedback' : tf.saved_model.utils.build_tensor_info( input_feedback)}
output_info = {
'sucide_probability' : tf.saved_model.utils.build_tensor_info(tf_prediction),
'score' : tf.saved_model.utils.build_tensor_info(tf_prediction_probability),
'input_feedback' : tf.saved_model.utils.build_tensor_info(tf_input_feedback)
}
signature_def = tf.saved_model.signature_def_utils.build_signature_def(
inputs=inputs_info,
outputs=output_info,
method_name=tf.saved_model.signature_constants.PREDICT_METHOD_NAME
)
#save model
builder.add_meta_graph_and_variables(sess, tags=[tf.saved_model.tag_constants.SERVING],
signature_def_map= {
tf.saved_model.signature_constants.DEFAULT_SERVING_SIGNATURE_DEF_KEY : signature_def
})
builder.save()
def main(_):
train_and_predict()
if __name__ == '__main__':
tf.app.run()
对于上述保存的模型,google ml-engine 'predict' API 调用结果如下;
要求 :
Google ML Engine API - 预测 CMD 输入
响应: Google ML Engine API - 预测 CMD 输出
我的要求是我需要用“input_feedback”变量替换 Google-ML-engine -> predict API 调用输入。尽管我为 "input_feedback" 传递了不同的值,但它总是输出/获取默认值 "[[0,0,0]" 。
我尝试用 tf.placehodler 替换“input_feedback”,并尝试使用 feed_dict = {} 在会话内获取数据,但并没有走得太远。
非常感谢有关如何将 API 调用输入数据正确映射到“input_feedback”的任何反馈/帮助。