
I am using TensorFlow with Google ML Engine for prediction.

To make predictions, we need to create a model, train it, and export it in .pb format along with the other graph metadata using SavedModel. I use sklearn (scikit-learn) for the algorithm itself, so the final model is a combination of tf and sklearn variables.

I am running into a small problem getting tf.placeholder / tf.Variable values into variables defined inside a tf.Session(). My sample code is below:

# Note: run this file from the root of the drive/directory. When the script tries to save the model in add_meta_graph_and_variables, it raises "tensorflow.python.framework.errors_impl.NotFoundError: Failed to create a NewWriteableFile" if the folder path is longer than 255 characters on Windows.

#import data using pandas
#test with suicide random data set 

#hide warnings
import os
os.environ['TF_CPP_MIN_LOG_LEVEL']='2'

import os.path
import tensorflow as tf
import time

import pandas
from sklearn.model_selection import train_test_split


#time in Unix timestamp 
ts = str(int(time.time()))

# Basic model parameters as external flags.
flags = tf.app.flags
FLAGS = flags.FLAGS
flags.DEFINE_string('input_dir', 'input', 'Input Directory.')
flags.DEFINE_string('output_dir', 'output', 'Output Directory.')



def train_and_predict():

    export_dir = os.path.join(FLAGS.output_dir, 'svmdl_'+ts)
    builder = tf.saved_model.builder.SavedModelBuilder(export_dir) 

    prediction_graph = tf.Graph()
    with prediction_graph.as_default():

        #input data - read from CSV
        csv_file = os.path.join(FLAGS.input_dir, 'suicide_random.csv')
        csv_data = pandas.read_csv(csv_file)  

        #split data 
        Y, X = csv_data['suicide_pr'], csv_data[['q1', 'q2', 'q3']].fillna(0)
        #X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.5, random_state=35)
        X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.2)

        #features columns count 
        features_col_count = len(X.columns)

        #algorithm
        from sklearn import tree
        algorithm = tree.DecisionTreeClassifier()

        #variables define 
        # input_feedback = tf.placeholder(tf.float32, shape=[None, features_col_count])
        #input_feedback = tf.placeholder_with_default([0,0,0],shape=[None, features_col_count])
        #input_feedback = tf.placeholder_with_default([[0,0,0]],shape=[None, features_col_count])
        #input_feedback = tf.Variable([[0,0,0]])
        input_feedback = tf.Variable([[0,0,0]], name="input_feedback", validate_shape=False)


        with tf.Session(graph=prediction_graph) as sess:

            # Add the variable initializer Op.
            tf.global_variables_initializer().run()

            #train model 
            algorithm.fit(X_train, Y_train)

            #assign values for input data [convert tensors to actual data types that can be used in sklearn operations]
            #_input_feedback  =   input_feedback     # error - need the placeholder value instead of the tensor object
            #_input_feedback  =   [[0,0,0]]          # error - need the placeholder value instead of the tensor object
            _input_feedback  =   input_feedback.eval() # get the value assigned to the tensor variable

            #prediction
            prediction = algorithm.predict( _input_feedback)
            prediction_probability = algorithm.predict_proba(_input_feedback) # give probability measure for each label ->category / class 


            #convert to TF variables - output variables need to be compatible with google ML engine
            tf_prediction = tf.Variable(prediction , name="tf_prediction", validate_shape=False)
            tf_prediction_probability = tf.Variable(prediction_probability, name="tf_prediction_probability", validate_shape=False)
            tf_input_feedback = tf.Variable(_input_feedback, name="tf_input_feedback", validate_shape=False)

            #initialize newly defined tensors 
            tf.global_variables_initializer().run() 

            print("Input Feedback : \n {0}".format(_input_feedback)) 
            print("Prediction values: \n {0}".format(prediction))
            print("Prediction probability: \n {0}".format(prediction_probability))

            inputs_info = { 'input_feedback' : tf.saved_model.utils.build_tensor_info( input_feedback)}
            output_info = { 
                                    'sucide_probability' : tf.saved_model.utils.build_tensor_info(tf_prediction),
                                    'score' : tf.saved_model.utils.build_tensor_info(tf_prediction_probability),
                                    'input_feedback' : tf.saved_model.utils.build_tensor_info(tf_input_feedback)
                                }

            signature_def = tf.saved_model.signature_def_utils.build_signature_def(
                inputs=inputs_info,
                outputs=output_info,
                method_name=tf.saved_model.signature_constants.PREDICT_METHOD_NAME
            )


            #save model 
            builder.add_meta_graph_and_variables(sess, tags=[tf.saved_model.tag_constants.SERVING],
                                       signature_def_map= {
                                            tf.saved_model.signature_constants.DEFAULT_SERVING_SIGNATURE_DEF_KEY : signature_def
                                        })


            builder.save()  




def main(_):
    train_and_predict()

if __name__ == '__main__':
    tf.app.run()

For the model saved above, the Google ML Engine 'predict' API call results are as follows:

Request:

(screenshot: Google ML Engine API - predict CMD input)

Response:

(screenshot: Google ML Engine API - predict CMD output)
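For reference, the online predict call can also be made from Python; the sketch below only illustrates how the request instances are keyed by the input alias ("input_feedback") defined in the SavedModel signature. The project and model names are placeholders of mine, not the actual ones used above.

from googleapiclient import discovery

# hypothetical project/model names - replace with the deployed model's actual names
name = 'projects/{}/models/{}'.format('my-project', 'suicide_model')

service = discovery.build('ml', 'v1')
response = service.projects().predict(
    name=name,
    # each instance is a JSON object keyed by the input alias from the signature
    body={'instances': [{'input_feedback': [1, 1, 1]}]}
).execute()

print(response)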

My requirement is that the input of the Google ML Engine -> predict API call should replace the value of the "input_feedback" variable. However, no matter what value I pass for "input_feedback", it always outputs/uses the default value "[[0,0,0]]".

I tried replacing "input_feedback" with a tf.placeholder and fetching the data inside the session with feed_dict = {}, but did not get very far (see the sketch below).
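A minimal sketch of that attempt (my own simplified version, not the exact code):

import tensorflow as tf

# placeholder instead of tf.Variable; shape [None, 3] matches the three feature columns
input_feedback = tf.placeholder(tf.float32, shape=[None, 3], name="input_feedback")

with tf.Session() as sess:
    # feed_dict only supplies a value for this single sess.run() call;
    # the fed value is not stored in the exported SavedModel, so the sklearn
    # prediction is still computed from whatever value is available at export time
    _input_feedback = sess.run(input_feedback,
                               feed_dict={input_feedback: [[1.0, 2.0, 3.0]]})
    print(_input_feedback)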

Any feedback/help on how to correctly map the API call input data to "input_feedback" would be greatly appreciated.
