
I have successfully trained my https://github.com/GoogleCloudPlatform/cloudml-samples/tree/master/census model/experiment both locally and in the cloud. I am also able to deploy the sample in the cloud and run predictions there.

But if I want to run my predictions locally, not in the cloud, how do I do that?

I'm a newcomer, but I have tried a few naive approaches, all of which failed; see the 3 specific approaches below.

Any hints or pointers to snippets are welcome.

:-)

M.

** Update to the original post, regarding approach #1 **

If I include the single line:

c = tf.contrib.learn.DNNLinearCombinedClassifier(model_dir=job_dir)

I get an error, see error #a below.

If I naively edit the call to include the missing arguments, the constructor works, but a call to predict then fails with error #b, see below. I made wide_columns and deep_columns in model.py global and modified the line above to

c = tf.contrib.learn.DNNLinearCombinedClassifier(model_dir=job_dir, linear_feature_columns=model.wide_columns, dnn_feature_columns=model.deep_columns)

My PyCharm debugger confirms that model.wide_columns and model.deep_columns are instantiated/non-empty at the time of the call.

This now results in an "empty" classifier. I don't believe the DNNLinearCombinedClassifier picks up any model content from my job_dir. I would include screenshots of inspecting the classifier, both as instantiated in model.py build_estimator() (where I also assigned it to a variable c and set a breakpoint) and as the c above in task.py, but github won't let me because of my lack of reputation. The difference is clear, though: for example, for the restored classifier c->params->dnn_hidden_units is empty, while with the original classifier instantiation it is [100, 70, 48, 34].

I include an ls -R of the job_dir (called output), see #c below.

I do an rm -rf output for each run, so the job_dir is clean.

Obviously I am making a mistake somewhere, but due to lack of insight I can't see where. Any further suggestions are appreciated.

:-)

M.

---------------------- Console output (updated) ----------------------

a.

Starting Census: Please lauch tensorboard to see results:
tensorboard --logdir=$MODEL_DIR
2017-05-30 12:14:10.570030: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.1 instructions, but these are available on your machine and could speed up CPU computations.
2017-05-30 12:14:10.570042: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.2 instructions, but these are available on your machine and could speed up CPU computations.
2017-05-30 12:14:10.570046: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX instructions, but these are available on your machine and could speed up CPU computations.
Traceback (most recent call last):
  File "<..>/trainer/task.py", line 199, in <module>
    c = tf.contrib.learn.DNNLinearCombinedClassifier(model_dir=job_dir)
  File "<..>/.local/lib/python2.7/site-packages/tensorflow/python/util/deprecation.py", line 335, in new_func
    return func(*args, **kwargs)
  File "<..>/.local/lib/python2.7/site-packages/tensorflow/contrib/learn/python/learn/estimators/dnn_linear_combined.py", line 597, in __init__
    raise ValueError("Either linear_feature_columns or dnn_feature_columns "
ValueError: Either linear_feature_columns or dnn_feature_columns must be defined.

Process finished with exit code 1

b.

Starting Census: Please lauch tensorboard to see results:
tensorboard --logdir=$MODEL_DIR
2017-05-30 12:31:47.967638: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.1 instructions, but these are available on your machine and could speed up CPU computations.
2017-05-30 12:31:47.967650: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.2 instructions, but these are available on your machine and could speed up CPU computations.
2017-05-30 12:31:47.967653: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX instructions, but these are available on your machine and could speed up CPU computations.
Traceback (most recent call last):
  File "<..>/repository/git/13cx/subject-matter/google-cloud/1705cloudml/170530local-save/trainer/task.py", line 206, in <module>
    p = c.predict(input_fn=eval2_input_fn)
  File "<..>/.local/lib/python2.7/site-packages/tensorflow/python/util/deprecation.py", line 335, in new_func
    return func(*args, **kwargs)
  File "<..>/.local/lib/python2.7/site-packages/tensorflow/python/util/deprecation.py", line 335, in new_func
    return func(*args, **kwargs)
  File "<..>/.local/lib/python2.7/site-packages/tensorflow/contrib/learn/python/learn/estimators/dnn_linear_combined.py", line 660, in predict
    as_iterable=as_iterable)
  File "<..>/.local/lib/python2.7/site-packages/tensorflow/python/util/deprecation.py", line 335, in new_func
    return func(*args, **kwargs)
  File "<..>/.local/lib/python2.7/site-packages/tensorflow/contrib/learn/python/learn/estimators/dnn_linear_combined.py", line 695, in predict_classes
    as_iterable=as_iterable)
  File "<..>/.local/lib/python2.7/site-packages/tensorflow/python/util/deprecation.py", line 281, in new_func
    return func(*args, **kwargs)
  File "<..>/.local/lib/python2.7/site-packages/tensorflow/contrib/learn/python/learn/estimators/estimator.py", line 565, in predict
    as_iterable=as_iterable)
  File "<..>/.local/lib/python2.7/site-packages/tensorflow/contrib/learn/python/learn/estimators/estimator.py", line 857, in _infer_model
    infer_ops = self._get_predict_ops(features)
  File "<..>/.local/lib/python2.7/site-packages/tensorflow/contrib/learn/python/learn/estimators/estimator.py", line 1188, in _get_predict_ops
    return self._call_model_fn(features, labels, model_fn_lib.ModeKeys.INFER)
  File "<..>/.local/lib/python2.7/site-packages/tensorflow/contrib/learn/python/learn/estimators/estimator.py", line 1103, in _call_model_fn
    model_fn_results = self._model_fn(features, labels, **kwargs)
  File "<..>/.local/lib/python2.7/site-packages/tensorflow/contrib/learn/python/learn/estimators/dnn_linear_combined.py", line 201, in _dnn_linear_combined_model_fn
    "dnn_hidden_units must be defined when dnn_feature_columns is "
ValueError: dnn_hidden_units must be defined when dnn_feature_columns is specified.

Process finished with exit code 1

c.

$ ls -R output/
output/:
checkpoint                                     graph.pbtxt                          model.ckpt-2.data-00000-of-00001
eval                                           model.ckpt-1000.data-00000-of-00001  model.ckpt-2.index
events.out.tfevents.1496140978.yarc-mainlinux  model.ckpt-1000.index                model.ckpt-2.meta
export                                         model.ckpt-1000.meta

output/eval:
events.out.tfevents.1496140982.yarc-mainlinux  events.out.tfevents.1496140987.yarc-mainlinux

output/export:
Servo

output/export/Servo:
1496140989

output/export/Servo/1496140989:
saved_model.pb  variables

output/export/Servo/1496140989/variables:
variables.data-00000-of-00001  variables.index

----------** Original post **----------

-------- What I have tried ------------

See the code at the bottom, referenced as 1, 2, 3.

  1. Re-instantiate the DNNLinearCombinedClassifier with the model_dir argument pointing to where the model is stored. The plan was then to run the classifier's predict method. I could not get the classifier to reflect the saved model.

  2. Restore the model via saver.restore(). This works, but I don't understand how to proceed from there. Due to my lack of insight into TensorFlow, I guess.

  3. Produce some test data for use with approach 1. The evaluation of the tensors never exits. How do I evaluate an input batch so that it can be viewed as a matrix?

--------- Accompanying code -----------------

(This code is simply appended to the end of trainer/task.py)

  # last original line from task.py:
  learn_runner.run(generate_experiment_fn(**arguments), job_dir)

  # my stuff: 

  # 1. restore the classifier from model dir, fails
  # c = tf.contrib.learn.DNNLinearCombinedClassifier(model_dir=job_dir)

  # 2. restore model, works ok, but then how?
  sess = tf.Session()
  saver = tf.train.import_meta_graph('output/model.ckpt-1000.meta')
  saver.restore(sess, tf.train.latest_checkpoint('./output/'))
  sess.run(tf.global_variables_initializer())
  print("Sanity check, a variable instance {}".format(
      sess.run('dnn/input_from_feature_columns/education_embedding/weights/part_0:0')))
  sess.close()

  # 3. produce some test input (we're for simplicity reusing the eval set), apparently works, but an evaluation hangs forever
  eval2_input_fn = model.generate_input_fn(
      arguments['eval_files'],
      batch_size=arguments['eval_batch_size'],
      shuffle=False
  )

  # 3a. inspecting some input, the evaluation never ends.
  input = eval2_input_fn()
  print("input: {}".format(input))
  with tf.Session() as sess:
      evalinput = input[1].eval()
      print("evalinput: {}".format(evalinput))
  print("\nDone")

3 Answers


The easiest way is to use gcloud:

gcloud ml-engine local predict --model-dir output/export/Servo/1496140989 \
  --json-instances ../test.json
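
For reference, here is a minimal sketch (not part of the original answer) of how a test.json for --json-instances might be produced: one JSON object per line. The field names, and the leading space in the string values, are assumptions based on the example instance shown in the answers further below; the exact input signature depends on how the serving input_fn was exported.

# Sketch only: write one JSON instance per line for `gcloud ml-engine local predict`.
# Field names and the leading spaces in string values are assumed, not verified.
import json

instance = {
    'age': 42,
    'workclass': ' Private',
    'education': ' Masters',
    'education_num': 14,
    'marital_status': ' Never-married',
    'occupation': ' Adm-clerical',
    'relationship': ' Not-in-family',
    'race': ' White',
    'gender': ' Male',
    'capital_gain': 0,
    'capital_loss': 0,
    'hours_per_week': 42,
    'native_country': ' United-States',
}

with open('test.json', 'w') as f:
    f.write(json.dumps(instance) + '\n')  # repeat for additional instances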
Answered 2017-07-22T05:16:34.310

You can use the Estimator itself to do prediction (although it is not fast enough for production use).

Two things to note:

  • Make sure your model_dir contains the checkpoints saved by the training process. Predict will load the parameters from a checkpoint in order to actually predict something.

  • You need to construct the Estimator with exactly the same settings as used for training.

The easiest way to do this (given the sample provided by cloudml-samples) is to

  1. Build the Experiment with the same settings as the training process
  2. Take the Estimator out of the Experiment (this ensures the Estimator is built in the same way as for training)
  3. Prepare an input_fn for prediction and call predict

When using the Estimator itself you need to run with local Python, since it cannot take advantage of Google Cloud.

In the example below, I commented out learn_runner.run to disable training (this assumes you have already trained the model, so the checkpoints are saved in job_dir), and then used numpy_input_fn to feed a couple of examples to predict.

  ## Commented out the learn_runner run to do predict.
  ## Now the code can only work with local python.
  # learn_runner.run(generate_experiment_fn(**arguments), job_dir)

  # Change the code to construct the Estimator with exactly the same setting as
  # distributed training (with Experiment) but take the Estimator out and call
  # the predict explicitly.
  experiment_fn = generate_experiment_fn(**arguments)
  experiment = experiment_fn(job_dir)
  print("Using estimator to predict")
  estimator = experiment.estimator

  # The data contains two items.    
  data = {
      'age': [42, 47],
      'workclass': ['Private', 'Private'],
      'education': ['Masters', 'Prof-school'],
      'education_num': [14, 15],
      'marital_status': ['Never-married', 'Married-civ-spouse'],
      'occupation': ['Adm-clerical', 'Prof-specialty'],
      'relationship': ['Not-in-family', 'Wife'],
      'race': ['White', 'White'],
      'gender': ['Male', 'Female'],
      'capital_gain': [0, 0],
      'capital_loss': [0, 1902],
      'hours_per_week': [42, 60],
      'native_country': ['United-States', 'Honduras'],
  }

  import numpy as np

  for k,v in data.items():
    # Convert each column to numpy array and make sure it has rank 2, which is
    # required by the DNNCombinedLinearClassifier.
    data[k] = np.expand_dims(np.array(v), -1)

  predict_input_fn = tf.contrib.learn.io.numpy_input_fn(
      x=data, shuffle=False, num_epochs=1)

  for predicted_item in estimator.predict(input_fn=predict_input_fn):
    print('Prediction: {}'.format(predicted_item))
Answered 2017-06-01T18:24:15.733

If performance is not a concern, you can use the predict function directly (your #1 above):

c = tf.contrib.learn.DNNLinearCombinedClassifier(model_dir=job_dir)
eval2_input_fn = model.generate_input_fn(
      arguments['eval_files'],
      batch_size=arguments['eval_batch_size'],
      shuffle=False
)
c.predict(input_fn=eval2_input_fn)
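
Note, though, that a classifier constructed from model_dir alone has no feature columns, which is exactly what errors #a and #b in the question show. Here is a minimal sketch (not part of the original answer) of rebuilding it with the same configuration as training, assuming model.py exposes wide_columns and deep_columns and that training used hidden units [100, 70, 48, 34], the value reported in the question's update:

# Sketch only: rebuild the classifier with the same feature columns and hidden
# units as training, so predict() can reconstruct the graph before loading the
# weights from job_dir. wide_columns, deep_columns and the hidden-unit sizes
# are assumptions taken from the question's update, not verified against the
# census sample.
c = tf.contrib.learn.DNNLinearCombinedClassifier(
    model_dir=job_dir,
    linear_feature_columns=model.wide_columns,
    dnn_feature_columns=model.deep_columns,
    dnn_hidden_units=[100, 70, 48, 34])
predictions = list(c.predict(input_fn=eval2_input_fn))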

Or you can do some of it manually:

import collections
import os

import tensorflow as tf


class Predictor(object):

  def __init__(self, export_dir):
    self._sess = tf.Session()
    # Load the SavedModel
    meta = tf.saved_model.loader.load(self._sess, ['serve'], export_dir)
    # Map input aliases to the actual tensor names in the graph.
    inputs = meta.signature_def['serving_default'].inputs
    self._input_dict = {alias: info.name for alias, info in inputs.iteritems()}
    # Get the output aliases and tensor names
    outputs = meta.signature_def['serving_default'].outputs
    output_dict = [(alias, info.name) for alias, info in outputs.iteritems()]
    self._out_aliases, self._fetches = zip(*output_dict)

  def predict(self, examples):
    """Perform prediction on a list of examples (dicts)"""
    # Convert the list of examples to a feed dict by converting the rows to columns
    # and changing the tensor aliases to actual tensor names.
    columns = self._columnarize(examples)
    feed_dict = {self._input_dict[name]: val for name, val in columns.iteritems()}
    # Perform the actual prediction.
    fetched = self._sess.run(self._fetches, feed_dict)
    # Convert the fetched data to friendlier row-based output whose keys are
    # the output names/aliases.
    output_dict = dict(zip(self._out_aliases, fetched))
    return self._rowify(output_dict)

  def _columnarize(self, examples):
    """Convert a list of dicts to a dict of lists."""
    columns = collections.defaultdict(list)
    for example in examples:
      for name, val in example.iteritems():
        columns[name].append(val)
    return columns

  def _rowify(self, output_dict):
    """Convert a dict of lists to a list of dicts."""
    rows = []
    row_values = zip(*output_dict.values())
    for row in row_values:
      # Convert the row data to a dict
      rows.append(dict(zip(output_dict.keys(), row)))
    return rows

# Be sure to set the last path element to the correct value.
export_dir = os.path.join(job_dir, 'export', 'Servo', '1496140989')
p = Predictor(export_dir)  

# Create an example. Note the space before strings due to the way
# the CSV file is parsed during training.
example = {'age': 42,
           'workclass': ' Private',
           'education': ' Masters',
           'education_num': 14,
           'marital_status': ' Never-married',
           'occupation': ' Adm-clerical',
           'relationship': ' Not-in-family',
           'race': ' White',
           'gender': ' Male',
           'capital_gain': 0,
           'capital_loss': 0,
           'hours_per_week': 42,
           'native_country': ' United-States'}
p.predict([example])

[{u'probabilities': array([ 0.90454769,  0.09545235], dtype=float32), u'logits': array([-2.24880791], dtype=float32), u'classes': 0, u'logistic': array([ 0.09545235], dtype=float32)}]

The hang is probably because you need to start the "queue runners":

with tf.Session() as sess:
  # Start populating the filename queue.
  coord = tf.train.Coordinator()
  threads = tf.train.start_queue_runners(coord=coord)

  print(sess.run(...))

  coord.request_stop()
  coord.join(threads)

That said, printing the input is a bit tricky when using queues.
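
Applied to approach #3 from the question, a minimal sketch (assuming eval2_input_fn as defined there) might look like:

# Sketch only: evaluate one batch from the question's eval2_input_fn with the
# queue runners started, so the evaluation no longer hangs.
features, labels = eval2_input_fn()
with tf.Session() as sess:
  # Needed if the input pipeline uses epoch counters (local variables).
  sess.run(tf.local_variables_initializer())
  coord = tf.train.Coordinator()
  threads = tf.train.start_queue_runners(sess=sess, coord=coord)
  print("labels batch: {}".format(sess.run(labels)))
  coord.request_stop()
  coord.join(threads)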

Answered 2017-05-24T06:39:47.930