google-cloud-tpu - Rewriting feed_dict, tf.Session and tf.Graph code as an Estimator

Question

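I have some code written against the low-level tf.Session and tf.Graph API using feed_dict. Because I want to run it on a TPU, I am trying to rewrite it with the tf.estimator API.

Below is the current version of the code. (Some pieces were removed for brevity and are marked with ...)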

import tensorflow as tf  # TensorFlow 1.x API

class my_tpu_class(object):

    def __init__(self, ):
        # ...code to initialize class members
        self.g = tf.Graph()
        self._buildGraph()
        self.session = tf.Session(graph=self.g)

    def _buildGraph(self):
        with self.g.as_default():
            XPH = tf.placeholder(tf.float32, [None, self.inputShape[0], self.inputShape[1], self.inputShape[2]], name='XPH')
            self.XPH = XPH
            YPH = tf.placeholder(tf.float32, [None, self.outputShape1[0] + self.outputShape2[0] + self.outputShape3[0] + self.outputShape4[0]], name='YPH')
            self.YPH = YPH

            conv1 = tf.layers.conv2d(inputs=XPH,
                                     filters=self.numFeature1,
                                     activation=selu.selu,
                                     name='conv1')
            self.conv1 = conv1

            # ...rest of code to build the network and get the loss.

            loss1 = tf.reduce_sum(tf.pow(YBaseChangeSigmoid - tf.slice(YPH, [0, 0], [-1, self.outputShape1[0]], name='YBaseChangeGetTruth'), 2, name='YBaseChangeMSE'), name='YBaseChangeReduceSum')

            loss = loss1  # + other losses...
            self.loss = loss

            tf.summary.scalar("loss", loss)
            self.merged_summary_op = tf.summary.merge_all()

            self.training_op = tf.train.AdamOptimizer(learning_rate=learningRatePH).minimize(loss)
            self.init_op = tf.global_variables_initializer()

    def init(self):
        self.session.run(self.init_op)

    def close(self):
        self.session.close()

    def train(self, batchX, batchY):
        # Runs one optimization step on the fed batch and returns the loss and summary.
        loss, _, summary = self.session.run((self.loss, self.training_op, self.merged_summary_op),
                                            feed_dict={self.XPH: batchX, self.YPH: batchY, self.learningRatePH: self.learningRateVal,
                                                       self.phasePH: True, self.dropoutRatePH: self.dropoutRateVal})
        return loss, summary

I read through most of the Estimator and TensorFlow documentation and came up with the following version using the Estimator interface.

class my_tpu_class(object):

    def __init__(self, ):
        # ...code to initialize class members

    def my_model_fn(self, XPH, YPH, mode, params):
        conv1 = tf.layers.conv2d(inputs=XPH,
                                 filters=self.numFeature1,
                                 activation=selu.selu,
                                 name='conv1')
        self.conv1 = conv1

        # rest of code to build the network and get the loss....

        loss1 = tf.reduce_sum(tf.pow(YBaseChangeSigmoid - tf.slice(YPH, [0, 0], [-1, self.outputShape1[0]], name='YBaseChangeGetTruth'), 2, name='YBaseChangeMSE'), name='YBaseChangeReduceSum')

        loss = loss1  # + other losses....
        self.loss = loss

        tf.summary.scalar("loss", loss)
        self.merged_summary_op = tf.summary.merge_all()

        self.training_op = tf.train.AdamOptimizer(learning_rate=params['learningRatePH']).minimize(loss)
        return tf.estimator.EstimatorSpec(mode=mode, loss=self.loss, train_op=self.training_op, eval_metric_ops=self.merged_summary_op)

    def init(self):
        print("No op")

    def close(self):
        self.session.close()

    def train_input_fn(self, features, labels):
        dataset = tf.data.Dataset.from_tensor_slices((features, labels))
        return dataset.make_one_shot_iterator().get_next()

    def train(self, batchX, batchY):
        my_tpu_estimator = tf.estimator.Estimator(model_fn=self.my_model_fn,
                                                  params={'learningRatePH': self.learningRateVal, 'phasePH': True, 'dropoutRatePH': self.dropoutRateVal})
        my_tpu_estimator.train(input_fn=self.train_input_fn(batchX, batchY))

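Is this the right way to do it, or have I misunderstood the Estimator concept? Currently the application crashes in the train function call, so I assume I am doing something wrong.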

score 0 · Accepted Answer

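The code you wrote is not TPU code. It only uses the Estimator API, which is a high-level API. As far as I can tell, it should work on a CPU or GPU, but not on a TPU. For a TPU, you should use the TPUEstimator API.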
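
To make the TPUEstimator suggestion more concrete, here is a rough sketch of how the pieces could fit together. It is not the original network: the tiny conv + dense model, `kernel_size=3`, the helper names `make_input_fn` and `build_estimator`, and the `params` keys (`num_filters`, `num_outputs`, `learning_rate`, `use_tpu`) are all made up for illustration. It assumes a TensorFlow release that still ships `tf.estimator` (for example 1.15) and float32 NumPy inputs, and it only covers training mode. `tf.summary` calls are left out because, to my knowledge, they are not supported directly inside a TPU `model_fn`.

```python
try:
    import tensorflow.compat.v1 as tf  # assumption: a TF release with tf.estimator
except ImportError:
    tf = None  # the functions below cannot be called without TensorFlow


def model_fn(features, labels, mode, params):
    """Estimator looks these arguments up by name, so XPH/YPH become features/labels."""
    # Placeholder network standing in for the elided layers of the question.
    net = tf.layers.conv2d(features, filters=params["num_filters"],
                           kernel_size=3, activation=tf.nn.selu, name="conv1")
    logits = tf.layers.dense(tf.layers.flatten(net), params["num_outputs"])
    loss = tf.reduce_sum(tf.pow(tf.sigmoid(logits) - labels, 2))

    optimizer = tf.train.AdamOptimizer(learning_rate=params["learning_rate"])
    if params["use_tpu"]:
        # Aggregates gradients across TPU shards before applying them.
        optimizer = tf.tpu.CrossShardOptimizer(optimizer)
    # Estimators expect the train op to advance the global step.
    train_op = optimizer.minimize(loss, global_step=tf.train.get_global_step())
    return tf.estimator.tpu.TPUEstimatorSpec(mode=mode, loss=loss, train_op=train_op)


def make_input_fn(batch_x, batch_y):
    """Returns a callable; the estimator invokes it later with its own params."""
    def input_fn(params):
        dataset = tf.data.Dataset.from_tensor_slices((batch_x, batch_y))
        # TPUEstimator supplies params["batch_size"]; drop_remainder keeps the
        # batch dimension static, which TPUs require.
        return dataset.repeat().batch(params["batch_size"], drop_remainder=True)
    return input_fn


def build_estimator(model_dir, use_tpu=False, batch_size=8):
    # With use_tpu=True the RunConfig additionally needs to be pointed at the
    # TPU (cluster or master argument); that part is not shown here.
    config = tf.estimator.tpu.RunConfig(
        model_dir=model_dir,
        tpu_config=tf.estimator.tpu.TPUConfig(iterations_per_loop=100))
    return tf.estimator.tpu.TPUEstimator(
        model_fn=model_fn,
        config=config,
        use_tpu=use_tpu,
        train_batch_size=batch_size,
        params={"num_filters": 16, "num_outputs": 4,
                "learning_rate": 1e-3, "use_tpu": use_tpu})
```

Training would then look like `build_estimator("/tmp/model").train(input_fn=make_input_fn(batchX, batchY), max_steps=1000)`. With `use_tpu=False` the same code should fall back to CPU/GPU, which makes it easier to debug before moving to a TPU.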

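To pin down the exact cause of the crash, I would need to know whether you are running it on a TPU or a CPU. It would also help if you pasted the console error printed during the crash.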
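
Independent of the device, one thing in `train` stands out: `input_fn=self.train_input_fn(batchX, batchY)` calls the function right away and passes its result, whereas the estimator expects a callable that it invokes itself inside its own graph. The snippet below illustrates the difference in plain Python. `fake_estimator_train` is a hypothetical stand-in, not a TensorFlow function, and the exact error a real Estimator raises may differ.

```python
from functools import partial


def fake_estimator_train(input_fn):
    """Hypothetical stand-in for Estimator.train: it calls input_fn itself."""
    if not callable(input_fn):
        raise TypeError("input_fn must be callable, got %s" % type(input_fn).__name__)
    return input_fn()


def train_input_fn(features, labels):
    # Stand-in for the question's train_input_fn; just returns the pair.
    return features, labels


batch_x, batch_y = [[1.0, 2.0]], [[0.0]]

# As in the question: the function runs immediately and its result is passed.
try:
    fake_estimator_train(train_input_fn(batch_x, batch_y))
    eager_call_failed = False
except TypeError:
    eager_call_failed = True

# Deferred: partial() binds the arguments but leaves the call to the trainer.
result = fake_estimator_train(partial(train_input_fn, batch_x, batch_y))
```

In the question's code the equivalent change would be `input_fn=lambda: self.train_input_fn(batchX, batchY)` or a `functools.partial`.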
