tensorflow - How to get the current global_step in a data pipeline

Question

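I am trying to create a filter that depends on the current global_step of training, but I am not getting it right.

First, I cannot call tf.train.get_or_create_global_step() directly in the code below, because it throws

ValueError: Variable global_step already exists, disallowed. Did you mean to set reuse=True or reuse=tf.AUTO_REUSE in VarScope? Originally defined at:

That is why I tried fetching the scope with tf.get_default_graph().get_name_scope(). Inside that scope I was able to "get" the global step: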

def filter_examples(example):
    scope = tf.get_default_graph().get_name_scope()

    with tf.variable_scope(scope, reuse=tf.AUTO_REUSE):
        current_step = tf.train.get_or_create_global_step()

    subtokens_by_step = tf.floor(current_step / curriculum_step_update)
    max_subtokens = min_subtokens + curriculum_step_size * tf.cast(subtokens_by_step, dtype=tf.int32)

    return tf.size(example['targets']) <= max_subtokens


dataset = dataset.filter(filter_examples)
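
For reference, the threshold arithmetic inside filter_examples can be sketched in plain Python, without TensorFlow. The three constant values below are hypothetical, since the question does not state the real ones, and max_subtokens_at is an illustrative helper name rather than part of the original code.

```python
import math

# Hypothetical values; the question does not state the real ones.
min_subtokens = 10
curriculum_step_size = 5
curriculum_step_update = 1000


def max_subtokens_at(step):
    """Length threshold at a given step (same arithmetic as filter_examples)."""
    subtokens_by_step = math.floor(step / curriculum_step_update)
    return min_subtokens + curriculum_step_size * subtokens_by_step


for step in (0, 999, 1000, 2500):
    print(step, max_subtokens_at(step))
```

With these assumed constants, the threshold grows by curriculum_step_size every curriculum_step_update steps. A step that never advances past 0 keeps the threshold at min_subtokens.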

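The problem is that the filter does not seem to work as I expected. From what I observe, current_step in the code above seems to be 0 all the time (I do not know this for certain; it is an assumption based on my observations).

The only thing that seems to make a difference, odd as it sounds, is restarting the training. Again based on observation, I think that in this case current_step holds the actual training step at the moment of the restart. But as training continues, the value does not update.

Is there a way to get the actual value of the current step and use it in my filter, as above?

Environment

TensorFlow 1.12.1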

score 0 · Accepted Answer

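As we discussed in the comments, keeping and updating your own counter might be an alternative to using the global_step variable. The counter variable can be updated as follows: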

# counter is assumed to be an existing tf.Variable
op = tf.assign_add(counter, 1)
with tf.control_dependencies([op]):  # expects a list of ops or tensors
    # Some operation here before which the counter should be updated
    pass

Using tf.control_dependencies lets you "attach" the update of counter to a path in the computation graph. You can then use the counter variable wherever you need it.
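
As a rough plain-Python analogy of the counter idea (not TensorFlow code, and not a claim about how tf.data evaluates variables), the sketch below shows a filter that reads a counter each time it runs instead of capturing a value once. StepCounter, make_filter, and all parameter values are hypothetical names and numbers chosen for illustration.

```python
class StepCounter:
    """Mutable counter standing in for the TensorFlow counter variable."""

    def __init__(self):
        self.value = 0

    def increment(self):
        self.value += 1
        return self.value


def make_filter(counter, min_len=2, step_size=1, step_update=2):
    def keep(example):
        # Read the counter at call time, so each call sees the latest step.
        max_len = min_len + step_size * (counter.value // step_update)
        return len(example) <= max_len

    return keep


counter = StepCounter()
keep = make_filter(counter)
examples = [[1, 2], [1, 2, 3], [1, 2, 3, 4]]

kept_per_step = []
for _ in range(5):
    # Record the lengths of the examples that pass the filter at this step.
    kept_per_step.append([len(e) for e in examples if keep(e)])
    counter.increment()  # the "training step" is done, so update the counter

print(kept_per_step)
```

In this analogy, longer examples pass the filter only after the counter has advanced far enough, which is the behavior the curriculum filter in the question is aiming for.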

score -1 · Accepted Answer

If you use variables inside a dataset, you need to use an initializable iterator and re-run its initializer in TF 1.x:

iterator = tf.compat.v1.make_initializable_iterator(dataset)
init = iterator.initializer
tensors = iterator.get_next()

with tf.compat.v1.Session() as sess:
    for epoch in range(num_epochs):
        sess.run(init)
        for example in range(num_examples):
            tensor_vals = sess.run(tensors)
