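I want to reproduce the results of the NASNet model on CIFAR-10 for some benchmarking, using the TF-Slim implementation at https://github.com/tensorflow/models/tree/master/research/slim. To train the model from scratch, I added the following lines to the original train_image_classifier.py, following the instructions in the script comments (lines 31-37) of /nets/nasnet/models.py.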

After line 247:

elif FLAGS.learning_rate_decay_type == 'cosine':
    return tf.train.cosine_decay(FLAGS.learning_rate,
                                 global_step,
                                 decay_steps,
                                 name='cosine_decay_learning_rate')
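
As a sanity check on what this branch does to the learning rate, here is a minimal plain-Python sketch of the formula documented for tf.train.cosine_decay. It is not the TensorFlow op itself; the helper name and the sample steps are only illustrative, and decay_steps=937500 assumes 600 epochs of CIFAR-10 at batch size 32.

```python
import math


def cosine_decay(learning_rate, global_step, decay_steps, alpha=0.0):
    """Plain-Python mirror of the documented tf.train.cosine_decay formula."""
    # The schedule is clamped: past decay_steps the rate stays at its final value.
    step = min(global_step, decay_steps)
    cosine = 0.5 * (1.0 + math.cos(math.pi * step / decay_steps))
    decayed = (1.0 - alpha) * cosine + alpha
    return learning_rate * decayed


# Illustrative steps: start, halfway, end of the schedule, and one step past it.
for step in (0, 468750, 937500, 1002284):
    print(step, cosine_decay(0.025, step, 937500))
```

With the default alpha of 0.0, the rate reaches exactly 0 at decay_steps and stays there, so any steps taken afterwards leave the weights unchanged.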

After line 536:

clone_gradients = tf.clip_by_global_norm(clones_gradients, 5.0)
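
One thing worth double-checking about this line: in TF 1.x, tf.clip_by_global_norm takes a list of tensors and returns a (clipped_list, global_norm) tuple, and the line assigns to clone_gradients while reading clones_gradients. The sketch below shows the clipping arithmetic in plain Python, assuming clones_gradients is a list of (gradient, variable) pairs as optimizer.apply_gradients expects. The helper and the toy values are hypothetical stand-ins, not the script's code.

```python
import math


def clip_by_global_norm(values, clip_norm):
    """Scale a list of flat gradient lists so their joint L2 norm is at most clip_norm.

    Returns (clipped_values, global_norm), matching the tuple shape of the TF op.
    """
    global_norm = math.sqrt(sum(v * v for vec in values for v in vec))
    scale = clip_norm / max(global_norm, clip_norm)
    clipped = [[v * scale for v in vec] for vec in values]
    return clipped, global_norm


# Hypothetical (gradient, variable-name) pairs standing in for clones_gradients.
grads_and_vars = [([3.0, 4.0], "w"), ([12.0], "b")]

# Split the pairs, clip only the gradients, then zip them back with their variables.
grads = [g for g, _ in grads_and_vars]
names = [v for _, v in grads_and_vars]
clipped, norm = clip_by_global_norm(grads, 5.0)
clipped_grads_and_vars = list(zip(clipped, names))

print(norm)  # joint L2 norm before clipping
print(clipped_grads_and_vars)
```

In the real script the same split, clip, and re-zip pattern would apply before the gradients are handed to apply_gradients. Otherwise the clipped values are never used.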

After downloading the CIFAR-10 data and converting it to TFRecord format, I ran:

DATASET_DIR=/tmp/data/cifar10
TRAIN_DIR=/tmp/train_logs
python3 train_image_classifier.py \
      --train_dir=${TRAIN_DIR} \
      --dataset_name=cifar10 \
      --dataset_split_name=train \
      --dataset_dir=${DATASET_DIR} \
      --model_name=nasnet_cifar \
      --preprocessing_name=cifarnet  \
      --learning_rate=0.025 \
      --optimizer=momentum \
      --learning_rate_decay_type=cosine \
      --num_epochs_per_decay=600.0 \
      --batch_size=32
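
The decay horizon implied by these flags can be worked out by hand. A short sketch, assuming the standard 50000-image CIFAR-10 training split and that the script derives decay_steps as samples / batch size * epochs per decay:

```python
num_train_images = 50000        # assumed size of the CIFAR-10 training split
batch_size = 32                 # --batch_size
num_epochs_per_decay = 600.0    # --num_epochs_per_decay

steps_per_epoch = num_train_images / batch_size
decay_steps = int(steps_per_epoch * num_epochs_per_decay)

print(steps_per_epoch)  # 1562.5
print(decay_steps)      # 937500
```

As far as I can tell, the script's --max_number_of_steps flag is what bounds the length of a run, and the command above leaves it unset, so nothing ties the end of training to this number.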

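Training seems to continue even after 600 epochs (= 937500 steps), although the parameters are no longer updated at that point, because cosine decay brings the learning rate to 0 after 600 epochs. Running the evaluation script: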

DATASET_DIR=/tmp/data/cifar10
TRAIN_DIR=/tmp/train_logs
python3 eval_image_classifier.py \
      --alsologtostderr \
      --checkpoint_path=${TRAIN_DIR} \
      --dataset_name=cifar10 \
      --dataset_split_name=test \
      --dataset_dir=${DATASET_DIR} \
      --model_name=nasnet_cifar \
      --preprocessing_name=cifarnet

I get the following output:

/home/zelaa/anaconda3/lib/python3.6/site-packages/h5py/__init__.py:34: FutureWarning: Conversion of the second argument of issubdtype from `float` to `np.floating` is deprecated. In future, it will be treated as `np.float64 == np.dtype(float).type`.
from ._conv import register_converters as _register_converters
WARNING:tensorflow:From eval_image_classifier.py:91: get_or_create_global_step (from tensorflow.contrib.framework.python.ops.variables) is deprecated and will be removed in a future version.
Instructions for updating:
Please switch to tf.train.get_or_create_global_step
INFO:tensorflow:Scale of 0 disables regularizer.
2018-02-24 19:22:39.646499: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1105] Found device 0 with properties: 
name: GeForce GTX TITAN X major: 5 minor: 2 memoryClockRate(GHz): 1.076
pciBusID: 0000:02:00.0
totalMemory: 11.92GiB freeMemory: 11.81GiB
2018-02-24 19:22:39.646538: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1195] Creating TensorFlow device (/device:GPU:0) -> (device: 0, name: GeForce GTX TITAN X, pci bus id: 0000:02:00.0, compute capability: 5.2)
WARNING:tensorflow:From eval_image_classifier.py:155: streaming_accuracy (from tensorflow.contrib.metrics.python.ops.metric_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Please switch to tf.metrics.accuracy. Note that the order of the labels and predictions arguments has been switched.
WARNING:tensorflow:From eval_image_classifier.py:157: streaming_recall_at_k (from tensorflow.contrib.metrics.python.ops.metric_ops) is deprecated and will be removed after 2016-11-08.
Instructions for updating:
Please use `streaming_sparse_recall_at_k`, and reshape labels from [batch_size] to [batch_size, 1].
INFO:tensorflow:Evaluating train_logs/model.ckpt-1002284
INFO:tensorflow:Starting evaluation at 2018-02-24-18:22:51
2018-02-24 19:22:52.383834: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1195] Creating TensorFlow device (/device:GPU:0) -> (device: 0, name: GeForce GTX TITAN X, pci bus id: 0000:02:00.0, compute capability: 5.2)
INFO:tensorflow:Restoring parameters from train_logs/model.ckpt-1002284
INFO:tensorflow:Evaluation [20/200]
INFO:tensorflow:Evaluation [40/200]
INFO:tensorflow:Evaluation [60/200]
INFO:tensorflow:Evaluation [80/200]
INFO:tensorflow:Evaluation [100/200]
INFO:tensorflow:Evaluation [120/200]
INFO:tensorflow:Evaluation [140/200]
INFO:tensorflow:Evaluation [160/200]
INFO:tensorflow:Evaluation [180/200]
INFO:tensorflow:Evaluation [200/200]
eval/Recall_5[0.9985]
eval/Accuracy[0.9577]
INFO:tensorflow:Finished evaluation at 2018-02-24-18:23:26

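So the test error for a single run is 4.23%, which does not correspond to any of the results reported in Learning Transferable Architectures for Scalable Image Recognition. Is there something I am missing that prevents me from matching the paper's results?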
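
For reference, the numbers quoted above can be derived from the evaluation log as follows. This is only arithmetic on the logged values; the 50000-image training set size is an assumption.

```python
accuracy = 0.9577            # eval/Accuracy from the log
test_error_pct = round((1.0 - accuracy) * 100, 2)

checkpoint_step = 1002284    # from "model.ckpt-1002284"
decay_steps = 937500         # 600 epochs at batch size 32
extra_steps = checkpoint_step - decay_steps

# Steps taken after the schedule ended, expressed in epochs
# (assumes 50000 training images and batch size 32).
extra_epochs = extra_steps * 32 / 50000

print(test_error_pct)  # 4.23
print(extra_steps)     # 64784
print(extra_epochs)
```

The evaluated checkpoint therefore lies past the end of the cosine schedule, in the region where the learning rate is already 0.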
