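Let me answer your questions one by one:
- The batch size is the number of images processed at a time for training, testing, or validation. You can find the corresponding parameters and their default values defined in the script: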
parser.add_argument(
    '--train_batch_size',
    type=int,
    default=100,
    help='How many images to train on at a time.'
)
parser.add_argument(
    '--test_batch_size',
    type=int,
    default=-1,
    help="""\
    How many images to test on. This test set is only used once, to evaluate
    the final accuracy of the model after training completes.
    A value of -1 causes the entire test set to be used, which leads to more
    stable results across runs.\
    """
)
parser.add_argument(
    '--validation_batch_size',
    type=int,
    default=100,
    help="""\
    How many images to use in an evaluation batch. This validation set is
    used much more often than the test set, and is an early indicator of how
    accurate the model is during training.
    A value of -1 causes the entire validation set to be used, which leads to
    more stable results across training iterations, but may be slower on large
    training sets.\
    """
)
So, if you want to reduce the training batch size, run the script with the following argument:
python -m retrain --train_batch_size=16
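I also recommend choosing a batch size that is a power of 2 (16, 32, 64, 128, ...). The right value depends on the GPU you are using: the less memory the GPU has, the smaller the batch size should be. With 8 GB of GPU memory, you can try a batch size of 16.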
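To make the flag behavior concrete, here is a minimal, self-contained sketch. It is not the retrain script itself: it only reuses the three flag names and defaults shown above, and `effective_batch_size` and the dataset size of 500 are hypothetical, introduced purely to illustrate the "-1 means the entire set" convention described in the help text.

```python
import argparse


def build_parser():
    """Build a parser with the same three batch-size flags and defaults."""
    parser = argparse.ArgumentParser()
    parser.add_argument('--train_batch_size', type=int, default=100)
    parser.add_argument('--test_batch_size', type=int, default=-1)
    parser.add_argument('--validation_batch_size', type=int, default=100)
    return parser


def effective_batch_size(requested, dataset_size):
    """Hypothetical helper: treat -1 as 'use the whole set'."""
    if requested == -1:
        return dataset_size
    return requested


# Simulate: python -m retrain --train_batch_size=16
args = build_parser().parse_args(['--train_batch_size=16'])
print(args.train_batch_size)                             # 16
print(args.test_batch_size)                              # -1 (default)
print(effective_batch_size(args.test_batch_size, 500))   # 500
```

Flags that are not passed on the command line keep their defaults, so only the training batch size changes here.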
- To find out whether you are using the GPU, you can follow the steps in the TensorFlow documentation you mentioned. Just put
tf.debugging.set_log_device_placement(True)
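as the first statement of your script.
Enabling device placement logging causes every tensor allocation and operation to be printed, together with the device it was placed on.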
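As a complementary check, you could also ask TensorFlow directly which GPUs it can see. The sketch below assumes TensorFlow 2.1 or later, where `tf.config.list_physical_devices` is available; `gpu_report` and `visible_gpu_names` are hypothetical helper names, not part of TensorFlow or the retrain script.

```python
def gpu_report(gpu_names):
    """Turn a list of GPU device names into a one-line status message."""
    if not gpu_names:
        return "No GPU visible to TensorFlow; ops will run on the CPU."
    return "GPUs visible to TensorFlow: " + ", ".join(gpu_names)


def visible_gpu_names():
    """Ask TensorFlow which GPUs it can see (assumes TensorFlow 2.1+)."""
    # Imported lazily so that gpu_report can be used without TensorFlow.
    import tensorflow as tf
    return [device.name for device in tf.config.list_physical_devices("GPU")]
```

Calling `print(gpu_report(visible_gpu_names()))` at the top of your script then tells you whether a GPU was detected before training starts.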