I'm having trouble getting loss summaries (training or monitor) to show up in TensorBoard when using skflow.
Here is my code:
classifier = skflow.TensorFlowEstimator( model_fn=conv_model, n_classes=2, batch_size=BATCH_SIZE, steps=100000, learning_rate=0.001, config=RunConfig(gpu_memory_fraction=0.9))
val_monitor = monitors.ValidationMonitor(X_val, y_val, n_classes=2, print_steps=100)
classifier.fit(X_train, y_train, val_monitor, logdir='my_model_1/')
classifier.save('my_model_1/')
Everything runs fine:
`I tensorflow/stream_executor/dso_loader.cc:105] successfully opened CUDA library libcublas.so locally
I tensorflow/stream_executor/dso_loader.cc:105] successfully opened CUDA library libcudnn.so locally
I tensorflow/stream_executor/dso_loader.cc:105] successfully opened CUDA library libcufft.so locally
I tensorflow/stream_executor/dso_loader.cc:105] successfully opened CUDA library libcuda.so.1 locally
I tensorflow/stream_executor/dso_loader.cc:105] successfully opened CUDA library libcurand.so locally
/usr/local/lib/python2.7/dist-packages/tensorflow/contrib/learn/python/learn/io/data_feeder.py:281: VisibleDeprecationWarning: converting an array with ndim > 0 to an index will result in an error in the future
out.itemset((i, self.y[sample]), 1.0)
I tensorflow/core/common_runtime/gpu/gpu_init.cc:102] Found device 0 with properties:
name: GeForce GTX 980
major: 5 minor: 2 memoryClockRate (GHz) 1.253
pciBusID 0000:03:00.0
Total memory: 4.00GiB
Free memory: 3.91GiB
I tensorflow/core/common_runtime/gpu/gpu_init.cc:126] DMA: 0
I tensorflow/core/common_runtime/gpu/gpu_init.cc:136] 0: Y
I tensorflow/core/common_runtime/gpu/gpu_device.cc:755] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GTX 980, pci bus id: 0000:03:00.0)
/usr/local/lib/python2.7/dist-packages/tensorflow/contrib/learn/python/learn/io/data_feeder.py:370: VisibleDeprecationWarning: using a non-integer number instead of an integer will result in an error in the future
out.itemset((i, y), 1.0)
Step #99, avg. train loss: 2.22587, avg. val loss: 2.14521
Step #199, avg. train loss: 0.82641, avg. val loss: 0.89103
Step #299, avg. train loss: 0.78344, avg. val loss: 0.85636
Step #399, avg. train loss: 0.76420, avg. val loss: 0.85675
Step #499, avg. train loss: 0.75868, avg. val loss: 0.84104
Step #599, avg. train loss: 0.75467, avg. val loss: 0.84945
Step #699, avg. train loss: 0.73990, avg. val loss: 0.91238
Step #799, avg. train loss: 0.73400, avg. val loss: 0.92720
Step #899, avg. train loss: 0.72879, avg. val loss: 0.91054
Step #999, avg. train loss: 0.73448, avg. val loss: 0.89823
Step #1099, avg. train loss: 0.70125, avg. val loss: 0.91640
Step #1199, avg. train loss: 0.71879, avg. val loss: 0.90597
Step #1299, avg. train loss: 0.70713, avg. val loss: 0.90736
Step #1399, avg. train loss: 0.70023, avg. val loss: 0.91414
Step #1499, avg. train loss: 0.69566, avg. val loss: 0.91007
Step #1599, avg. train loss: 0.68030, avg. val loss: 0.92729
Step #1699, avg. train loss: 0.68919, avg. val loss: 0.91168
Step #1799, avg. train loss: 0.67088, avg. val loss: 0.91744
Step #1899, avg. train loss: 0.68732, avg. val loss: 0.88844
Step #1999, avg. train loss: 0.67585, avg. val loss: 0.88854`
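The run produces a 4.8 MB .tfevents file (attached).
When I connect to the machine with Chrome as the browser, I see data under Graph and Histograms, but nothing under Events ("No scalar data was found").
Am I missing something that is needed to log the loss?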
Note: if I add
logging_ops.scalar_summary("model_loss", self._model_loss)
in learn/python/learn/estimators/base.py, the model loss does appear in TensorBoard.
PS: I'm running on a GPU machine, using the latest build of TensorFlow. Attached tfevents: my_model_1.zip
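
As a stopgap while the scalar summaries are missing, the losses can at least be recovered from the console output shown above. This is only a minimal sketch: it assumes the output was captured as text, and the names `STEP_LINE`, `parse_losses`, and `sample` are hypothetical helpers, not part of skflow or TensorFlow.

```python
import re

# Matches console lines such as:
#   Step #99, avg. train loss: 2.22587, avg. val loss: 2.14521
STEP_LINE = re.compile(
    r"Step #(\d+), avg\. train loss: ([0-9.]+), avg\. val loss: ([0-9.]+)"
)


def parse_losses(log_text):
    """Return a list of (step, train_loss, val_loss) tuples found in log_text."""
    rows = []
    for match in STEP_LINE.finditer(log_text):
        step, train, val = match.groups()
        rows.append((int(step), float(train), float(val)))
    return rows


# Two lines copied from the log in this post, used as sample input.
sample = """Step #99, avg. train loss: 2.22587, avg. val loss: 2.14521
Step #199, avg. train loss: 0.82641, avg. val loss: 0.89103"""

for step, train, val in parse_losses(sample):
    print(step, train, val)
```

The resulting tuples can be fed to any plotting tool, but this does not replace proper scalar summaries in TensorBoard.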