2017-07-07 14:21:28.793025: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.1 instructions, but these are available on your machine and could speed up CPU computations.
2017-07-07 14:21:28.793037: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.2 instructions, but these are available on your machine and could speed up CPU computations.
2017-07-07 14:21:28.793040: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX instructions, but these are available on your machine and could speed up CPU computations.
2017-07-07 14:21:28.793042: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX2 instructions, but these are available on your machine and could speed up CPU computations.
2017-07-07 14:21:28.793044: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use FMA instructions, but these are available on your machine and could speed up CPU computations.
2017-07-07 14:21:28.953864: I tensorflow/core/common_runtime/gpu/gpu_device.cc:940] Found device 0 with properties:
name: Quadro M2000
major: 5 minor: 2 memoryClockRate (GHz) 1.1625
pciBusID 0000:01:00.0
Total memory: 3.93GiB
Free memory: 30.00MiB
2017-07-07 14:21:28.953885: I tensorflow/core/common_runtime/gpu/gpu_device.cc:961] DMA: 0
2017-07-07 14:21:28.953890: I tensorflow/core/common_runtime/gpu/gpu_device.cc:971] 0: Y
2017-07-07 14:21:28.953896: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1030] Creating TensorFlow device (/gpu:0) -> (device: 0, name: Quadro M2000, pci bus id: 0000:01:00.0)
2017-07-07 14:21:28.957332: E tensorflow/stream_executor/cuda/cuda_driver.cc:924] failed to allocate 30.00M (31457280 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY
2017-07-07 14:21:39.936797: W tensorflow/core/common_runtime/bfc_allocator.cc:273] Allocator (GPU_0_bfc) ran out of memory trying to allocate 60.00MiB. Current allocation summary follows.
2017-07-07 14:21:39.936839: I tensorflow/core/common_runtime/bfc_allocator.cc:643] Bin (256): Total Chunks: 0, Chunks in use: 0 0B allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin.
2017-07-07 14:21:39.936851: I tensorflow/core/common_runtime/bfc_allocator.cc:643] Bin (512): Total Chunks: 0, Chunks in use: 0 0B allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin.
2017-07-07 14:21:39.936860: I tensorflow/core/common_runtime/bfc_allocator.cc:643] Bin (1024): Total Chunks: 0, Chunks in use: 0 0B allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin.
2017-07-07 14:21:39.936869: I tensorflow/core/common_runtime/bfc_allocator.cc:643] Bin (2048): Total Chunks: 0, Chunks in use: 0 0B allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin.
2017-07-07 14:21:39.936878: I tensorflow/core/common_runtime/bfc_allocator.cc:643] Bin (4096): Total Chunks: 0, Chunks in use: 0 0B allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin.
2017-07-07 14:21:39.936887: I tensorflow/core/common_runtime/bfc_allocator.cc:643] Bin (8192): Total Chunks: 0, Chunks in use: 0 0B allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin.
2017-07-07 14:21:39.936895: I tensorflow/core/common_runtime/bfc_allocator.cc:643] Bin (16384): Total Chunks: 0, Chunks in use: 0 0B allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin.
2017-07-07 14:21:39.936904: I tensorflow/core/common_runtime/bfc_allocator.cc:643] Bin (32768): Total Chunks: 0, Chunks in use: 0 0B allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin.
2017-07-07 14:21:39.936912: I tensorflow/core/common_runtime/bfc_allocator.cc:643] Bin (65536): Total Chunks: 0, Chunks in use: 0 0B allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin.
2017-07-07 14:21:39.936922: I tensorflow/core/common_runtime/bfc_allocator.cc:643] Bin (131072): Total Chunks: 0, Chunks in use: 0 0B allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin.
2017-07-07 14:21:39.936930: I tensorflow/core/common_runtime/bfc_allocator.cc:643] Bin (262144): Total Chunks: 0, Chunks in use: 0 0B allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin.
2017-07-07 14:21:39.936939: I tensorflow/core/common_runtime/bfc_allocator.cc:643] Bin (524288): Total Chunks: 0, Chunks in use: 0 0B allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin.
2017-07-07 14:21:39.936947: I tensorflow/core/common_runtime/bfc_allocator.cc:643] Bin (1048576): Total Chunks: 0, Chunks in use: 0 0B allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin.
2017-07-07 14:21:39.936956: I tensorflow/core/common_runtime/bfc_allocator.cc:643] Bin (2097152): Total Chunks: 0, Chunks in use: 0 0B allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin.
2017-07-07 14:21:39.936965: I tensorflow/core/common_runtime/bfc_allocator.cc:643] Bin (4194304): Total Chunks: 0, Chunks in use: 0 0B allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin.
2017-07-07 14:21:39.936976: I tensorflow/core/common_runtime/bfc_allocator.cc:643] Bin (8388608): Total Chunks: 1, Chunks in use: 0 9.91MiB allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin.
2017-07-07 14:21:39.936985: I tensorflow/core/common_runtime/bfc_allocator.cc:643] Bin (16777216): Total Chunks: 0, Chunks in use: 0 0B allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin.
2017-07-07 14:21:39.936996: I tensorflow/core/common_runtime/bfc_allocator.cc:643] Bin (33554432): Total Chunks: 0, Chunks in use: 0 0B allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin.
2017-07-07 14:21:39.937004: I tensorflow/core/common_runtime/bfc_allocator.cc:643] Bin (67108864): Total Chunks: 0, Chunks in use: 0 0B allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin.
2017-07-07 14:21:39.937013: I tensorflow/core/common_runtime/bfc_allocator.cc:643] Bin (134217728): Total Chunks: 0, Chunks in use: 0 0B allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin.
2017-07-07 14:21:39.937022: I tensorflow/core/common_runtime/bfc_allocator.cc:643] Bin (268435456): Total Chunks: 0, Chunks in use: 0 0B allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin.
2017-07-07 14:21:39.937031: I tensorflow/core/common_runtime/bfc_allocator.cc:660] Bin for 60.00MiB was 32.00MiB, Chunk State:
2017-07-07 14:21:39.937040: I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x13031c0000 of size 1280
2017-07-07 14:21:39.937047: I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x13031c0500 of size 256
2017-07-07 14:21:39.937053: I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x13031c0600 of size 256
2017-07-07 14:21:39.937059: I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x13031c0700 of size 512
2017-07-07 14:21:39.937065: I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x13031c0900 of size 256
2017-07-07 14:21:39.937071: I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x13031c0a00 of size 256
2017-07-07 14:21:39.937076: I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x13031c0b00 of size 1024
2017-07-07 14:21:39.937082: I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x13031c0f00 of size 256
2017-07-07 14:21:39.937088: I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x13031c1000 of size 256
2017-07-07 14:21:39.937094: I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x13031c1100 of size 1536
2017-07-07 14:21:39.937099: I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x13031c1700 of size 256
2017-07-07 14:21:39.937105: I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x13031c1800 of size 256
2017-07-07 14:21:39.937111: I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x13031c1900 of size 1536
2017-07-07 14:21:39.937116: I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x13031c1f00 of size 256
2017-07-07 14:21:39.937122: I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x13031c2000 of size 256
2017-07-07 14:21:39.937127: I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x13031c2100 of size 1024
2017-07-07 14:21:39.937133: I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x13031c2500 of size 256
2017-07-07 14:21:39.937138: I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x13031c2600 of size 256
2017-07-07 14:21:39.937144: I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x13031c2700 of size 16384
2017-07-07 14:21:39.937150: I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x13031c6700 of size 256
2017-07-07 14:21:39.937155: I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x13031c6800 of size 256
2017-07-07 14:21:39.937161: I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x13031c6900 of size 68096
2017-07-07 14:21:39.937167: I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x13031d7800 of size 256
2017-07-07 14:21:39.937195: I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x13031d7900 of size 256
2017-07-07 14:21:39.937201: I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x13031d7a00 of size 256
2017-07-07 14:21:39.937206: I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x13031d7b00 of size 256
2017-07-07 14:21:39.937212: I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x13031d7c00 of size 256
2017-07-07 14:21:39.937217: I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x13031d7d00 of size 256
2017-07-07 14:21:39.937223: I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x13031d7e00 of size 256
2017-07-07 14:21:39.937228: I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x13031d7f00 of size 256
2017-07-07 14:21:39.937249: I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x13031d8000 of size 256
2017-07-07 14:21:39.937253: I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x13031d8100 of size 256
2017-07-07 14:21:39.937257: I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x13031d8200 of size 256
2017-07-07 14:21:39.937261: I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x13031d8300 of size 256
2017-07-07 14:21:39.937265: I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x13031d8400 of size 256
2017-07-07 14:21:39.937268: I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x13031d8500 of size 256
2017-07-07 14:21:39.937272: I tensorflow/core/common_runtime
2017-07-07 14:21:39.937301: I tensorflow/core/common_runtime
2017-07-07 14:21:39.937310: I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x13037b3600 of size 5308416
2017-07-07 14:21:39.937314: I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x1303cc3600 of size 1536
2017-07-07 14:21:39.937318: I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x1303cc3c00 of size 3538944
2017-07-07 14:21:39.937322: I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x1304023c00 of size 1024
2017-07-07 14:21:39.937327: I tensorflow/core/common_runtime/bfc_allocator.cc:687] Free at 0x1304024000 of size 10390784
2017-07-07 14:21:39.937331: I tensorflow/core/common_runtime/bfc_allocator.cc:693] Summary of in-use Chunks by size:
2017-07-07 14:21:39.937337: I tensorflow/core/common_runtime/bfc_allocator.cc:696] 34 Chunks of size 256 totalling 8.5KiB
2017-07-07 14:21:39.937342: I tensorflow/core/common_runtime/bfc_allocator.cc:696] 3 Chunks of size 512 totalling 1.5KiB
2017-07-07 14:21:39.937347: I tensorflow/core/common_runtime/bfc_allocator.cc:696] 4 Chunks of size 1024 totalling 4.0KiB
2017-07-07 14:21:39.937353: I tensorflow/core/common_runtime/bfc_allocator.cc:696] 1 Chunks of size 1280 totalling 1.2KiB
2017-07-07 14:21:39.937357: I tensorflow/core/common_runtime/bfc_allocator.cc:696] 4 Chunks of size 1536 totalling 6.0KiB
2017-07-07 14:21:39.937362: I tensorflow/core/common_runtime/bfc_allocator.cc:696] 1 Chunks of size 16384 totalling 16.0KiB
2017-07-07 14:21:39.937368: I tensorflow/core/common_runtime/bfc_allocator.cc:696] 1 Chunks of size 68096 totalling 66.5KiB
2017-07-07 14:21:39.937373: I tensorflow/core/common_runtime/bfc_allocator.cc:696] 1 Chunks of size 139520 totalling 136.2KiB
2017-07-07 14:21:39.937378: I tensorflow/core/common_runtime/bfc_allocator.cc:696] 1 Chunks of size 2457600 totalling 2.34MiB
2017-07-07 14:21:39.937383: I tensorflow/core/common_runtime/bfc_allocator.cc:696] 2 Chunks of size 3538944 totalling 6.75MiB
2017-07-07 14:21:39.937388: I tensorflow/core/common_runtime/bfc_allocator.cc:696] 1 Chunks of size 5308416 totalling 5.06MiB
2017-07-07 14:21:39.937393: I tensorflow/core/common_runtime/bfc_allocator.cc:700] Sum Total of in-use chunks: 14.39MiB
2017-07-07 14:21:39.937401: I tensorflow/core/common_runtime/bfc_allocator.cc:702] Stats:
Limit: 31457280
InUse: 15089664
MaxInUse: 15089664
NumAllocs: 53
MaxAllocSize: 5308416
2017-07-07 14:21:39.937412: W tensorflow/core/common_runtime/bfc_allocator.cc:277] ************************************************************________________________________________
2017-07-07 14:21:39.937433: W tensorflow/core/framework/op_kernel.cc:1148] Resource exhausted: OOM when allocating tensor of shape [3840,4096] and type float
2017-07-07 14:21:39.955389: E tensorflow/core/common_runtime/executor.cc:644] Executor failed to create kernel. Resource exhausted: OOM when allocating tensor of shape [3840,4096] and type float
[[Node: coarse6/weights/Adam/Initializer/zeros = Const[_class=["loc:@coarse6/weights"], dtype=DT_FLOAT, value=Tensor<type: float shape: [3840,4096] values: [0 0 0]...>, _device="/job:localhost/replica:0/task:0/gpu:0"]()]]
Traceback (most recent call last):
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/client/session.py", line 1139, in _do_call
return fn(*args)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/client/session.py", line 1121, in _run_fn
status, run_metadata)
File "/usr/lib/python3.5/contextlib.py", line 66, in __exit__
next(self.gen)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/framework/errors_impl.py", line 466, in raise_exception_on_not_ok_status
pywrap_tensorflow.TF_GetCode(status))
tensorflow.python.framework.errors_impl.ResourceExhaustedError: OOM when allocating tensor of shape [3840,4096] and type float
[[Node: coarse6/weights/Adam/Initializer/zeros = Const[_class=["loc:@coarse6/weights"], dtype=DT_FLOAT, value=Tensor<type: float shape: [3840,4096] values: [0 0 0]...>, _device="/job:localhost/replica:0/task:0/gpu:0"]()]]
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/home/varun/Desktop/Depth_Estimation/task.py", line 145, in <module>
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/platform/app.py", line 48, in run
_sys.exit(main(_sys.argv[:1] + flags_passthrough))
File "/home/varun/Desktop/Depth_Estimation/task.py", line 141, in main
File "/home/varun/Desktop/Depth_Estimation/task.py", line 58, in train
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/client/session.py", line 789, in run
run_metadata_ptr)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/client/session.py", line 997, in _run
feed_dict_string, options, run_metadata)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/client/session.py", line 1132, in _do_run
target_list, options, run_metadata)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/client/session.py", line 1152, in _do_call
raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.ResourceExhaustedError: OOM when allocating tensor of shape [3840,4096] and type float
[[Node: coarse6/weights/Adam/Initializer/zeros = Const[_class=["loc:@coarse6/weights"], dtype=DT_FLOAT, value=Tensor<type: float shape: [3840,4096] values: [0 0 0]...>, _device="/job:localhost/replica:0/task:0/gpu:0"]()]]
Caused by op 'coarse6/weights/Adam/Initializer/zeros', defined at:
File "/home/varun/Desktop/Depth_Estimation/task.py", line 145, in <module>
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/platform/app.py", line 48, in run
_sys.exit(main(_sys.argv[:1] + flags_passthrough))
File "/home/varun/Desktop/Depth_Estimation/task.py", line 141, in main
File "/home/varun/Desktop/Depth_Estimation/task.py", line 37, in train
train_op = op.train(loss, global_step, BATCH_SIZE)
File "/home/varun/Desktop/Depth_Estimation/train_operation.py", line 36, in train
apply_gradient_op = opt.apply_gradients(grads, global_step=global_step)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/training/optimizer.py", line 446, in apply_gradients
self._create_slots([_get_variable_for(v) for v in var_list])
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/training/adam.py", line 128, in _create_slots
self._zeros_slot(v, "m", self._name)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/training/optimizer.py", line 766, in _zeros_slot
named_slots[_var_key(var)] = slot_creator.create_zeros_slot(var, op_name)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/training/slot_creator.py", line 174, in create_zeros_slot
colocate_with_primary=colocate_with_primary)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/training/slot_creator.py", line 146, in create_slot_with_initializer
dtype)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/training/slot_creator.py", line 66, in _create_slot_var
validate_shape=validate_shape)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/ops/variable_scope.py", line 1065, in get_variable
use_resource=use_resource, custom_getter=custom_getter)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/ops/variable_scope.py", line 962, in get_variable
use_resource=use_resource, custom_getter=custom_getter)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/ops/variable_scope.py", line 367, in get_variable
validate_shape=validate_shape, use_resource=use_resource)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/ops/variable_scope.py", line 352, in _true_getter
use_resource=use_resource)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/ops/variable_scope.py", line 725, in _get_single_variable
validate_shape=validate_shape)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/ops/variables.py", line 200, in __init__
expected_shape=expected_shape)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/ops/variables.py", line 278, in _init_from_args
initial_value(), name="initial_value", dtype=dtype)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/ops/variable_scope.py", line 701, in <lambda>
shape.as_list(), dtype=dtype, partition_info=partition_info)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/ops/init_ops.py", line 93, in __call__
return array_ops.zeros(shape, dtype)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/ops/array_ops.py", line 1383, in zeros
output = constant(zero, shape=shape, dtype=dtype, name=name)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/framework/constant_op.py", line 106, in constant
attrs={"value": tensor_value, "dtype": dtype_value}, name=name).outputs[0]
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/framework/ops.py", line 2506, in create_op
original_op=self._default_original_op, op_def=op_def)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/framework/ops.py", line 1269, in __init__
self._traceback = _extract_stack()
ResourceExhaustedError (see above for traceback): OOM when allocating tensor of shape [3840,4096] and type float
[[Node: coarse6/weights/Adam/Initializer/zeros = Const[_class=["loc:@coarse6/weights"], dtype=DT_FLOAT, value=Tensor<type: float shape: [3840,4096] values: [0 0 0]...>, _device="/job:localhost/replica:0/task:0/gpu:0"]()]]
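For reference, the sizes reported in the log above can be checked with a few lines of arithmetic. This is only a sketch: the helper `tensor_bytes` is a hypothetical name introduced here, and the numbers are copied from the OOM message and the allocator `Stats` block. It assumes `float` means 4-byte float32.

```python
from functools import reduce
from operator import mul

BYTES_PER_FLOAT32 = 4  # assumption: "type float" in the log is float32
MIB = 1024 * 1024


def tensor_bytes(shape, bytes_per_element=BYTES_PER_FLOAT32):
    """Return the size in bytes of a dense tensor with the given shape."""
    return reduce(mul, shape, 1) * bytes_per_element


requested = tensor_bytes([3840, 4096])  # shape from the OOM message
limit = 31457280                        # "Limit" from the allocator stats
in_use = 15089664                       # "InUse" from the allocator stats

print("requested: %.2f MiB" % (requested / MIB))
print("limit:     %.2f MiB" % (limit / MIB))
print("headroom:  %.2f MiB" % ((limit - in_use) / MIB))
```

The single Adam slot for `coarse6/weights` needs 60 MiB, which is twice the 30 MiB the allocator was allowed to use in total, so this allocation cannot succeed whatever the batch size is.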
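Training fails to start because of a resource allocation error. I have checked all the posts about this problem. How can I fix it? I also tried the BFC allocator GPU solution posted in TFFRCNN; I have added it, commented out, to the training code below. I also checked the issue raised by Yaroslav Bulatov here: https://github.com/CharlesShang/TFFRCNN/issues/68
If any changes are needed, can anyone help me modify the code? I also tried reducing the batch size and running it on a GPU training server, but I could not fix it.
I am using the code from https://github.com/MasazI/cnn_depth_tensorflow. Please see the train_operation.py file in that repository. I only modified task.py.
My training code: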
from datetime import datetime
from tensorflow.python.platform import gfile
import numpy as np
import tensorflow as tf
from dataset import DataSet
from dataset import output_predict
import model
import train_operation as op
MAX_STEPS = 10000000
LOG_DEVICE_PLACEMENT = False
BATCH_SIZE = 4
TRAIN_FILE = "train.csv"
COARSE_DIR = "coarse"
REFINE_DIR = "refine"
REFINE_TRAIN = True
FINE_TUNE = True
def train():
    with tf.Graph().as_default():
        global_step = tf.Variable(0, trainable=False)
        dataset = DataSet(BATCH_SIZE)
        images, depths, invalid_depths = dataset.csv_inputs(TRAIN_FILE)
        keep_conv = tf.placeholder(tf.float32)
        keep_hidden = tf.placeholder(tf.float32)
        if REFINE_TRAIN:
            print("refine train.")
            coarse = model.inference(images, keep_conv, trainable=False)
            logits = model.inference_refine(images, coarse, keep_conv, keep_hidden)
        else:
            print("coarse train.")
            logits = model.inference(images, keep_conv, keep_hidden)
        loss = model.loss(logits, depths, invalid_depths)
        train_op = op.train(loss, global_step, BATCH_SIZE)
        init_op = tf.global_variables_initializer()

        # Session
        '''
        BFC Allocator Method
        # Without softplacement creepy errors
        config = tf.ConfigProto(allow_soft_placement=True)
        config.gpu_options.allocator_type = 'BFC'
        config.gpu_options.per_process_gpu_memory_fraction = 0.90
        config.gpu_options.allow_growth = True
        sess = tf.Session(config=config)
        '''
        '''
        # Lengthy log
        gpu_options = tf.GPUOptions(per_process_gpu_memory_fraction=0.8)
        sess = tf.Session(config=tf.ConfigProto(log_device_placement=True, gpu_options=gpu_options))
        '''
        sess = tf.Session(config=tf.ConfigProto(log_device_placement=LOG_DEVICE_PLACEMENT))
        sess.run(init_op)

        # parameters
        coarse_params = {}
        refine_params = {}
        if REFINE_TRAIN:
            for variable in tf.all_variables():
                variable_name = variable.name.replace(':', '__')
                print("parameter: %s" % (variable_name))
                if variable_name.find("/") < 0 or variable_name.count("/") != 1:
                    continue
                if variable_name.find('coarse') >= 0:
                    coarse_params[variable_name] = variable
                print("parameter: %s" % (variable_name))
                if variable_name.find('fine') >= 0:
                    refine_params[variable_name] = variable
        else:
            for variable in tf.trainable_variables():
                variable_name = variable.name.replace(':', '__')
                print("parameter: %s" % (variable_name))
                if variable_name.find("/") < 0 or variable_name.count("/") != 1:
                    continue
                if variable_name.find('coarse') >= 0:
                    coarse_params[variable_name] = variable
                if variable_name.find('fine') >= 0:
                    refine_params[variable_name] = variable

        # define saver
        print(coarse_params)
        saver_coarse = tf.train.Saver(coarse_params)
        if REFINE_TRAIN:
            saver_refine = tf.train.Saver(refine_params)

        # fine tune
        if FINE_TUNE:
            coarse_ckpt = tf.train.get_checkpoint_state(COARSE_DIR)
            if coarse_ckpt and coarse_ckpt.model_checkpoint_path:
                print("Pretrained coarse Model Loading.")
                saver_coarse.restore(sess, coarse_ckpt.model_checkpoint_path)
                print("Pretrained coarse Model Restored.")
            else:
                print("No Pretrained coarse Model.")
            if REFINE_TRAIN:
                refine_ckpt = tf.train.get_checkpoint_state(REFINE_DIR)
                if refine_ckpt and refine_ckpt.model_checkpoint_path:
                    print("Pretrained refine Model Loading.")
                    saver_refine.restore(sess, refine_ckpt.model_checkpoint_path)
                    print("Pretrained refine Model Restored.")
                else:
                    print("No Pretrained refine Model.")

        # train
        coord = tf.train.Coordinator()
        threads = tf.train.start_queue_runners(sess=sess, coord=coord)
        for step in range(MAX_STEPS):
            index = 0
            for i in range(1000):
                _, loss_value, logits_val, images_val = sess.run([train_op, loss, logits, images], feed_dict={keep_conv: 0.8, keep_hidden: 0.5})
                if index % 10 == 0:
                    print("%s: %d[epoch]: %d[iteration]: train loss %f" % (datetime.now(), step, index, loss_value))
                    assert not np.isnan(loss_value), 'Model diverged with loss = NaN'
                if index % 500 == 0:
                    if REFINE_TRAIN:
                        output_predict(logits_val, images_val, "data/predict_refine_%05d_%05d" % (step, i))
                    else:
                        output_predict(logits_val, images_val, "data/predict_%05d_%05d" % (step, i))
                index += 1

            if step % 5 == 0 or (step * 1) == MAX_STEPS:
                if REFINE_TRAIN:
                    refine_checkpoint_path = REFINE_DIR + '/model.ckpt'
                    saver_refine.save(sess, refine_checkpoint_path, global_step=step)
                else:
                    coarse_checkpoint_path = COARSE_DIR + '/model.ckpt'
                    saver_coarse.save(sess, coarse_checkpoint_path, global_step=step)
        coord.request_stop()
        coord.join(threads)
        sess.close()
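Since the startup log reports `Total memory: 3.93GiB` but only `Free memory: 30.00MiB`, it may be worth checking how much of the device is actually free before the session is created. The following is a rough sketch that only parses those two log lines; `parse_size` and `free_fraction` are hypothetical helper names made up for this illustration and are not part of TensorFlow.

```python
import re

# Binary unit multipliers as printed in the TensorFlow device log.
_UNITS = {"B": 1, "KiB": 1024, "MiB": 1024 ** 2, "GiB": 1024 ** 3}


def parse_size(text):
    """Convert a string such as '30.00MiB' into a number of bytes."""
    match = re.match(r"\s*([\d.]+)\s*(B|KiB|MiB|GiB)\s*$", text)
    if match is None:
        raise ValueError("unrecognised size: %r" % text)
    return float(match.group(1)) * _UNITS[match.group(2)]


def free_fraction(log_text):
    """Return free/total device memory from the 'Found device' log lines."""
    total = re.search(r"Total memory:\s*(\S+)", log_text)
    free = re.search(r"Free memory:\s*(\S+)", log_text)
    if total is None or free is None:
        raise ValueError("log does not contain both memory lines")
    return parse_size(free.group(1)) / parse_size(total.group(1))


# The two lines below are copied from the log at the top of this post.
log_excerpt = """Total memory: 3.93GiB
Free memory: 30.00MiB"""
print("free fraction of GPU memory: %.4f" % free_fraction(log_excerpt))
```

With the values from my log, less than 1 percent of the card is free when TensorFlow starts, which suggests that something else is already holding the GPU memory.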