
I am trying to use slim to build my own dataset and read it back in. When I try to read it, I get the following error:

raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.OutOfRangeError: FIFOQueue '_2_parallel_read/common_queue' is closed and has insufficient elements (requested 1, current size 0)
[[Node: parallel_read/common_queue_Dequeue = QueueDequeueV2[component_types=[DT_STRING, DT_STRING], timeout_ms=-1, _device="/job:localhost/replica:0/task:0/cpu:0"]]]
[[Node: case/If_2/DecodePng/_117 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/gpu:0", send_device="/job:localhost/replica:0/task:0/cpu:0", send_device_incarnation=1, tensor_name="edge_12_case/If_2/DecodePng", tensor_type=DT_UINT8, _device="/job:localhost/replica:0/task:0/gpu:0"]]]

Caused by op u'parallel_read/common_queue_Dequeue', defined at:

OutOfRangeError (see above for traceback): FIFOQueue '_2_parallel_read/common_queue' is closed and has insufficient elements (requested 1, current size 0)
[[Node: parallel_read/common_queue_Dequeue = QueueDequeueV2[component_types=[DT_STRING, DT_STRING], timeout_ms=-1, _device="/job:localhost/replica:0/task:0/cpu:0"]]]
[[Node: case/If_2/DecodePng/_117 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/gpu:0", send_device="/job:localhost/replica:0/task:0/cpu:0", send_device_incarnation=1, tensor_name="edge_12_case/If_2/DecodePng", tensor_type=DT_UINT8, _device="/job:localhost/replica:0/task:0/gpu:0"]]]

As I suspected, the FIFO queue appears to be empty (it is never filled)...

Does anyone know which part of slim is responsible for filling the FIFO queue? Here is the code I tried:

import tensorflow as tf
import tensorflow.contrib.slim as slim
from datasets import dataset_factory  # slim-style dataset factory used below

dataset_name = 'toto'
dataset_split_name = 'train'
dataset_dir = './dataset/'
num_readers = 1
batch_size = 2
output_padding = 5
dataset_id = 3
num_max_target = 5
num_preprocessing_threads = 1
num_epochs = None

with tf.Graph().as_default():
    dataset = dataset_factory.get_dataset(dataset_name, dataset_split_name, dataset_dir, dataset_id)
    provider = slim.dataset_data_provider.DatasetDataProvider(dataset, num_readers=num_readers, 
        common_queue_capacity=10*batch_size, common_queue_min=5*batch_size, num_epochs=num_epochs,shuffle=False)

    [img] = provider.get(['frame'])  # get() returns a list of tensors; unpack the single one

    # shuffle_batch adds its own queue runner on top of the provider's common queue
    batch = tf.train.shuffle_batch([tf.reshape(img, shape=[512, 512, 1])],
                batch_size=batch_size,
                num_threads=num_preprocessing_threads,
                capacity=2*batch_size,
                min_after_dequeue=batch_size)

    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())
        sess.run(tf.local_variables_initializer())

        coord = tf.train.Coordinator()
        # launches the queue-runner threads that fill the common queue
        threads = tf.train.start_queue_runners(sess=sess, coord=coord)

        frame = sess.run(img)  # dequeues one decoded frame from the common queue
        print('ok ok')

        coord.request_stop()
        coord.join(threads)
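
For context on the question above: the common queue is not filled by the session itself but by QueueRunner threads. DatasetDataProvider builds on slim's parallel_reader, which registers a QueueRunner for '_2_parallel_read/common_queue' in the graph's QUEUE_RUNNERS collection, and tf.train.start_queue_runners is what actually launches those threads. If the dataset's file pattern matches no TFRecord files (or the readers fail), they close the queue while it is still empty and the dequeue raises exactly this OutOfRangeError. A minimal debugging sketch, assuming the usual slim naming convention for the record files (the glob pattern below is my guess, not from the original post):

import glob
import tensorflow as tf

# Run inside the same graph as the provider: list the queue runners
# that slim's parallel_read registered; one of them feeds
# '_2_parallel_read/common_queue'.
for qr in tf.get_collection(tf.GraphKeys.QUEUE_RUNNERS):
    print(qr.name)

# If this matches no files, the readers close the queue while it is
# still empty and the dequeue fails with OutOfRangeError.
print(glob.glob('./dataset/toto_train*.tfrecord'))  # hypothetical pattern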

1 Answer


I changed the distributed-training function in the train.py file from tf.distribute.Server() to tf.train.Server(), and it worked.
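
A minimal sketch of that change for reference (the cluster spec is a placeholder of my own, not from the original answer):

import tensorflow as tf

# Hypothetical single-worker cluster, for illustration only.
cluster = tf.train.ClusterSpec({'worker': ['localhost:2222']})

# Before:
# server = tf.distribute.Server(cluster, job_name='worker', task_index=0)

# After, per the answer above:
server = tf.train.Server(cluster, job_name='worker', task_index=0)
server.join()  # block this process and serve the cluster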

answered May 4, 2020 at 16:41