
I'm trying to convert my dataset to TFRecord format. I created a text file containing sentences, one sentence per line:

DATA_DIR = 'E:/'
sentences_file = os.path.join(DATA_DIR, 'data.txt')

I created another text file containing the tokens (the vocabulary), one token per line:

vocab_file = os.path.join(DATA_DIR, 'tokens.txt')

I want to convert this data into a TFRecords dataset:

import tensorflow as tf

import os
from tensorflow.python.ops import lookup_ops
#Lookup table: maps each token to its zero-based line number in `tokens.txt`. Unknown tokens get default_value=0, i.e. the ID of the token on the first line.
#Must be initialized with tf.tables_initializer() inside a session.
vocab_table = lookup_ops.index_table_from_file(vocab_file, default_value=0)

#Creates a dataset that returns one sentence per element
dataset = tf.data.TextLineDataset(sentences_file)

#Converts each sentence to a list of tokens
dataset = dataset.map(lambda sentence: tf.string_split([sentence]).values)

#Converts list of tokens to list of token integers
dataset = dataset.map(lambda words: vocab_table.lookup(words))

#Adds length of sentence (number of tokens)
dataset = dataset.map(lambda words: (words, tf.size(words)))

#Convert to a batch of size 32. Padded batch appends 0 for shorter sentences.
dataset = dataset.padded_batch(batch_size=32, padded_shapes=(tf.TensorShape([None]), tf.TensorShape([])))


# Dataset iterator. Needs to be initialized
iterator = dataset.make_initializable_iterator()
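For reference, the pipeline above (split each sentence on whitespace, map every token to its line number with 0 for unknown tokens, attach the sentence length, pad each batch of 32 with 0) can be sketched in plain Python. This is an illustrative emulation, not TensorFlow code; the vocabulary in the usage example, including the `<unk>` placeholder on line 0, is hypothetical.

```python
from typing import Dict, Iterator, List, Tuple


def build_vocab(tokens: List[str]) -> Dict[str, int]:
    """Map each token to its zero-based line number, like index_table_from_file."""
    vocab: Dict[str, int] = {}
    for line_number, token in enumerate(tokens):
        if token in vocab:
            # TensorFlow fails here too: one key cannot map to two line numbers.
            raise ValueError(
                f"duplicate token {token!r} on lines {vocab[token]} and {line_number}")
        vocab[token] = line_number
    return vocab


def encode(sentence: str, vocab: Dict[str, int], default_value: int = 0) -> List[int]:
    """Split on whitespace (approximating tf.string_split) and look up each token."""
    return [vocab.get(word, default_value) for word in sentence.split()]


def padded_batches(sentences: List[str], vocab: Dict[str, int],
                   batch_size: int = 32, pad: int = 0
                   ) -> Iterator[Tuple[List[List[int]], List[int]]]:
    """Yield (padded token IDs, lengths), padding each batch to its longest sentence."""
    encoded = [encode(s, vocab) for s in sentences]
    for start in range(0, len(encoded), batch_size):
        chunk = encoded[start:start + batch_size]
        lengths = [len(ids) for ids in chunk]
        longest = max(lengths)
        padded = [ids + [pad] * (longest - len(ids)) for ids in chunk]
        yield padded, lengths


if __name__ == "__main__":
    vocab = build_vocab(["<unk>", "the", "cat", "sat"])
    for ids, lengths in padded_batches(["the cat sat", "the dog"], vocab):
        print(ids, lengths)  # [[1, 2, 3], [1, 0, 0]] [3, 2]
```

Because both padding and unknown tokens use ID 0, the stored lengths are what distinguish real tokens from padding.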

However, I get the following error:

C:\ProgramData\Anaconda3\python.exe "E:/untitled1/dfgd.py"
2021-02-17 09:55:33.833498: W tensorflow/stream_executor/platform/default/dso_loader.cc:60] Could not load dynamic library 'cudart64_110.dll'; dlerror: cudart64_110.dll not found
2021-02-17 09:55:33.833695: I tensorflow/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine.
2021-02-17 09:55:36.692747: I tensorflow/compiler/jit/xla_cpu_device.cc:41] Not creating XLA devices, tf_xla_enable_xla_devices not set
2021-02-17 09:55:36.693923: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library nvcuda.dll
2021-02-17 09:55:36.708389: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1720] Found device 0 with properties: 
pciBusID: 0000:01:00.0 name: GeForce 210 computeCapability: 1.2
coreClock: 1.402GHz coreCount: 1 deviceMemorySize: 1.00GiB deviceMemoryBandwidth: 7.45GiB/s
2021-02-17 09:55:36.709175: W tensorflow/stream_executor/platform/default/dso_loader.cc:60] Could not load dynamic library 'cudart64_110.dll'; dlerror: cudart64_110.dll not found
2021-02-17 09:55:36.709739: W tensorflow/stream_executor/platform/default/dso_loader.cc:60] Could not load dynamic library 'cublas64_11.dll'; dlerror: cublas64_11.dll not found
2021-02-17 09:55:36.710395: W tensorflow/stream_executor/platform/default/dso_loader.cc:60] Could not load dynamic library 'cublasLt64_11.dll'; dlerror: cublasLt64_11.dll not found
2021-02-17 09:55:36.710948: W tensorflow/stream_executor/platform/default/dso_loader.cc:60] Could not load dynamic library 'cufft64_10.dll'; dlerror: cufft64_10.dll not found
2021-02-17 09:55:36.711486: W tensorflow/stream_executor/platform/default/dso_loader.cc:60] Could not load dynamic library 'curand64_10.dll'; dlerror: curand64_10.dll not found
2021-02-17 09:55:36.712111: W tensorflow/stream_executor/platform/default/dso_loader.cc:60] Could not load dynamic library 'cusolver64_10.dll'; dlerror: cusolver64_10.dll not found
2021-02-17 09:55:36.712650: W tensorflow/stream_executor/platform/default/dso_loader.cc:60] Could not load dynamic library 'cusparse64_11.dll'; dlerror: cusparse64_11.dll not found
2021-02-17 09:55:36.713184: W tensorflow/stream_executor/platform/default/dso_loader.cc:60] Could not load dynamic library 'cudnn64_8.dll'; dlerror: cudnn64_8.dll not found
2021-02-17 09:55:36.713351: W tensorflow/core/common_runtime/gpu/gpu_device.cc:1757] Cannot dlopen some GPU libraries. Please make sure the missing libraries mentioned above are installed properly if you would like to use GPU. Follow the guide at https://www.tensorflow.org/install/gpu for how to download and setup the required libraries for your platform.
Skipping registering GPU devices...
2021-02-17 09:55:36.714741: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1261] Device interconnect StreamExecutor with strength 1 edge matrix:
2021-02-17 09:55:36.714937: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1267]      
2021-02-17 09:55:36.715040: I tensorflow/compiler/jit/xla_gpu_device.cc:99] Not creating XLA devices, tf_xla_enable_xla_devices not set
2021-02-17 09:56:01.485573: W tensorflow/core/framework/op_kernel.cc:1763] OP_REQUIRES failed at lookup_table_init_op.cc:144 : Failed precondition: HashTable has different value for same key. Key method has 0 and trying to add value 6
Traceback (most recent call last):
  File "E:/untitled1/dfgd.py", line 13, in <module>
    vocab_table = lookup_ops.index_table_from_file(vocab_file, default_value=0)
  File "C:\Users\DSP\AppData\Roaming\Python\Python37\site-packages\tensorflow\python\ops\lookup_ops.py", line 1452, in index_table_from_file
    table = StaticHashTableV1(init, default_value)
  File "C:\Users\DSP\AppData\Roaming\Python\Python37\site-packages\tensorflow\python\ops\lookup_ops.py", line 314, in __init__
    super(StaticHashTable, self).__init__(default_value, initializer)
  File "C:\Users\DSP\AppData\Roaming\Python\Python37\site-packages\tensorflow\python\ops\lookup_ops.py", line 185, in __init__
    self._init_op = self._initialize()
  File "C:\Users\DSP\AppData\Roaming\Python\Python37\site-packages\tensorflow\python\ops\lookup_ops.py", line 188, in _initialize
    return self._initializer.initialize(self)
  File "C:\Users\DSP\AppData\Roaming\Python\Python37\site-packages\tensorflow\python\ops\lookup_ops.py", line 744, in initialize
    -1 if self._vocab_size is None else self._vocab_size, self._delimiter)
  File "C:\Users\DSP\AppData\Roaming\Python\Python37\site-packages\tensorflow\python\ops\gen_lookup_ops.py", line 362, in initialize_table_from_text_file_v2
    _ops.raise_from_not_ok_status(e, name)
  File "C:\Users\DSP\AppData\Roaming\Python\Python37\site-packages\tensorflow\python\framework\ops.py", line 6862, in raise_from_not_ok_status
    six.raise_from(core._status_to_exception(e.code, message), None)
  File "<string>", line 3, in raise_from
tensorflow.python.framework.errors_impl.FailedPreconditionError: HashTable has different value for same key. Key method has 0 and trying to add value 6 [Op:InitializeTableFromTextFileV2]

How can I fix this?
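`index_table_from_file` reads `tokens.txt` line by line, using the whole line as the key and its zero-based line number as the value, so every line must be unique. The message `Key method has 0 and trying to add value 6` says that the token `method` appears on line 0 and again on line 6. A minimal sketch for finding and removing such duplicates, assuming the vocabulary file is UTF-8 and that keeping the first occurrence of each token is acceptable (the helper names are hypothetical):

```python
from typing import Dict, List


def find_duplicates(path: str) -> Dict[str, List[int]]:
    """Return {token: [zero-based line numbers]} for every token listed more than once."""
    positions: Dict[str, List[int]] = {}
    with open(path, encoding="utf-8") as f:
        for line_number, line in enumerate(f):
            # The whole line (minus the newline) is the lookup key.
            token = line.rstrip("\n")
            positions.setdefault(token, []).append(line_number)
    return {tok: lines for tok, lines in positions.items() if len(lines) > 1}


def dedupe_vocab(src: str, dst: str) -> int:
    """Write src to dst keeping only the first occurrence of each token.

    Returns the number of lines dropped. Tokens after a dropped line get new IDs.
    """
    seen = set()
    kept: List[str] = []
    dropped = 0
    with open(src, encoding="utf-8") as f:
        for line in f:
            token = line.rstrip("\n")
            if token in seen:
                dropped += 1
                continue
            seen.add(token)
            kept.append(token)
    with open(dst, "w", encoding="utf-8") as f:
        f.write("".join(token + "\n" for token in kept))
    return dropped
```

With the question's paths, `find_duplicates(vocab_file)` should include `'method': [0, 6]` (the error only reports the first clash, so there may be more). After `dedupe_vocab(vocab_file, 'E:/tokens_dedup.txt')`, point `index_table_from_file` at the new file; since IDs after each removed line shift down, any data already encoded with the old file needs to be regenerated.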


1 Answer


A working example snippet that converts text to TFRecord:

import tensorflow as tf

# Raw sentences are stored as bytes.
sentence_list = tf.train.BytesList(value=[b'sentence1', b'sentence2'])
# Token IDs are integers, so an Int64List fits better than a FloatList.
token_list = tf.train.Int64List(value=[1, 2])

sentences = tf.train.Feature(bytes_list=sentence_list)
tokens = tf.train.Feature(int64_list=token_list)

# Feature names; the same names must be used when parsing the records.
sentence_dict = {
  'sentence': sentences,
  'Token': tokens
}
feature_sentence = tf.train.Features(feature=sentence_dict)

example = tf.train.Example(features=feature_sentence)

# Serialize the Example and write it to a TFRecord file.
with tf.io.TFRecordWriter('sentences.tfrecord') as writer:
  writer.write(example.SerializeToString())
answered 2021-03-02T11:04:14.207
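The snippet above stores both sentences in a single `Example`. For the question's files, a common alternative is one `tf.train.Example` per line of `data.txt`, with the token IDs in an `Int64List`, read back with `tf.data.TFRecordDataset`. The sketch below assumes TensorFlow 2.x with eager execution, UTF-8 files, whitespace tokenization, and a vocabulary without duplicate lines; it does the vocabulary lookup in plain Python instead of with a TF lookup table, and the feature names `sentence`/`tokens` and the helper names are hypothetical choices, not a fixed format.

```python
from typing import Dict

import tensorflow as tf


def load_vocab(vocab_path: str) -> Dict[str, int]:
    """Token -> zero-based line number; refuses duplicates, as the lookup table does."""
    vocab: Dict[str, int] = {}
    with open(vocab_path, encoding="utf-8") as f:
        for line_number, line in enumerate(f):
            token = line.rstrip("\n")
            if token in vocab:
                raise ValueError(f"duplicate token {token!r} on line {line_number}")
            vocab[token] = line_number
    return vocab


def write_tfrecord(sentences_path: str, vocab_path: str, out_path: str,
                   default_value: int = 0) -> int:
    """Write one tf.train.Example per non-empty sentence; return the record count."""
    vocab = load_vocab(vocab_path)
    count = 0
    with open(sentences_path, encoding="utf-8") as src, \
            tf.io.TFRecordWriter(out_path) as writer:
        for line in src:
            sentence = line.strip()
            if not sentence:
                continue
            ids = [vocab.get(word, default_value) for word in sentence.split()]
            example = tf.train.Example(features=tf.train.Features(feature={
                "sentence": tf.train.Feature(
                    bytes_list=tf.train.BytesList(value=[sentence.encode("utf-8")])),
                "tokens": tf.train.Feature(
                    int64_list=tf.train.Int64List(value=ids)),
            }))
            writer.write(example.SerializeToString())
            count += 1
    return count


# Parsing spec: one string per record and a variable number of int64 token IDs.
FEATURES = {
    "sentence": tf.io.FixedLenFeature([], tf.string),
    "tokens": tf.io.VarLenFeature(tf.int64),
}


def parse_example(record):
    parsed = tf.io.parse_single_example(record, FEATURES)
    tokens = tf.sparse.to_dense(parsed["tokens"])  # VarLenFeature yields a SparseTensor
    return tokens, tf.size(tokens)


def read_batches(path: str, batch_size: int = 32) -> tf.data.Dataset:
    """Read the records back and pad each batch with 0, like the question's pipeline."""
    dataset = tf.data.TFRecordDataset(path).map(parse_example)
    return dataset.padded_batch(batch_size, padded_shapes=([None], []))
```

With the question's variables this could be called as `write_tfrecord(sentences_file, vocab_file, 'E:/data.tfrecord')` followed by `read_batches('E:/data.tfrecord')` (the output path is only an example). `VarLenFeature` keeps each sentence at its true length on disk, so padding is applied only when batching.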