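I'm trying to get a simple RNN working in TensorFlow, but I'm running into a few problems.
What I want to do for now is simply run the forward pass of an RNN that uses an LSTM as its cell type.
I've scraped some news articles and want to feed them into the RNN. I split the string formed by concatenating all the articles into characters and mapped each character to an integer. Then I one-hot encoded those integers.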
import numpy as np

# `article` is the string of all scraped articles concatenated together
data = [c for c in article]
chars = list(set(data))
idx_chars = {i:ch for i,ch in enumerate(chars)}
chars_idx = {ch:i for i,ch in enumerate(chars)}
int_data = [chars_idx[ch] for ch in data]

# config values
vocab_size = len(chars)
hidden_size = 100
seq_length = 25

# helper function to get one-hot encoding
def onehot(value):
    result = np.zeros(vocab_size)
    result[value] = 1
    return result

def vectorize_input(inputs):
    result = [onehot(x) for x in inputs]
    return result

input = vectorize_input(int_data[:25])
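The same encoding can also be built in one step with NumPy indexing. The following is only a minimal sketch: the short string assigned to `article` is a placeholder for the scraped articles, and the helper name `onehot_batch` is made up for illustration.

```python
import numpy as np

# Placeholder text standing in for the concatenated articles (assumption).
article = "hello world"

data = list(article)
chars = sorted(set(data))            # sorted so the mapping is deterministic
chars_idx = {ch: i for i, ch in enumerate(chars)}
int_data = [chars_idx[ch] for ch in data]
vocab_size = len(chars)

def onehot_batch(indices, depth):
    """Return a (len(indices), depth) float32 array with a single 1 per row."""
    # Row i of the identity matrix is the one-hot vector for index i.
    return np.eye(depth, dtype=np.float32)[indices]

encoded = onehot_batch(int_data[:5], vocab_size)
print(encoded.shape)
```

Each row of `encoded` corresponds to one character, so `np.argmax(encoded, axis=1)` recovers the original integer indices.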
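Now for the TensorFlow code. I want to iterate over all the characters in the data and use 25 characters per forward pass. My first question is about batch size: if I do it the way I just described, my batch size is 1, right? Each vector corresponding to one character of the input then has shape [1, vocab_size], and my input contains 25 of these vectors. So I used the following tensors: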
seq_input = tf.placeholder(tf.int32, shape = [seq_length, 1, vocab_size])
targets = tf.placeholder(tf.int32, shape = [seq_length, 1, vocab_size])
inputs = [tf.reshape(i,(1,vocab_size)) for i in tf.split(0,seq_length,seq_input)]
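I had to create the last tensor (inputs) because that is the format the rnn function expects.
Then I ran into a variable scope problem; the code below raises the following error: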
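To make the shapes concrete, here is a small NumPy sketch of the same split, with no TensorFlow involved. The value `vocab_size = 8` is an arbitrary stand-in; the point is only that a `[seq_length, 1, vocab_size]` array becomes a list of `seq_length` arrays of shape `[1, vocab_size]`, one per time step, each with batch size 1.

```python
import numpy as np

seq_length = 25
vocab_size = 8   # arbitrary stand-in for len(chars)

# One sequence: seq_length time steps, batch size 1, vocab_size features.
seq_input = np.zeros((seq_length, 1, vocab_size), dtype=np.float32)

# Split along the time axis, then drop that axis from each piece.
steps = [np.reshape(s, (1, vocab_size))
         for s in np.split(seq_input, seq_length, axis=0)]

print(len(steps), steps[0].shape)
```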
cell = rnn_cell.BasicLSTMCell(hidden_size, input_size = vocab_size)
# note: first argument of zero_state is the batch_size
initial_state = cell.zero_state(1, tf.float32)
outputs, state = rnn.rnn(cell, inputs, initial_state= initial_state)
sess = tf.Session()
sess.run([outputs, state], feed_dict = {inputs:input})
ValueError Traceback (most recent call last)
<ipython-input-90-449af38c387d> in <module>()
7 # note: first argument of zero_state is supposed to be batch_size
8 initial_state = cell.zero_state(1, tf.float32)
----> 9 outputs, state = rnn.rnn(cell, inputs, initial_state= initial_state)
10
11 sess = tf.Session()
/Library/Python/2.7/site-packages/tensorflow/python/ops/rnn.pyc in rnn(cell, inputs, initial_state, dtype, sequence_length, scope)
124 zero_output, state, call_cell)
125 else:
--> 126 (output, state) = call_cell()
127
128 outputs.append(output)
/Library/Python/2.7/site-packages/tensorflow/python/ops/rnn.pyc in <lambda>()
117 if time > 0: vs.get_variable_scope().reuse_variables()
118 # pylint: disable=cell-var-from-loop
--> 119 call_cell = lambda: cell(input_, state)
120 # pylint: enable=cell-var-from-loop
121 if sequence_length:
/Library/Python/2.7/site-packages/tensorflow/python/ops/rnn_cell.pyc in __call__(self, inputs, state, scope)
200 # Parameters of gates are concatenated into one multiply for efficiency.
201 c, h = array_ops.split(1, 2, state)
--> 202 concat = linear([inputs, h], 4 * self._num_units, True)
203
204 # i = input_gate, j = new_input, f = forget_gate, o = output_gate
/Library/Python/2.7/site-packages/tensorflow/python/ops/rnn_cell.pyc in linear(args, output_size, bias, bias_start, scope)
700 # Now the computation.
701 with vs.variable_scope(scope or "Linear"):
--> 702 matrix = vs.get_variable("Matrix", [total_arg_size, output_size])
703 if len(args) == 1:
704 res = math_ops.matmul(args[0], matrix)
/Library/Python/2.7/site-packages/tensorflow/python/ops/variable_scope.pyc in get_variable(name, shape, dtype, initializer, trainable, collections)
254 return get_variable_scope().get_variable(_get_default_variable_store(), name,
255 shape, dtype, initializer,
--> 256 trainable, collections)
257
258
/Library/Python/2.7/site-packages/tensorflow/python/ops/variable_scope.pyc in get_variable(self, var_store, name, shape, dtype, initializer, trainable, collections)
186 with ops.name_scope(None):
187 return var_store.get_variable(full_name, shape, dtype, initializer,
--> 188 self.reuse, trainable, collections)
189
190
/Library/Python/2.7/site-packages/tensorflow/python/ops/variable_scope.pyc in get_variable(self, name, shape, dtype, initializer, reuse, trainable, collections)
99 if should_check and not reuse:
100 raise ValueError("Over-sharing: Variable %s already exists, disallowed."
--> 101 " Did you mean to set reuse=True in VarScope?" % name)
102 found_var = self._vars[name]
103 if not shape.is_compatible_with(found_var.get_shape()):
ValueError: Over-sharing: Variable forward/RNN/BasicLSTMCell/Linear/Matrix already exists, disallowed. Did you mean to set reuse=True in VarScope?
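I'm not sure why this error occurs, since I don't explicitly create any variables in my code; the variables are only created inside the rnn and rnn_cell functions. Can someone tell me how to fix this error?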
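The check that raises this error is visible in the traceback: `get_variable` refuses to create a name that already exists unless `reuse` is set. The toy class below is only a hypothetical model of that check. It is not TensorFlow's code, `ToyVariableStore` is an invented name, and the shape `[108, 400]` is an arbitrary illustration. It shows how building the same graph twice in one process, for example by re-running a notebook cell, could trip the check.

```python
class ToyVariableStore:
    """Hypothetical stand-in for a name -> variable registry."""

    def __init__(self):
        self._vars = {}

    def get_variable(self, name, shape, reuse=False):
        if name in self._vars:
            if not reuse:
                # Same condition as in the traceback: name exists, reuse is off.
                raise ValueError(
                    "Over-sharing: Variable %s already exists, disallowed." % name)
            return self._vars[name]
        # Simplification: a missing name is always created here.
        self._vars[name] = {"name": name, "shape": shape}
        return self._vars[name]


store = ToyVariableStore()
name = "forward/RNN/BasicLSTMCell/Linear/Matrix"

first = store.get_variable(name, [108, 400])      # first build: created
try:
    store.get_variable(name, [108, 400])          # second build: rejected
    second_build_failed = False
except ValueError:
    second_build_failed = True

shared = store.get_variable(name, [108, 400], reuse=True)  # explicit reuse
```

If that is the cause here, building the graph only once, or starting from a fresh graph before rebuilding, would avoid the clash; this is a guess based on the traceback, not something confirmed.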
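Another error I'm currently getting is a type error: my input is of type tf.int32, but the hidden layer created inside the LSTM is of type tf.float32, and the linear function in rnn_cell.py concatenates these two tensors and multiplies them by a weight matrix. Why is this not possible? I assumed it is fairly common for the input to be one-hot encoded and therefore of type int32.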
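For what it's worth, one-hot vectors do not have to be integers. Below is a minimal NumPy sketch, with arbitrary sizes, that converts an integer one-hot vector to float32 so that it has the same dtype as a float32 hidden state and weight matrix. NumPy itself would silently upcast mixed dtypes, so the explicit cast here only mirrors what a framework with strict dtype matching would need.

```python
import numpy as np

vocab_size, hidden_size = 8, 100   # arbitrary illustrative sizes

x_int = np.zeros((1, vocab_size), dtype=np.int32)
x_int[0, 3] = 1                            # integer one-hot vector
x = x_int.astype(np.float32)               # same values, float dtype

h = np.zeros((1, hidden_size), dtype=np.float32)    # hidden state
W = np.ones((vocab_size + hidden_size, 4 * hidden_size), dtype=np.float32)

concat = np.concatenate([x, h], axis=1)    # both float32, so no dtype clash
gates = np.dot(concat, W)                  # one multiply for all four gates
print(concat.dtype, gates.shape)
```

In the code from the question, the analogous change would presumably be to declare the input placeholder as a float type, or to cast it before passing it to the cell.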
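More generally, is this approach with a batch size of 1 standard when training char-rnns? I have only looked at Andrej Karpathy's code, in which he trains a char-rnn in plain numpy; he uses the same procedure and simply steps through the entire text in sequences of length 25. Here is the code: https://gist.github.com/karpathy/d4dee566867f8291f086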
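The iteration scheme, as I understand it from that gist, walks a pointer through the text in steps of `seq_length` and uses the same window shifted by one character as the target. Here is a minimal sketch with a placeholder list of integers; `iter_chunks` is a name made up for this example.

```python
def iter_chunks(int_data, seq_length):
    """Yield (inputs, targets) windows; targets are inputs shifted by one."""
    p = 0
    # Stop when a full window plus its one-step-ahead target no longer fits.
    while p + seq_length + 1 <= len(int_data):
        inputs = int_data[p:p + seq_length]
        targets = int_data[p + 1:p + seq_length + 1]
        yield inputs, targets
        p += seq_length

int_data = list(range(60))          # placeholder for the encoded text
chunks = list(iter_chunks(int_data, 25))
print(len(chunks))
```

Each yielded pair corresponds to one forward pass over 25 characters with a batch size of 1.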