我正在尝试编写一个自定义层来处理可变长度向量,并将它们减少到相同长度的向量。长度是预先知道的,因为可变长度的原因是我有几种不同的数据类型,我使用不同数量的特征进行编码。从某种意义上说,它类似于Embedding only for numeric values。我尝试过使用填充,但结果很糟糕,所以我正在尝试这种方法。
因此,例如,假设我有 3 种数据类型,我使用 3、4、6 个长度向量对其进行编码。
arr = [
# example one (data type 1 [len()==3], datat type 3[len()==6]) - force values as floats
[[1.0,2.0,3],[1,2,3,4,5,6]],
# example two (data type 2 [len()==4], datat type 3len()==6]) - force values as floats
[[1.0,2,3,4],[1,2,3,4,5,6]],
]
我尝试实现一个自定义层,如:
class DimensionReducer(tf.keras.layers.Layer):
def __init__(self, output_dim, expected_lengths):
super(DimensionReducer, self).__init__()
self._supports_ragged_inputs = True
self.output_dim = output_dim
for l in expected_lengths:
setattr(self,f'w_{l}', self.add_weight(shape=(l, self.output_dim),initializer='random_normal',trainable=True))
setattr(self, f'b_{l}',self.add_weight(shape=(self.output_dim,), initializer='random_normal',trainable=True))
def call(self, inputs):
print(inputs.shape)
# batch
if len(inputs.shape) == 3:
print("batch")
result = []
for i,x in enumerate(inputs):
_result = []
for v in x:
l = len(v)
print(l)
print(v)
w = getattr(self, f'w_{l}')
b = getattr(self, f'b_{l}')
out = tf.matmul([v],w) + b
_result.append(out)
result.append(tf.concat(_result, 0))
r = tf.stack(result)
print("batch output:",r.shape)
return r
直接调用时似乎有效:
dim = DimensionReducer(3, [3,4,6])
dim(tf.ragged.constant(arr))
但是当我尝试将其合并到模型中时,它失败了:
import tensorflow as tf
val_ragged = tf.ragged.constant(arr)
inputs_ragged = tf.keras.layers.Input(shape=(None,None), ragged=True)
outputs_ragged = DimensionReducer(3, [3,4,6])(inputs_ragged)
model_ragged = tf.keras.Model(inputs=inputs_ragged, outputs=outputs_ragged)
# this one with RaggedTensor doesn't
print(model_ragged(val_ragged))
和
AttributeError: 'DimensionReducer' object has no attribute 'w_Tensor("dimension_reducer_98/strided_slice:0", shape=(), dtype=int32)'
我不确定如何实现这样的层,或者我做错了什么。