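I'm trying to build an ML production pipeline with TFX, and I'm currently working with the Trainer component. I need to implement the model in a separate module file.
Here is the part of the pipeline that handles training: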
trainer = Trainer(
    module_file=module_file,
    custom_executor_spec=executor_spec.ExecutorClassSpec(GenericExecutor),
    transformed_examples=transform.outputs['transformed_examples'],
    schema=schema_gen.outputs['schema'],
    transform_graph=transform.outputs['transform_graph'],
    train_args=trainer_pb2.TrainArgs(num_steps=10000),
    eval_args=trainer_pb2.EvalArgs(num_steps=5000))
Here is the part of module_file that defines the model:
_DENSE_FLOAT_FEATURE_KEYS = ['number_of_likes', 'number_of_comments', 'owner_influence']

def _build_keras_model() -> tf.keras.Model:
    # One dense input of shape (1,) per transformed float feature.
    inputs = [
        keras.layers.Input(shape=(1,), name=_transformed_name(f))
        for f in _DENSE_FLOAT_FEATURE_KEYS
    ]
    d = keras.layers.concatenate(inputs)
    for _ in range(4):
        d = keras.layers.Dense(8, activation='relu')(d)
    output = keras.layers.Dense(1)(d)
    model = keras.Model(inputs=inputs, outputs=output)
    model.compile(loss='mean_absolute_error',
                  optimizer='adam',
                  metrics=['mean_absolute_error'])
    model.summary(print_fn=absl.logging.info)
    return model
And here is the other function, the one the Trainer calls:
def run_fn(fn_args: TrainerFnArgs):
    tf_transform_output = tft.TFTransformOutput(fn_args.transform_output)
    train_dataset = _input_fn(fn_args.train_files, tf_transform_output,
                              batch_size=_TRAIN_BATCH_SIZE)
    eval_dataset = _input_fn(fn_args.eval_files, tf_transform_output,
                             batch_size=_EVAL_BATCH_SIZE)
    mirrored_strategy = tf.distribute.MirroredStrategy()
    with mirrored_strategy.scope():
        model = _build_keras_model()
    model.fit(
        train_dataset,
        steps_per_epoch=fn_args.train_steps,
        validation_data=eval_dataset,
        validation_steps=fn_args.eval_steps)
When I run it, I get this error:
TypeError: Failed to convert object of type <class 'tensorflow.python.framework.sparse_tensor.SparseTensor'> to Tensor. Contents: SparseTensor(indices=Tensor("DeserializeSparse_3:0", shape=(None, 2), dtype=int64, device=/job:localhost/replica:0/task:0/device:CPU:0), values=Tensor("DeserializeSparse_3:1", shape=(None,), dtype=float32, device=/job:localhost/replica:0/task:0/device:CPU:0), dense_shape=Tensor("stack_3:0", shape=(2,), dtype=int64, device=/job:localhost/replica:0/task:0/device:CPU:0)). Consider casting elements to a supported type.[while running 'Run[Trainer]']
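I took all of the logic and code from the official TFX examples and guides. After the inputs are concatenated, the data goes into the model layers as a Tensor, not a SparseTensor. I don't understand what is going on there, and I couldn't find a similar case anywhere online :(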
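For reference, this is a minimal sketch of the kind of densifying step I understand would be needed if a transformed feature did arrive as a SparseTensor, which is my guess at what the error is complaining about. The function name fill_in_missing and the assumption that each example carries at most one value per feature are mine, not taken from my pipeline; it only illustrates the conversion to the dense [batch, 1] shape that Input(shape=(1,)) expects.

```python
import tensorflow as tf


def fill_in_missing(x):
    """Hypothetical helper: return a dense [batch, 1] tensor for a sparse feature.

    Assumes each example holds at most one value for the feature. Dense
    inputs are returned unchanged.
    """
    if not isinstance(x, tf.sparse.SparseTensor):
        return x
    # Force the second dimension to 1 so the dense result is [batch, 1].
    dense_shape = tf.stack([x.dense_shape[0], tf.constant(1, dtype=tf.int64)])
    sparse = tf.sparse.SparseTensor(x.indices, x.values, dense_shape)
    # Missing entries are filled with the dtype's zero value.
    return tf.sparse.to_dense(sparse)


# Example: a batch of 3 where the second example has no value.
example = tf.sparse.SparseTensor(
    indices=[[0, 0], [2, 0]], values=[1.0, 3.0], dense_shape=[3, 1])
dense = fill_in_missing(example)
```

If this guess is right, I suppose a step like this would belong in the Transform preprocessing_fn, before the features ever reach the Trainer, but I have not confirmed that.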
Any help would be greatly appreciated!