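This should be achievable with tf.stack. Since the input already uses the Dataset API, I refactored some of the code to use dataset functionality to map the input format to the target format you describe. For convenience, here is a Colab notebook with the example: https://colab.research.google.com/drive/1dHNe9rYaJSgqbj_QtQ1aJL_7WgKnLKsU?usp=sharing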
# Nothing novel here
import tensorflow as tf
import tensorflow_datasets as tfds
from tensorflow.keras.optimizers import Adam
data = tfds.load("titanic",split='train', as_supervised=True).map(lambda x,y: (x,y)).prefetch(1)
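Basic demonstration of the expected data restructuring
Take 1 item from the dataset and use tf.stack to convert it into a tensor containing the two target data points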
for item in data.take(1):
    age = item[0]['age']
    fare = item[0]['fare']
    output = tf.stack([age, fare], axis=0)
    print(output)
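Output: tf.Tensor([30. 13.], shape=(2,), dtype=float32)
In the output we can see a single tensor holding the two expected values.
Use as a TensorFlow dataset
A TensorFlow dataset can be fed directly into training, and we can easily write a function that maps the input data format to the target format described in the question. The function below does this using the example code above.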
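As a small aside on the axis argument, here is a minimal sketch of how tf.stack behaves on scalars versus already-batched features. The values are made up for illustration and are not taken from the dataset. It suggests that axis=0 is fine when the stack is applied per example, whereas stacking after batching would need axis=-1 to keep one row per passenger.

```python
import tensorflow as tf

# Per-example case: each feature is a scalar, as in an unbatched dataset.
age = tf.constant(30.0)
fare = tf.constant(13.0)
per_example = tf.stack([age, fare], axis=0)
print(per_example.shape)  # (2,)

# Batched case (illustrative values): each feature is a vector of length 3.
ages = tf.constant([30.0, 37.0, 28.0])
fares = tf.constant([13.0, 7.925, 13.0])

# axis=0 puts the features on the first dimension: shape (2, 3).
stacked_rows = tf.stack([ages, fares], axis=0)
# axis=-1 keeps one row per example: shape (3, 2).
stacked_cols = tf.stack([ages, fares], axis=-1)
print(stacked_rows.shape, stacked_cols.shape)  # (2, 3) (3, 2)
```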
# Input data and associated label
def transform_data(item, label):
    # Extract values
    age = item['age']
    fare = item['fare']
    # Create output tensor
    output = tf.stack([age, fare], axis=0)
    return output, label
# Create a training dataset from the base dataset: map each example from the input format to the goal format with the mapping function, then batch
train_dataset = data.map(transform_data).batch(1200)
# Model - I made some minor changes to get it to run cleaner
model = tf.keras.models.Sequential([
    tf.keras.layers.Dense(2),
    tf.keras.layers.Dense(13, activation='relu'),
    # As we have only two labels, this is really a binary problem, so I've created a single output neuron activated by sigmoid
    tf.keras.layers.Dense(1, activation='sigmoid')
])
# Compiled with binary_crossentropy to complement the binary classification
model.compile(optimizer=Adam(learning_rate=0.01),loss='binary_crossentropy', metrics=['accuracy'])
model.fit(train_dataset,epochs=30)
Output:
Epoch 1/30
2/2 [==============================] - 0s 16ms/step - loss: 11.7881 - accuracy: 0.4385
Epoch 2/30
2/2 [==============================] - 0s 7ms/step - loss: 10.2350 - accuracy: 0.4270
...
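
If you want to sanity-check the mapping function without downloading the Titanic data, something like the following sketch may help. It assumes a tiny in-memory dataset with made-up values (the `toy` name and the numbers are hypothetical), shaped like the (features dict, label) pairs used above, and only checks the shapes that come out of map followed by batch.

```python
import tensorflow as tf

def transform_data(item, label):
    # Same mapping as above: stack the two scalar features into one tensor.
    output = tf.stack([item['age'], item['fare']], axis=0)
    return output, label

# Hypothetical stand-in for the real dataset: a features dict plus labels.
features = {
    'age': [30.0, 37.0, 28.0, 18.0],
    'fare': [13.0, 7.925, 13.0, 73.5],
}
labels = [0, 1, 0, 1]
toy = tf.data.Dataset.from_tensor_slices((features, labels))

# Map per example first, then batch, mirroring train_dataset above.
toy_batches = toy.map(transform_data).batch(2)

x_batch, y_batch = next(iter(toy_batches))
print(x_batch.shape, y_batch.shape)  # (2, 2) (2,)
```

Each batch then has shape (batch_size, 2), which is the shape the first Dense layer of the model receives.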