tensorflow - How to combine two tensors so that they are in one dataset?

Question

I'm using the Titanic dataset from the TensorFlow Datasets (tfds) API.

I can't figure out how to make the feature tensors model-friendly.

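This is the best I've come up with, but it only works for one tensor at a time. How can I make it work with all of the tensors in the features?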

import tensorflow as tf
import tensorflow_datasets as tfds
from tensorflow.keras.optimizers import Adam
    
data = tfds.load("titanic",split='train', as_supervised=True).map(lambda x,y: (x,y)).prefetch(1)
    
for i in data.batch(1309):
    xx1 = i[0]['age']
    xx2 = i[0]['fare']
    yyy = tf.convert_to_tensor(tf.one_hot(i[1],2))

model = tf.keras.models.Sequential([
  tf.keras.layers.Dense(1),
  tf.keras.layers.Dense(13, activation='relu'),
  tf.keras.layers.Dense(2, activation='softmax')
])

model.compile(
  optimizer=Adam(learning_rate=0.01), 
  loss='categorical_crossentropy', 
  metrics=['accuracy']
)

model.fit(xx1,yyy,epochs=30)

How can I concatenate the age and fare tensors so that they are in one dataset?

I've tried concat and stack, but to no avail.

score 2 · Accepted Answer

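This can be done with tf.stack. Since the input already uses the Dataset API, I refactored some of the code to use Dataset functionality to map the input format to the target format you described. For convenience, here is a Colab notebook with the example: https://colab.research.google.com/drive/1dHNe9rYaJSgqbj_QtQ1aJL_7WgKnLKsU?usp=sharing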
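A minimal sketch of why tf.stack fits here where tf.concat did not, using made-up constant values in place of the Titanic features (the tensors below are hypothetical, not read from the dataset). tf.stack creates a new axis, while tf.concat can only join along an axis the inputs already have, so per-passenger scalars must be reshaped before concatenating. For already-batched columns like xx1 and xx2 in the question, stacking along axis=1 gives one row per passenger.

```python
import tensorflow as tf

# Hypothetical per-passenger values standing in for item[0]['age'] and item[0]['fare']
age = tf.constant(30.0)   # rank-0 (scalar) tensor
fare = tf.constant(13.0)  # rank-0 (scalar) tensor

# tf.stack inserts a new axis, so two scalars become one vector of shape (2,)
stacked = tf.stack([age, fare], axis=0)

# tf.concat only joins along an axis that already exists, so each scalar
# has to be reshaped to rank 1 before it can be concatenated
concatenated = tf.concat([tf.reshape(age, [1]), tf.reshape(fare, [1])], axis=0)

# Batched columns (like xx1 and xx2 in the question, one value per passenger):
# stacking along axis=1 gives one row per passenger and one column per feature
ages = tf.constant([22.0, 38.0, 26.0])
fares = tf.constant([7.25, 71.28, 7.93])
batch_features = tf.stack([ages, fares], axis=1)

print(stacked)               # shape (2,)
print(concatenated)          # shape (2,), same values as stacked
print(batch_features.shape)  # 3 rows, 2 columns
```

Applied to the question's loop, tf.stack([xx1, xx2], axis=1) should yield a tensor of shape (number_of_passengers, 2) that can be passed to model.fit in place of xx1.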

# Nothing novel here
import tensorflow as tf
import tensorflow_datasets as tfds
from tensorflow.keras.optimizers import Adam

data = tfds.load("titanic",split='train', as_supervised=True).map(lambda x,y: (x,y)).prefetch(1)

Basic demonstration of the intended data restructuring

Take one item from the dataset and use tf.stack to turn it into a single tensor containing the two target data points:

for item in data.take(1):
  age = item[0]['age']
  fare = item[0]['fare']
  output = tf.stack([age, fare], axis=0)
  print(output)

Output: tf.Tensor([30. 13.], shape=(2,), dtype=float32)

The output is a single tensor that holds both expected values.
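The demonstration above stacks two named features, but the question also asks how to handle all of the feature tensors. One way to generalize it, sketched here on a small hypothetical in-memory dataset rather than the real tfds split, is to loop over a list of keys inside the mapping function. FEATURE_KEYS, the 'parch' column, and all values are assumptions for illustration; the real Titanic feature names and dtypes should be checked (for example via the info object returned by tfds.load(..., with_info=True)), and string-valued features would need to be encoded before they can be stacked.

```python
import tensorflow as tf

# Hypothetical stand-in for the tfds Titanic split: a dict of numeric
# feature columns plus an integer survival label (all values are made up)
features = {
    "age": [22.0, 38.0, 26.0, 35.0],
    "fare": [7.25, 71.28, 7.93, 53.1],
    "parch": [0, 0, 0, 0],
}
labels = [0, 1, 1, 1]
data = tf.data.Dataset.from_tensor_slices((features, labels))

# Assumed list of numeric feature keys to combine into one input vector
FEATURE_KEYS = ["age", "fare", "parch"]

def stack_features(item, label):
    # Cast every selected feature to float32 so tf.stack sees a single dtype,
    # then stack the scalars into one vector of length len(FEATURE_KEYS)
    values = [tf.cast(item[key], tf.float32) for key in FEATURE_KEYS]
    return tf.stack(values, axis=0), label

# Map each element first, then batch: each batch has shape (batch_size, 3)
train_dataset = data.map(stack_features).batch(2)

for x, y in train_dataset.take(1):
    print(x.shape, y.shape)
```

Adding or removing a feature then only means editing the key list; the model's first Dense layer adapts to the new input width when it is built.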

Using it as a TensorFlow dataset

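A TensorFlow dataset can be passed directly to training, so we can write a function that maps the input data format to the target format described in the question. The function below does this using the example code above.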

# Input data and associated label
def transform_data(item, label):

  # Extract values
  age = item['age']
  fare = item['fare']

  # Create output tensor
  output = tf.stack([age, fare], axis=0)
  return output, label

# Build a training dataset from the base dataset: map each (features, label) element to the target format with the mapping function, then group the results into batches of 1200
train_dataset = data.map(transform_data).batch(1200)

# Model - I made some minor changes to get it to run cleaner
model = tf.keras.models.Sequential([
  tf.keras.layers.Dense(2),
  tf.keras.layers.Dense(13, activation='relu'),
  # As we have only two labels, this is really a binary problem, so I've created a single output neuron activated by sigmoid
  tf.keras.layers.Dense(1,activation='sigmoid')
])


# Compiled with binary_crossentropy to complement the binary classification
model.compile(optimizer=Adam(learning_rate=0.01),loss='binary_crossentropy', metrics=['accuracy'])
model.fit(train_dataset,epochs=30)

Output:

Epoch 1/30
2/2 [==============================] - 0s 16ms/step - loss: 11.7881 - accuracy: 0.4385
Epoch 2/30
2/2 [==============================] - 0s 7ms/step - loss: 10.2350 - accuracy: 0.4270
...
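The comments in the answer's model replace the question's two-unit softmax with categorical_crossentropy by a single sigmoid unit with binary_crossentropy. A small numeric check, using made-up logits and labels, shows why the two set-ups are interchangeable for two classes: the sigmoid of a logit z equals the class-1 probability of a softmax over the logits [0, z], and the two cross-entropy losses then agree. The losses are computed by hand here rather than through Keras, purely to make the formulas visible.

```python
import tensorflow as tf

# Hypothetical logits for three passengers and their (made-up) 0/1 labels
z = tf.constant([-2.0, 0.0, 1.5])
y = tf.constant([0.0, 1.0, 1.0])

# Single sigmoid unit, as in Dense(1, activation='sigmoid')
p_sigmoid = tf.sigmoid(z)

# Two-unit softmax whose first logit is fixed at 0:
# exp(z) / (exp(0) + exp(z)) == sigmoid(z)
two_logits = tf.stack([tf.zeros_like(z), z], axis=1)  # shape (3, 2)
p_two = tf.nn.softmax(two_logits, axis=1)
p_softmax = p_two[:, 1]                                # probability of class 1

# Binary cross-entropy on the 0/1 labels ...
bce = -(y * tf.math.log(p_sigmoid) + (1.0 - y) * tf.math.log(1.0 - p_sigmoid))
# ... matches categorical cross-entropy on the one-hot labels
y_one_hot = tf.one_hot(tf.cast(y, tf.int32), 2)
cce = -tf.reduce_sum(y_one_hot * tf.math.log(p_two), axis=1)

print(p_sigmoid.numpy(), p_softmax.numpy())
print(bce.numpy(), cce.numpy())
```

The single-unit version needs no one-hot encoding of the labels, which is why the answer's dataset can pass the integer label through unchanged.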
