tensorflow2.0 - Hugging Face DistilBERT fine-tuned for multi-class classification on a custom dataset produces a strange output shape at prediction time

Question

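I am trying to fine-tune Hugging Face's DistilBERT implementation for multi-class classification (100 classes) on a custom dataset, following the tutorial at https://huggingface.co/transformers/custom_datasets.html.

I am doing this with TensorFlow and fine-tuning in native TensorFlow; that is, I use the following part of the tutorial to create the datasets: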

import tensorflow as tf
train_dataset = tf.data.Dataset.from_tensor_slices((
    dict(train_encodings),
    train_labels
))
val_dataset = tf.data.Dataset.from_tensor_slices((
    dict(val_encodings),
    val_labels
))
test_dataset = tf.data.Dataset.from_tensor_slices((
    dict(test_encodings),
    test_labels
))

and this for fine-tuning:

from transformers import TFDistilBertForSequenceClassification
# num_labels=100 sizes the classification head for the 100 classes (the default is 2)
model = TFDistilBertForSequenceClassification.from_pretrained('distilbert-base-uncased', num_labels=100)
optimizer = tf.keras.optimizers.Adam(learning_rate=5e-5)
model.compile(optimizer=optimizer, loss=model.compute_loss) # can also use any keras loss fn
# The dataset is already batched by .batch(16), so no batch_size argument is passed to fit
model.fit(train_dataset.shuffle(1000).batch(16), epochs=3)
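To make the role of `.batch(16)` concrete, here is a pure-Python model of what slicing and batching do to the dict of encodings. It is an illustrative assumption, not TensorFlow code: `slice_examples`, `batch_examples` and `shape` are hypothetical plain functions that only mimic the shape behaviour of `tf.data.Dataset.from_tensor_slices` and `.batch()`. Slicing yields one example per element, with `input_ids` of shape (512,) and no batch axis; batching adds the leading batch axis back.

```python
# Pure-Python model of how slicing and batching shape the dataset elements.
# Illustrative only; this is NOT TensorFlow's implementation.
from typing import Dict, List, Tuple

Features = Dict[str, List[int]]

def slice_examples(encodings: Dict[str, List[List[int]]],
                   labels: List[int]) -> List[Tuple[Features, int]]:
    """Slice along the first axis: one (features, label) pair per example."""
    return [
        ({key: values[i] for key, values in encodings.items()}, labels[i])
        for i in range(len(labels))
    ]

def batch_examples(elements, batch_size: int):
    """Group consecutive elements; the final batch may be smaller."""
    batches = []
    for start in range(0, len(elements), batch_size):
        chunk = elements[start:start + batch_size]
        features = {key: [feats[key] for feats, _ in chunk] for key in chunk[0][0]}
        batches.append((features, [label for _, label in chunk]))
    return batches

def shape(nested) -> Tuple[int, ...]:
    """Shape of a rectangular nested list, e.g. (16, 512)."""
    dims = []
    while isinstance(nested, list):
        dims.append(len(nested))
        nested = nested[0] if nested else None
    return tuple(dims)

# Hypothetical toy data: 40 examples padded to length 512.
seq_len, num_examples = 512, 40
encodings = {
    "input_ids": [[101] + [0] * (seq_len - 1) for _ in range(num_examples)],
    "attention_mask": [[1] + [0] * (seq_len - 1) for _ in range(num_examples)],
}
labels = list(range(num_examples))

elements = slice_examples(encodings, labels)
batches = batch_examples(elements, 16)
print(shape(elements[0][0]["input_ids"]))  # (512,): one sequence, no batch axis
print(shape(batches[0][0]["input_ids"]))   # (16, 512)
print(shape(batches[-1][0]["input_ids"]))  # (8, 512)
```

In the question, the training data goes through the batching step, while the test data is used straight from the slicing step.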

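Fine-tuning seems to go fine, but when I try to predict on the test dataset (2000 examples) with model.predict(test_dataset), the model seems to produce one prediction per token rather than one prediction per sequence...

That is, instead of an output of shape (1, 2000, 100), I get an output of shape (1, 1024000, 100), where 1024000 is the number of test examples (2000) * the sequence length (512).

Any hints about what is going on here? (Sorry if this is naive, I'm very new to TensorFlow.)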
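A likely explanation that fits these numbers: the training set is batched with `.batch(16)`, but `test_dataset` is passed to `model.predict` unbatched. Each element then carries `input_ids` of shape (512,), and if Keras expands such a rank-1 input to shape (512, 1) (an assumption about its input handling), the model sees 512 sequences of length 1 and returns one row of logits per token: 2000 × 512 = 1,024,000 rows. If that is the cause, `model.predict(test_dataset.batch(16))` should return one row per example. The sketch models only this bookkeeping in plain Python; `expand_rank1`, `toy_classifier` and `toy_predict` are hypothetical stand-ins, not TensorFlow or transformers APIs.

```python
# Toy shape bookkeeping for predict(); all functions are hypothetical stand-ins.
# Assumption: a rank-1 input of shape (seq_len,) is expanded to (seq_len, 1),
# i.e. treated as seq_len sequences of length 1.
from typing import List

def expand_rank1(x):
    """If x is a flat sequence of token ids, turn it into length-1 sequences."""
    if x and not isinstance(x[0], list):
        return [[token] for token in x]
    return x

def toy_classifier(batch: List[List[int]], num_labels: int) -> List[List[float]]:
    """Stand-in for the model: one row of logits per sequence in the batch."""
    return [[0.0] * num_labels for _ in batch]

def toy_predict(dataset, num_labels: int) -> List[List[float]]:
    """Run the classifier on each dataset element and concatenate the rows."""
    rows: List[List[float]] = []
    for x in dataset:
        rows.extend(toy_classifier(expand_rank1(x), num_labels))
    return rows

# Small numbers so it runs instantly; the question's case is 2000 x 512.
num_examples, seq_len, num_labels = 20, 8, 5
input_ids = [[101] + [0] * (seq_len - 1) for _ in range(num_examples)]

unbatched = input_ids                                              # elements of shape (seq_len,)
batched = [input_ids[i:i + 4] for i in range(0, num_examples, 4)]  # elements of shape (4, seq_len)

unbatched_out = toy_predict(unbatched, num_labels)
batched_out = toy_predict(batched, num_labels)
print(len(unbatched_out), len(batched_out))  # 160 20
```

With the question's numbers, the unbatched case scales to 2000 × 512 rows, while the batched case gives 2000.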

score 1 · Accepted Answer

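I had exactly the same problem. I don't know why it happens, since according to the tutorial the code should be correct.

What worked for me, though, was to build numpy arrays from train_encodings and pass them directly to the fit method instead of creating a dataset.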

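import numpy as np

# Select the inputs by key instead of relying on the order of the dict values
x1 = np.array(train_encodings["input_ids"])       # shape (num_examples, seq_len)
x2 = np.array(train_encodings["attention_mask"])  # shape (num_examples, seq_len)
model.fit([x1, x2], train_labels, epochs=20)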
