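I am working on word sense disambiguation and have built my own vocabulary of the 300k most common English words. My model is very simple: each word in a sentence (represented by its index in the vocabulary) passes through an embedding layer, and the resulting embeddings are averaged. The averaged embedding is then sent through a linear layer, as shown in the model below.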
import torch
import torch.nn as nn

class TestingClassifier(nn.Module):
    def __init__(self, vocabSize, features, embeddingDim):
        super(TestingClassifier, self).__init__()
        self.embeddings = nn.Embedding(vocabSize, embeddingDim)
        self.linear = nn.Linear(features, 2)
        self.sigmoid = nn.Sigmoid()

    def forward(self, inputs):
        # Look up one embedding vector per word index
        embeds = self.embeddings(inputs)
        # Mean over the last dimension of the embedded tensor
        avged = torch.mean(embeds, dim=-1)
        output = self.linear(avged)
        output = self.sigmoid(output)
        return output
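To make the averaging step concrete, here is a small shape check, offered as a sketch rather than a diagnosis. It assumes the inputs are a `(batch, sentence_length)` tensor of word indices, which the post does not state, and every size below is made up for illustration. Under that assumption, `dim=-1` averages across the embedding axis, whereas `dim=1` averages across the words of each sentence.

```python
import torch
import torch.nn as nn

# Hypothetical sizes for illustration only: a batch of 4 sentences,
# each padded to 10 word indices, with 24-dimensional embeddings.
batch_size, seq_len, embedding_dim, vocab_size = 4, 10, 24, 1000

embeddings = nn.Embedding(vocab_size, embedding_dim)
inputs = torch.randint(0, vocab_size, (batch_size, seq_len))

embeds = embeddings(inputs)              # (batch, seq_len, embedding_dim)
mean_last = torch.mean(embeds, dim=-1)   # averages over the embedding axis
mean_words = torch.mean(embeds, dim=1)   # averages over the words in each sentence

print(embeds.shape)      # torch.Size([4, 10, 24])
print(mean_last.shape)   # torch.Size([4, 10])
print(mean_words.shape)  # torch.Size([4, 24])
```

If the inputs do have this layout, averaging over `dim=1` would give one `embeddingDim`-sized vector per sentence, and the linear layer's input size would then need to be `embeddingDim` rather than `features`.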
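I use BCELoss as the loss function and SGD as the optimizer. My problem is that the loss barely decreases as training progresses, almost as if it has converged at a very high value. I have tried different learning rates (0.0001, 0.001, 0.01 and 0.1), but I run into the same problem each time.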
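For reference, a single BCELoss/SGD update might look like the sketch below. The post does not show how the labels are encoded, so the one-hot float targets here are an assumption, and the stand-in model and its layer sizes are hypothetical. `nn.BCELoss` expects the prediction and the target to have the same shape, with float targets between 0 and 1.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical stand-in for the classifier: two sigmoid outputs per row.
model = nn.Sequential(nn.Linear(10, 2), nn.Sigmoid())

loss_function = nn.BCELoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

inputs = torch.randn(32, 10)                  # one batch of 32 rows
classes = torch.randint(0, 2, (32,))          # integer class labels (assumed)
# Assumption: labels are converted to one-hot floats of shape (32, 2)
labels = nn.functional.one_hot(classes, num_classes=2).float()

prediction = model(inputs)                    # shape (32, 2)
loss = loss_function(prediction, labels)      # shapes must match

optimizer.zero_grad()
loss.backward()
optimizer.step()

print(loss.item())
```

With two sigmoid outputs, each output is scored as an independent binary prediction, so the targets need one column per output.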
My training function is as follows:
from tqdm import tqdm

def train_model(model,
                optimizer,
                lossFunction,
                batchSize,
                epochs,
                isRnnModel,
                trainDataLoader,
                validDataLoader,
                earlyStop = False,
                maxPatience = 1
):
    validationAcc = []
    patienceCounter = 0
    stopTraining = False
    model.train()

    # Train network
    for epoch in range(epochs):
        losses = []
        if(stopTraining):
            break

        for inputs, labels in tqdm(trainDataLoader, position=0, leave=True):
            optimizer.zero_grad()

            # Predict and calculate loss
            prediction = model(inputs)
            loss = lossFunction(prediction, labels)
            # Store a plain float so the autograd graph is not kept alive
            losses.append(loss.item())

            # Backward propagation
            loss.backward()

            # Readjust weights
            optimizer.step()

        # Mean training loss for this epoch
        print(sum(losses) / len(losses))
        curValidAcc = check_accuracy(validDataLoader, model, isRnnModel) # Check accuracy on validation set
        curTrainAcc = check_accuracy(trainDataLoader, model, isRnnModel)
        print("Epoch", epoch + 1, "Training accuracy", curTrainAcc, "Validation accuracy:", curValidAcc)

        # Control early stopping
        if(earlyStop):
            if(patienceCounter == 0):
                if(len(validationAcc) > 0 and curValidAcc < validationAcc[-1]):
                    benchmark = validationAcc[-1]
                    patienceCounter += 1
                    print("Patience counter", patienceCounter)
            elif(patienceCounter == maxPatience):
                print("EARLY STOP. Patience level:", patienceCounter)
                stopTraining = True
            else:
                if(curValidAcc < benchmark):
                    patienceCounter += 1
                    print("Patience counter", patienceCounter)
                else:
                    benchmark = curValidAcc
                    patienceCounter = 0

        validationAcc.append(curValidAcc)
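The batch size is 32 (the training set contains 8,000 rows), the vocabulary size is 300k, and the embedding dimension is 24. I tried adding more linear layers to the network, but it made no difference. Even after many epochs of training, prediction accuracy on both the training and validation sets stays at around 50% (which is terrible). Any help is much appreciated!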
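For a sense of scale, the figures above work out as follows. This is only back-of-the-envelope arithmetic on the numbers stated in the post; the variable names are introduced here for illustration.

```python
import math

train_rows = 8000
batch_size = 32
vocab_size = 300_000
embedding_dim = 24

# Number of optimizer steps in one pass over the training set
batches_per_epoch = math.ceil(train_rows / batch_size)
# Number of weights in the embedding table alone
embedding_parameters = vocab_size * embedding_dim

print(batches_per_epoch)      # 250
print(embedding_parameters)   # 7200000
```

So each epoch performs 250 SGD updates, and the embedding table alone holds 7.2 million weights, compared with 8,000 training rows.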