I downloaded the model microsoft/Multilingual-MiniLM-L12-H384 from https://huggingface.co/microsoft/Multilingual-MiniLM-L12-H384/tree/main and am now trying to use it.

Transformers version: '4.11.3'

I wrote the following code:

import wandb
import transformers as tr  # assumed alias; the snippet refers to the library as `tr`

wandb.login()
%env WANDB_LOG_MODEL=true  # Jupyter/IPython magic: log the model to W&B

# `device`, `train_data`, `val_data` and `compute_metrics` are defined elsewhere (not shown)

# Load the locally downloaded MiniLM checkpoint with a 2-class classification head
model = tr.BertForSequenceClassification.from_pretrained("/home/pc/minilm_model", num_labels=2)
model.to(device)

print("hello")

training_args = tr.TrainingArguments(
    report_to='wandb',
    output_dir='/home/pc/proj/results2',  # output directory
    num_train_epochs=10,                  # total number of training epochs
    per_device_train_batch_size=16,       # batch size per device during training
    per_device_eval_batch_size=32,        # batch size for evaluation
    learning_rate=2e-5,
    warmup_steps=1000,                    # number of warmup steps for learning rate scheduler
    weight_decay=0.01,                    # strength of weight decay
    logging_dir='./logs',                 # directory for storing logs
    logging_steps=1000,
    evaluation_strategy="epoch",
    save_strategy="no",
)

print("hello")

trainer = tr.Trainer(
    model=model,                      # the instantiated Transformers model to be trained
    args=training_args,               # training arguments, defined above
    train_dataset=train_data,         # training dataset
    eval_dataset=val_data,            # evaluation dataset
    compute_metrics=compute_metrics,
)

After running it:

The model gets stuck at this point:

***** Running training *****

Num examples = 12981
 Num Epochs = 20
 Instantaneous batch size per device = 16
 Total train batch size (w. parallel, distributed & accumulation) = 32
 Gradient Accumulation steps = 1
 Total optimization steps = 8120
Automatic Weights & Biases logging enabled, to disable set os.environ["WANDB_DISABLED"] = "true"
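
As a sanity check, the numbers in this log can be reproduced with a little arithmetic. The sketch below assumes the last partial batch is kept and uses the gradient accumulation value of 1 reported in the log. It suggests the run used 2 devices and 20 epochs, whereas the snippet above sets num_train_epochs=10, so the log may come from a run with different settings.

```python
import math

# Values copied from the training log
num_examples = 12981
per_device_batch = 16
total_batch = 32          # "Total train batch size" in the log
num_epochs = 20           # "Num Epochs" in the log
grad_accum_steps = 1

# Number of devices implied by the per-device and total batch sizes
num_devices = total_batch // (per_device_batch * grad_accum_steps)

# Assumption: the final, smaller batch is not dropped
steps_per_epoch = math.ceil(num_examples / total_batch)
total_steps = steps_per_epoch * num_epochs

print(num_devices, steps_per_epoch, total_steps)
```

The result matches the "Total optimization steps" line, so the trainer appears to have finished its setup and stalled somewhere after this banner was printed.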

What are possible solutions?

1 Answer

I don't know why this would stop training.
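
One generic way to find out where a Python process is stuck is the standard-library faulthandler module, which can dump every thread's stack at a fixed interval. This is only a minimal sketch: the file name hang_traceback.log and the 300-second interval are arbitrary choices, and the placeholder line marks where the training call would go.

```python
import faulthandler

# Write to a real file, since notebook output streams may not expose a file descriptor
trace_file = open("hang_traceback.log", "w")

# Dump tracebacks on fatal errors (e.g. segfaults) as well
faulthandler.enable(file=trace_file)

# Every 300 seconds, write the stack of all threads to the file
faulthandler.dump_traceback_later(300, repeat=True, file=trace_file)
try:
    pass  # placeholder: run the training call here
finally:
    # Stop the periodic dumps once the call returns
    faulthandler.cancel_dump_traceback_later()
```

If training hangs, the last traceback in the file should show which call each thread is blocked in, for example data loading, multi-GPU communication, or the logging integration.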

If you post on the HF forum, someone there may be able to help: https://discuss.huggingface.co

I work at W&B, so if you think this is related to using W&B, or if you have any questions, I can help you here or on our forum: http://community.wandb.ai
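
To check whether W&B is involved at all, one option is to run once with reporting switched off and see whether training still stalls. The log above already mentions the WANDB_DISABLED environment variable. The helper name reporting_kwargs is made up for illustration, and report_to="none" is assumed to be accepted by the installed Transformers version, so please verify it against 4.11.3.

```python
import os


def reporting_kwargs(use_wandb: bool) -> dict:
    """Hypothetical helper: return reporting-related keyword arguments
    for TrainingArguments and keep WANDB_DISABLED in sync with them."""
    if use_wandb:
        # Make sure an earlier experiment did not leave W&B disabled
        os.environ.pop("WANDB_DISABLED", None)
        return {"report_to": "wandb"}
    # Environment variable named in the Trainer log message
    os.environ["WANDB_DISABLED"] = "true"
    # Assumption: "none" disables all reporting integrations
    return {"report_to": "none"}


print(reporting_kwargs(use_wandb=False))
```

The returned dictionary can be unpacked into the existing call, for example tr.TrainingArguments(output_dir=..., **reporting_kwargs(use_wandb=False)), in place of the hard-coded report_to='wandb'. Call it before the Trainer is created. If training proceeds with reporting off, the hang is likely related to the W&B setup. If it still stalls, the cause is probably elsewhere.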

Answered 2022-01-04T12:12:34.770