我已经创建了模型并使用 google colab 保存了权重。现在我创建了一个预测脚本。预测脚本包含模型类。我正在尝试使用以下方法加载模型权重-
节省 GPU,加载 CPU 节省:
torch.save(model.state_dict(), PATH)
加载:
device = torch.device('cpu')
model = TheModelClass(*args, **kwargs)
model.load_state_dict(torch.load(PATH, map_location=device))
上面的方法应该可行吧?是的。
但是当我尝试这样做时,我在 Google Colab 中有不同的模型参数(预测,运行时-无,设备 = CPU),而在我的本地机器中则不同(预测,设备 = cpu)
Colab 中的模型参数-
def count_parameters(model):
return sum(p.numel() for p in model.parameters() if p.requires_grad)
print(f'The model has {count_parameters(model):,} trainable parameters')
该模型有 12,490,234 个可训练参数
+-------------------------------------------------------+------------+
| Modules | Parameters |
+-------------------------------------------------------+------------+
| encoder.tok_embedding.weight | 2053376 |
| encoder.pos_embedding.weight | 25600 |
| encoder.layers.0.self_attn_layer_norm.weight | 256 |
| encoder.layers.0.self_attn_layer_norm.bias | 256 |
| encoder.layers.0.ff_layer_norm.weight | 256 |
| encoder.layers.0.ff_layer_norm.bias | 256 |
| encoder.layers.0.self_attention.fc_q.weight | 65536 |
| encoder.layers.0.self_attention.fc_q.bias | 256 |
| encoder.layers.0.self_attention.fc_k.weight | 65536 |
| encoder.layers.0.self_attention.fc_k.bias | 256 |
| encoder.layers.0.self_attention.fc_v.weight | 65536 |
| encoder.layers.0.self_attention.fc_v.bias | 256 |
| encoder.layers.0.self_attention.fc_o.weight | 65536 |
| encoder.layers.0.self_attention.fc_o.bias | 256 |
| encoder.layers.0.positionwise_feedforward.fc_1.weight | 131072 |
| encoder.layers.0.positionwise_feedforward.fc_1.bias | 512 |
| encoder.layers.0.positionwise_feedforward.fc_2.weight | 131072 |
| encoder.layers.0.positionwise_feedforward.fc_2.bias | 256 |
| encoder.layers.1.self_attn_layer_norm.weight | 256 |
| encoder.layers.1.self_attn_layer_norm.bias | 256 |
| encoder.layers.1.ff_layer_norm.weight | 256 |
| encoder.layers.1.ff_layer_norm.bias | 256 |
| encoder.layers.1.self_attention.fc_q.weight | 65536 |
| encoder.layers.1.self_attention.fc_q.bias | 256 |
| encoder.layers.1.self_attention.fc_k.weight | 65536 |
| encoder.layers.1.self_attention.fc_k.bias | 256 |
| encoder.layers.1.self_attention.fc_v.weight | 65536 |
| encoder.layers.1.self_attention.fc_v.bias | 256 |
| encoder.layers.1.self_attention.fc_o.weight | 65536 |
| encoder.layers.1.self_attention.fc_o.bias | 256 |
| encoder.layers.1.positionwise_feedforward.fc_1.weight | 131072 |
| encoder.layers.1.positionwise_feedforward.fc_1.bias | 512 |
| encoder.layers.1.positionwise_feedforward.fc_2.weight | 131072 |
| encoder.layers.1.positionwise_feedforward.fc_2.bias | 256 |
| encoder.layers.2.self_attn_layer_norm.weight | 256 |
| encoder.layers.2.self_attn_layer_norm.bias | 256 |
| encoder.layers.2.ff_layer_norm.weight | 256 |
| encoder.layers.2.ff_layer_norm.bias | 256 |
| encoder.layers.2.self_attention.fc_q.weight | 65536 |
| encoder.layers.2.self_attention.fc_q.bias | 256 |
| encoder.layers.2.self_attention.fc_k.weight | 65536 |
| encoder.layers.2.self_attention.fc_k.bias | 256 |
| encoder.layers.2.self_attention.fc_v.weight | 65536 |
| encoder.layers.2.self_attention.fc_v.bias | 256 |
| encoder.layers.2.self_attention.fc_o.weight | 65536 |
| encoder.layers.2.self_attention.fc_o.bias | 256 |
| encoder.layers.2.positionwise_feedforward.fc_1.weight | 131072 |
| encoder.layers.2.positionwise_feedforward.fc_1.bias | 512 |
| encoder.layers.2.positionwise_feedforward.fc_2.weight | 131072 |
| encoder.layers.2.positionwise_feedforward.fc_2.bias | 256 |
| decoder.tok_embedding.weight | 3209728 |
| decoder.pos_embedding.weight | 25600 |
| decoder.layers.0.self_attn_layer_norm.weight | 256 |
| decoder.layers.0.self_attn_layer_norm.bias | 256 |
| decoder.layers.0.enc_attn_layer_norm.weight | 256 |
| decoder.layers.0.enc_attn_layer_norm.bias | 256 |
| decoder.layers.0.ff_layer_norm.weight | 256 |
| decoder.layers.0.ff_layer_norm.bias | 256 |
| decoder.layers.0.self_attention.fc_q.weight | 65536 |
| decoder.layers.0.self_attention.fc_q.bias | 256 |
| decoder.layers.0.self_attention.fc_k.weight | 65536 |
| decoder.layers.0.self_attention.fc_k.bias | 256 |
| decoder.layers.0.self_attention.fc_v.weight | 65536 |
| decoder.layers.0.self_attention.fc_v.bias | 256 |
| decoder.layers.0.self_attention.fc_o.weight | 65536 |
| decoder.layers.0.self_attention.fc_o.bias | 256 |
| decoder.layers.0.encoder_attention.fc_q.weight | 65536 |
| decoder.layers.0.encoder_attention.fc_q.bias | 256 |
| decoder.layers.0.encoder_attention.fc_k.weight | 65536 |
| decoder.layers.0.encoder_attention.fc_k.bias | 256 |
| decoder.layers.0.encoder_attention.fc_v.weight | 65536 |
| decoder.layers.0.encoder_attention.fc_v.bias | 256 |
| decoder.layers.0.encoder_attention.fc_o.weight | 65536 |
| decoder.layers.0.encoder_attention.fc_o.bias | 256 |
| decoder.layers.0.positionwise_feedforward.fc_1.weight | 131072 |
| decoder.layers.0.positionwise_feedforward.fc_1.bias | 512 |
| decoder.layers.0.positionwise_feedforward.fc_2.weight | 131072 |
| decoder.layers.0.positionwise_feedforward.fc_2.bias | 256 |
| decoder.layers.1.self_attn_layer_norm.weight | 256 |
| decoder.layers.1.self_attn_layer_norm.bias | 256 |
| decoder.layers.1.enc_attn_layer_norm.weight | 256 |
| decoder.layers.1.enc_attn_layer_norm.bias | 256 |
| decoder.layers.1.ff_layer_norm.weight | 256 |
| decoder.layers.1.ff_layer_norm.bias | 256 |
| decoder.layers.1.self_attention.fc_q.weight | 65536 |
| decoder.layers.1.self_attention.fc_q.bias | 256 |
| decoder.layers.1.self_attention.fc_k.weight | 65536 |
| decoder.layers.1.self_attention.fc_k.bias | 256 |
| decoder.layers.1.self_attention.fc_v.weight | 65536 |
| decoder.layers.1.self_attention.fc_v.bias | 256 |
| decoder.layers.1.self_attention.fc_o.weight | 65536 |
| decoder.layers.1.self_attention.fc_o.bias | 256 |
| decoder.layers.1.encoder_attention.fc_q.weight | 65536 |
| decoder.layers.1.encoder_attention.fc_q.bias | 256 |
| decoder.layers.1.encoder_attention.fc_k.weight | 65536 |
| decoder.layers.1.encoder_attention.fc_k.bias | 256 |
| decoder.layers.1.encoder_attention.fc_v.weight | 65536 |
| decoder.layers.1.encoder_attention.fc_v.bias | 256 |
| decoder.layers.1.encoder_attention.fc_o.weight | 65536 |
| decoder.layers.1.encoder_attention.fc_o.bias | 256 |
| decoder.layers.1.positionwise_feedforward.fc_1.weight | 131072 |
| decoder.layers.1.positionwise_feedforward.fc_1.bias | 512 |
| decoder.layers.1.positionwise_feedforward.fc_2.weight | 131072 |
| decoder.layers.1.positionwise_feedforward.fc_2.bias | 256 |
| decoder.layers.2.self_attn_layer_norm.weight | 256 |
| decoder.layers.2.self_attn_layer_norm.bias | 256 |
| decoder.layers.2.enc_attn_layer_norm.weight | 256 |
| decoder.layers.2.enc_attn_layer_norm.bias | 256 |
| decoder.layers.2.ff_layer_norm.weight | 256 |
| decoder.layers.2.ff_layer_norm.bias | 256 |
| decoder.layers.2.self_attention.fc_q.weight | 65536 |
| decoder.layers.2.self_attention.fc_q.bias | 256 |
| decoder.layers.2.self_attention.fc_k.weight | 65536 |
| decoder.layers.2.self_attention.fc_k.bias | 256 |
| decoder.layers.2.self_attention.fc_v.weight | 65536 |
| decoder.layers.2.self_attention.fc_v.bias | 256 |
| decoder.layers.2.self_attention.fc_o.weight | 65536 |
| decoder.layers.2.self_attention.fc_o.bias | 256 |
| decoder.layers.2.encoder_attention.fc_q.weight | 65536 |
| decoder.layers.2.encoder_attention.fc_q.bias | 256 |
| decoder.layers.2.encoder_attention.fc_k.weight | 65536 |
| decoder.layers.2.encoder_attention.fc_k.bias | 256 |
| decoder.layers.2.encoder_attention.fc_v.weight | 65536 |
| decoder.layers.2.encoder_attention.fc_v.bias | 256 |
| decoder.layers.2.encoder_attention.fc_o.weight | 65536 |
| decoder.layers.2.encoder_attention.fc_o.bias | 256 |
| decoder.layers.2.positionwise_feedforward.fc_1.weight | 131072 |
| decoder.layers.2.positionwise_feedforward.fc_1.bias | 512 |
| decoder.layers.2.positionwise_feedforward.fc_2.weight | 131072 |
| decoder.layers.2.positionwise_feedforward.fc_2.bias | 256 |
| decoder.fc_out.weight | 3209728 |
| decoder.fc_out.bias | 12538 |
+-------------------------------------------------------+------------+
Total Trainable Params: 12490234
本地模型参数-
def count_parameters(model):
return sum(p.numel() for p in model.parameters() if p.requires_grad)
print(f'The model has {count_parameters(model):,} trainable parameters')
该模型有 12,506,137 个可训练参数
+-------------------------------------------------------+------------+
| Modules | Parameters |
+-------------------------------------------------------+------------+
| encoder.tok_embedding.weight | 2053376 |
| encoder.pos_embedding.weight | 25600 |
| encoder.layers.0.self_attn_layer_norm.weight | 256 |
| encoder.layers.0.self_attn_layer_norm.bias | 256 |
| encoder.layers.0.ff_layer_norm.weight | 256 |
| encoder.layers.0.ff_layer_norm.bias | 256 |
| encoder.layers.0.self_attention.fc_q.weight | 65536 |
| encoder.layers.0.self_attention.fc_q.bias | 256 |
| encoder.layers.0.self_attention.fc_k.weight | 65536 |
| encoder.layers.0.self_attention.fc_k.bias | 256 |
| encoder.layers.0.self_attention.fc_v.weight | 65536 |
| encoder.layers.0.self_attention.fc_v.bias | 256 |
| encoder.layers.0.self_attention.fc_o.weight | 65536 |
| encoder.layers.0.self_attention.fc_o.bias | 256 |
| encoder.layers.0.positionwise_feedforward.fc_1.weight | 131072 |
| encoder.layers.0.positionwise_feedforward.fc_1.bias | 512 |
| encoder.layers.0.positionwise_feedforward.fc_2.weight | 131072 |
| encoder.layers.0.positionwise_feedforward.fc_2.bias | 256 |
| encoder.layers.1.self_attn_layer_norm.weight | 256 |
| encoder.layers.1.self_attn_layer_norm.bias | 256 |
| encoder.layers.1.ff_layer_norm.weight | 256 |
| encoder.layers.1.ff_layer_norm.bias | 256 |
| encoder.layers.1.self_attention.fc_q.weight | 65536 |
| encoder.layers.1.self_attention.fc_q.bias | 256 |
| encoder.layers.1.self_attention.fc_k.weight | 65536 |
| encoder.layers.1.self_attention.fc_k.bias | 256 |
| encoder.layers.1.self_attention.fc_v.weight | 65536 |
| encoder.layers.1.self_attention.fc_v.bias | 256 |
| encoder.layers.1.self_attention.fc_o.weight | 65536 |
| encoder.layers.1.self_attention.fc_o.bias | 256 |
| encoder.layers.1.positionwise_feedforward.fc_1.weight | 131072 |
| encoder.layers.1.positionwise_feedforward.fc_1.bias | 512 |
| encoder.layers.1.positionwise_feedforward.fc_2.weight | 131072 |
| encoder.layers.1.positionwise_feedforward.fc_2.bias | 256 |
| encoder.layers.2.self_attn_layer_norm.weight | 256 |
| encoder.layers.2.self_attn_layer_norm.bias | 256 |
| encoder.layers.2.ff_layer_norm.weight | 256 |
| encoder.layers.2.ff_layer_norm.bias | 256 |
| encoder.layers.2.self_attention.fc_q.weight | 65536 |
| encoder.layers.2.self_attention.fc_q.bias | 256 |
| encoder.layers.2.self_attention.fc_k.weight | 65536 |
| encoder.layers.2.self_attention.fc_k.bias | 256 |
| encoder.layers.2.self_attention.fc_v.weight | 65536 |
| encoder.layers.2.self_attention.fc_v.bias | 256 |
| encoder.layers.2.self_attention.fc_o.weight | 65536 |
| encoder.layers.2.self_attention.fc_o.bias | 256 |
| encoder.layers.2.positionwise_feedforward.fc_1.weight | 131072 |
| encoder.layers.2.positionwise_feedforward.fc_1.bias | 512 |
| encoder.layers.2.positionwise_feedforward.fc_2.weight | 131072 |
| encoder.layers.2.positionwise_feedforward.fc_2.bias | 256 |
| decoder.tok_embedding.weight | 3217664 |
| decoder.pos_embedding.weight | 25600 |
| decoder.layers.0.self_attn_layer_norm.weight | 256 |
| decoder.layers.0.self_attn_layer_norm.bias | 256 |
| decoder.layers.0.enc_attn_layer_norm.weight | 256 |
| decoder.layers.0.enc_attn_layer_norm.bias | 256 |
| decoder.layers.0.ff_layer_norm.weight | 256 |
| decoder.layers.0.ff_layer_norm.bias | 256 |
| decoder.layers.0.self_attention.fc_q.weight | 65536 |
| decoder.layers.0.self_attention.fc_q.bias | 256 |
| decoder.layers.0.self_attention.fc_k.weight | 65536 |
| decoder.layers.0.self_attention.fc_k.bias | 256 |
| decoder.layers.0.self_attention.fc_v.weight | 65536 |
| decoder.layers.0.self_attention.fc_v.bias | 256 |
| decoder.layers.0.self_attention.fc_o.weight | 65536 |
| decoder.layers.0.self_attention.fc_o.bias | 256 |
| decoder.layers.0.encoder_attention.fc_q.weight | 65536 |
| decoder.layers.0.encoder_attention.fc_q.bias | 256 |
| decoder.layers.0.encoder_attention.fc_k.weight | 65536 |
| decoder.layers.0.encoder_attention.fc_k.bias | 256 |
| decoder.layers.0.encoder_attention.fc_v.weight | 65536 |
| decoder.layers.0.encoder_attention.fc_v.bias | 256 |
| decoder.layers.0.encoder_attention.fc_o.weight | 65536 |
| decoder.layers.0.encoder_attention.fc_o.bias | 256 |
| decoder.layers.0.positionwise_feedforward.fc_1.weight | 131072 |
| decoder.layers.0.positionwise_feedforward.fc_1.bias | 512 |
| decoder.layers.0.positionwise_feedforward.fc_2.weight | 131072 |
| decoder.layers.0.positionwise_feedforward.fc_2.bias | 256 |
| decoder.layers.1.self_attn_layer_norm.weight | 256 |
| decoder.layers.1.self_attn_layer_norm.bias | 256 |
| decoder.layers.1.enc_attn_layer_norm.weight | 256 |
| decoder.layers.1.enc_attn_layer_norm.bias | 256 |
| decoder.layers.1.ff_layer_norm.weight | 256 |
| decoder.layers.1.ff_layer_norm.bias | 256 |
| decoder.layers.1.self_attention.fc_q.weight | 65536 |
| decoder.layers.1.self_attention.fc_q.bias | 256 |
| decoder.layers.1.self_attention.fc_k.weight | 65536 |
| decoder.layers.1.self_attention.fc_k.bias | 256 |
| decoder.layers.1.self_attention.fc_v.weight | 65536 |
| decoder.layers.1.self_attention.fc_v.bias | 256 |
| decoder.layers.1.self_attention.fc_o.weight | 65536 |
| decoder.layers.1.self_attention.fc_o.bias | 256 |
| decoder.layers.1.encoder_attention.fc_q.weight | 65536 |
| decoder.layers.1.encoder_attention.fc_q.bias | 256 |
| decoder.layers.1.encoder_attention.fc_k.weight | 65536 |
| decoder.layers.1.encoder_attention.fc_k.bias | 256 |
| decoder.layers.1.encoder_attention.fc_v.weight | 65536 |
| decoder.layers.1.encoder_attention.fc_v.bias | 256 |
| decoder.layers.1.encoder_attention.fc_o.weight | 65536 |
| decoder.layers.1.encoder_attention.fc_o.bias | 256 |
| decoder.layers.1.positionwise_feedforward.fc_1.weight | 131072 |
| decoder.layers.1.positionwise_feedforward.fc_1.bias | 512 |
| decoder.layers.1.positionwise_feedforward.fc_2.weight | 131072 |
| decoder.layers.1.positionwise_feedforward.fc_2.bias | 256 |
| decoder.layers.2.self_attn_layer_norm.weight | 256 |
| decoder.layers.2.self_attn_layer_norm.bias | 256 |
| decoder.layers.2.enc_attn_layer_norm.weight | 256 |
| decoder.layers.2.enc_attn_layer_norm.bias | 256 |
| decoder.layers.2.ff_layer_norm.weight | 256 |
| decoder.layers.2.ff_layer_norm.bias | 256 |
| decoder.layers.2.self_attention.fc_q.weight | 65536 |
| decoder.layers.2.self_attention.fc_q.bias | 256 |
| decoder.layers.2.self_attention.fc_k.weight | 65536 |
| decoder.layers.2.self_attention.fc_k.bias | 256 |
| decoder.layers.2.self_attention.fc_v.weight | 65536 |
| decoder.layers.2.self_attention.fc_v.bias | 256 |
| decoder.layers.2.self_attention.fc_o.weight | 65536 |
| decoder.layers.2.self_attention.fc_o.bias | 256 |
| decoder.layers.2.encoder_attention.fc_q.weight | 65536 |
| decoder.layers.2.encoder_attention.fc_q.bias | 256 |
| decoder.layers.2.encoder_attention.fc_k.weight | 65536 |
| decoder.layers.2.encoder_attention.fc_k.bias | 256 |
| decoder.layers.2.encoder_attention.fc_v.weight | 65536 |
| decoder.layers.2.encoder_attention.fc_v.bias | 256 |
| decoder.layers.2.encoder_attention.fc_o.weight | 65536 |
| decoder.layers.2.encoder_attention.fc_o.bias | 256 |
| decoder.layers.2.positionwise_feedforward.fc_1.weight | 131072 |
| decoder.layers.2.positionwise_feedforward.fc_1.bias | 512 |
| decoder.layers.2.positionwise_feedforward.fc_2.weight | 131072 |
| decoder.layers.2.positionwise_feedforward.fc_2.bias | 256 |
| decoder.fc_out.weight | 3217664 |
| decoder.fc_out.bias | 12569 |
+-------------------------------------------------------+------------+
Total Trainable Params: 12506137
所以,这就是我无法加载模型的原因。因为模型在本地有不同的参数。
即使我尝试在本地加载权重,它也会给我-
model.load_state_dict(torch.load(f"{model_name}.pt", map_location=device))
错误-
--------------------------------------------------------------------------- RuntimeError Traceback (most recent call last) <ipython-input-24-f5baac4441a5> in <module>
----> 1 model.load_state_dict(torch.load(f"{model_name}_2.pt", map_location=device))
c:\anaconda\envs\lang_trans\lib\site-packages\torch\nn\modules\module.py in load_state_dict(self, state_dict, strict)
845 if len(error_msgs) > 0:
846 raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
--> 847 self.__class__.__name__, "\n\t".join(error_msgs)))
848 return _IncompatibleKeys(missing_keys, unexpected_keys)
849
RuntimeError: Error(s) in loading state_dict for Seq2Seq: size mismatch for decoder.tok_embedding.weight: copying a param with shape torch.Size([12538, 256]) from checkpoint, the shape in current model is torch.Size([12569, 256]). size mismatch for decoder.fc_out.weight: copying a param with shape torch.Size([12538, 256]) from checkpoint, the shape in current model is torch.Size([12569, 256]). size mismatch for decoder.fc_out.bias: copying a param with shape torch.Size([12538]) from checkpoint, the shape in current model is torch.Size([12569]).
本地的模型参数一定是错误的,因为在 colab (device=CPU, runtime=None) 中,我可以在定义模型类后加载权重。但是在本地机器中,参数发生了变化,所以我无法加载权重。我知道这很奇怪,请帮助我找到解决方案。
您可以在此处查看模型的完整代码-
<script src="https://gist.github.com/Dipeshpal/90c715a7b7f00845e20ef998bda35835.js"></script>
https://gist.github.com/Dipeshpal/90c715a7b7f00845e20ef998bda35835
在此模型参数更改后。