machine-learning - PyTorch model parameters change between CPU and GPU

Question

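I created a model and saved its weights using Google Colab. Now I have written a prediction script, which contains the model class. I am trying to load the model weights using the following method:

Save on GPU, load on CPU. Save: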

torch.save(model.state_dict(), PATH)

Load:

device = torch.device('cpu')
model = TheModelClass(*args, **kwargs)
model.load_state_dict(torch.load(PATH, map_location=device))

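The above method should work, right? Yes.

But when I do this, I get different model parameters in Google Colab (prediction, runtime = None, device = CPU) than on my local machine (prediction, device = cpu).

Model parameters in Colab: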
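As a minimal sketch of this save/load round trip, the example below uses a hypothetical `TinyModel` in place of `TheModelClass`, with a made-up `vocab_size` argument that sets the layer shapes. It suggests that `map_location` only controls which device the tensors land on; loading succeeds only when the model is rebuilt with the same constructor arguments that were used when saving.

```python
import os
import tempfile

import torch
import torch.nn as nn


class TinyModel(nn.Module):
    """Hypothetical stand-in for TheModelClass; vocab_size sets the layer shapes."""

    def __init__(self, vocab_size: int, hid_dim: int = 8):
        super().__init__()
        self.tok_embedding = nn.Embedding(vocab_size, hid_dim)
        self.fc_out = nn.Linear(hid_dim, vocab_size)


def save_and_reload(vocab_size_saved: int, vocab_size_loaded: int) -> bool:
    """Save a state_dict, rebuild the model, and report whether loading succeeds."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "model.pt")
        torch.save(TinyModel(vocab_size_saved).state_dict(), path)

        # Rebuild the model on CPU, possibly with a different vocabulary size.
        model = TinyModel(vocab_size_loaded)
        state_dict = torch.load(path, map_location=torch.device("cpu"))
        try:
            model.load_state_dict(state_dict)
        except RuntimeError:
            # load_state_dict raises RuntimeError on a size mismatch.
            return False
        return True


print(save_and_reload(12538, 12538))  # same shapes as the checkpoint
print(save_and_reload(12538, 12569))  # different vocabulary size
```

The device is the same in both calls, so any difference in the result comes from the constructor arguments alone.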

def count_parameters(model):
    return sum(p.numel() for p in model.parameters() if p.requires_grad)

print(f'The model has {count_parameters(model):,} trainable parameters')

The model has 12,490,234 trainable parameters

+-------------------------------------------------------+------------+
|                        Modules                        | Parameters |
+-------------------------------------------------------+------------+
|              encoder.tok_embedding.weight             |  2053376   |
|              encoder.pos_embedding.weight             |   25600    |
|      encoder.layers.0.self_attn_layer_norm.weight     |    256     |
|       encoder.layers.0.self_attn_layer_norm.bias      |    256     |
|         encoder.layers.0.ff_layer_norm.weight         |    256     |
|          encoder.layers.0.ff_layer_norm.bias          |    256     |
|      encoder.layers.0.self_attention.fc_q.weight      |   65536    |
|       encoder.layers.0.self_attention.fc_q.bias       |    256     |
|      encoder.layers.0.self_attention.fc_k.weight      |   65536    |
|       encoder.layers.0.self_attention.fc_k.bias       |    256     |
|      encoder.layers.0.self_attention.fc_v.weight      |   65536    |
|       encoder.layers.0.self_attention.fc_v.bias       |    256     |
|      encoder.layers.0.self_attention.fc_o.weight      |   65536    |
|       encoder.layers.0.self_attention.fc_o.bias       |    256     |
| encoder.layers.0.positionwise_feedforward.fc_1.weight |   131072   |
|  encoder.layers.0.positionwise_feedforward.fc_1.bias  |    512     |
| encoder.layers.0.positionwise_feedforward.fc_2.weight |   131072   |
|  encoder.layers.0.positionwise_feedforward.fc_2.bias  |    256     |
|      encoder.layers.1.self_attn_layer_norm.weight     |    256     |
|       encoder.layers.1.self_attn_layer_norm.bias      |    256     |
|         encoder.layers.1.ff_layer_norm.weight         |    256     |
|          encoder.layers.1.ff_layer_norm.bias          |    256     |
|      encoder.layers.1.self_attention.fc_q.weight      |   65536    |
|       encoder.layers.1.self_attention.fc_q.bias       |    256     |
|      encoder.layers.1.self_attention.fc_k.weight      |   65536    |
|       encoder.layers.1.self_attention.fc_k.bias       |    256     |
|      encoder.layers.1.self_attention.fc_v.weight      |   65536    |
|       encoder.layers.1.self_attention.fc_v.bias       |    256     |
|      encoder.layers.1.self_attention.fc_o.weight      |   65536    |
|       encoder.layers.1.self_attention.fc_o.bias       |    256     |
| encoder.layers.1.positionwise_feedforward.fc_1.weight |   131072   |
|  encoder.layers.1.positionwise_feedforward.fc_1.bias  |    512     |
| encoder.layers.1.positionwise_feedforward.fc_2.weight |   131072   |
|  encoder.layers.1.positionwise_feedforward.fc_2.bias  |    256     |
|      encoder.layers.2.self_attn_layer_norm.weight     |    256     |
|       encoder.layers.2.self_attn_layer_norm.bias      |    256     |
|         encoder.layers.2.ff_layer_norm.weight         |    256     |
|          encoder.layers.2.ff_layer_norm.bias          |    256     |
|      encoder.layers.2.self_attention.fc_q.weight      |   65536    |
|       encoder.layers.2.self_attention.fc_q.bias       |    256     |
|      encoder.layers.2.self_attention.fc_k.weight      |   65536    |
|       encoder.layers.2.self_attention.fc_k.bias       |    256     |
|      encoder.layers.2.self_attention.fc_v.weight      |   65536    |
|       encoder.layers.2.self_attention.fc_v.bias       |    256     |
|      encoder.layers.2.self_attention.fc_o.weight      |   65536    |
|       encoder.layers.2.self_attention.fc_o.bias       |    256     |
| encoder.layers.2.positionwise_feedforward.fc_1.weight |   131072   |
|  encoder.layers.2.positionwise_feedforward.fc_1.bias  |    512     |
| encoder.layers.2.positionwise_feedforward.fc_2.weight |   131072   |
|  encoder.layers.2.positionwise_feedforward.fc_2.bias  |    256     |
|              decoder.tok_embedding.weight             |  3209728   |
|              decoder.pos_embedding.weight             |   25600    |
|      decoder.layers.0.self_attn_layer_norm.weight     |    256     |
|       decoder.layers.0.self_attn_layer_norm.bias      |    256     |
|      decoder.layers.0.enc_attn_layer_norm.weight      |    256     |
|       decoder.layers.0.enc_attn_layer_norm.bias       |    256     |
|         decoder.layers.0.ff_layer_norm.weight         |    256     |
|          decoder.layers.0.ff_layer_norm.bias          |    256     |
|      decoder.layers.0.self_attention.fc_q.weight      |   65536    |
|       decoder.layers.0.self_attention.fc_q.bias       |    256     |
|      decoder.layers.0.self_attention.fc_k.weight      |   65536    |
|       decoder.layers.0.self_attention.fc_k.bias       |    256     |
|      decoder.layers.0.self_attention.fc_v.weight      |   65536    |
|       decoder.layers.0.self_attention.fc_v.bias       |    256     |
|      decoder.layers.0.self_attention.fc_o.weight      |   65536    |
|       decoder.layers.0.self_attention.fc_o.bias       |    256     |
|     decoder.layers.0.encoder_attention.fc_q.weight    |   65536    |
|      decoder.layers.0.encoder_attention.fc_q.bias     |    256     |
|     decoder.layers.0.encoder_attention.fc_k.weight    |   65536    |
|      decoder.layers.0.encoder_attention.fc_k.bias     |    256     |
|     decoder.layers.0.encoder_attention.fc_v.weight    |   65536    |
|      decoder.layers.0.encoder_attention.fc_v.bias     |    256     |
|     decoder.layers.0.encoder_attention.fc_o.weight    |   65536    |
|      decoder.layers.0.encoder_attention.fc_o.bias     |    256     |
| decoder.layers.0.positionwise_feedforward.fc_1.weight |   131072   |
|  decoder.layers.0.positionwise_feedforward.fc_1.bias  |    512     |
| decoder.layers.0.positionwise_feedforward.fc_2.weight |   131072   |
|  decoder.layers.0.positionwise_feedforward.fc_2.bias  |    256     |
|      decoder.layers.1.self_attn_layer_norm.weight     |    256     |
|       decoder.layers.1.self_attn_layer_norm.bias      |    256     |
|      decoder.layers.1.enc_attn_layer_norm.weight      |    256     |
|       decoder.layers.1.enc_attn_layer_norm.bias       |    256     |
|         decoder.layers.1.ff_layer_norm.weight         |    256     |
|          decoder.layers.1.ff_layer_norm.bias          |    256     |
|      decoder.layers.1.self_attention.fc_q.weight      |   65536    |
|       decoder.layers.1.self_attention.fc_q.bias       |    256     |
|      decoder.layers.1.self_attention.fc_k.weight      |   65536    |
|       decoder.layers.1.self_attention.fc_k.bias       |    256     |
|      decoder.layers.1.self_attention.fc_v.weight      |   65536    |
|       decoder.layers.1.self_attention.fc_v.bias       |    256     |
|      decoder.layers.1.self_attention.fc_o.weight      |   65536    |
|       decoder.layers.1.self_attention.fc_o.bias       |    256     |
|     decoder.layers.1.encoder_attention.fc_q.weight    |   65536    |
|      decoder.layers.1.encoder_attention.fc_q.bias     |    256     |
|     decoder.layers.1.encoder_attention.fc_k.weight    |   65536    |
|      decoder.layers.1.encoder_attention.fc_k.bias     |    256     |
|     decoder.layers.1.encoder_attention.fc_v.weight    |   65536    |
|      decoder.layers.1.encoder_attention.fc_v.bias     |    256     |
|     decoder.layers.1.encoder_attention.fc_o.weight    |   65536    |
|      decoder.layers.1.encoder_attention.fc_o.bias     |    256     |
| decoder.layers.1.positionwise_feedforward.fc_1.weight |   131072   |
|  decoder.layers.1.positionwise_feedforward.fc_1.bias  |    512     |
| decoder.layers.1.positionwise_feedforward.fc_2.weight |   131072   |
|  decoder.layers.1.positionwise_feedforward.fc_2.bias  |    256     |
|      decoder.layers.2.self_attn_layer_norm.weight     |    256     |
|       decoder.layers.2.self_attn_layer_norm.bias      |    256     |
|      decoder.layers.2.enc_attn_layer_norm.weight      |    256     |
|       decoder.layers.2.enc_attn_layer_norm.bias       |    256     |
|         decoder.layers.2.ff_layer_norm.weight         |    256     |
|          decoder.layers.2.ff_layer_norm.bias          |    256     |
|      decoder.layers.2.self_attention.fc_q.weight      |   65536    |
|       decoder.layers.2.self_attention.fc_q.bias       |    256     |
|      decoder.layers.2.self_attention.fc_k.weight      |   65536    |
|       decoder.layers.2.self_attention.fc_k.bias       |    256     |
|      decoder.layers.2.self_attention.fc_v.weight      |   65536    |
|       decoder.layers.2.self_attention.fc_v.bias       |    256     |
|      decoder.layers.2.self_attention.fc_o.weight      |   65536    |
|       decoder.layers.2.self_attention.fc_o.bias       |    256     |
|     decoder.layers.2.encoder_attention.fc_q.weight    |   65536    |
|      decoder.layers.2.encoder_attention.fc_q.bias     |    256     |
|     decoder.layers.2.encoder_attention.fc_k.weight    |   65536    |
|      decoder.layers.2.encoder_attention.fc_k.bias     |    256     |
|     decoder.layers.2.encoder_attention.fc_v.weight    |   65536    |
|      decoder.layers.2.encoder_attention.fc_v.bias     |    256     |
|     decoder.layers.2.encoder_attention.fc_o.weight    |   65536    |
|      decoder.layers.2.encoder_attention.fc_o.bias     |    256     |
| decoder.layers.2.positionwise_feedforward.fc_1.weight |   131072   |
|  decoder.layers.2.positionwise_feedforward.fc_1.bias  |    512     |
| decoder.layers.2.positionwise_feedforward.fc_2.weight |   131072   |
|  decoder.layers.2.positionwise_feedforward.fc_2.bias  |    256     |
|                 decoder.fc_out.weight                 |  3209728   |
|                  decoder.fc_out.bias                  |   12538    |
+-------------------------------------------------------+------------+
Total Trainable Params: 12490234

Model parameters on the local machine:

def count_parameters(model):
    return sum(p.numel() for p in model.parameters() if p.requires_grad)

print(f'The model has {count_parameters(model):,} trainable parameters')

The model has 12,506,137 trainable parameters

+-------------------------------------------------------+------------+
|                        Modules                        | Parameters |
+-------------------------------------------------------+------------+
|              encoder.tok_embedding.weight             |  2053376   |
|              encoder.pos_embedding.weight             |   25600    |
|      encoder.layers.0.self_attn_layer_norm.weight     |    256     |
|       encoder.layers.0.self_attn_layer_norm.bias      |    256     |
|         encoder.layers.0.ff_layer_norm.weight         |    256     |
|          encoder.layers.0.ff_layer_norm.bias          |    256     |
|      encoder.layers.0.self_attention.fc_q.weight      |   65536    |
|       encoder.layers.0.self_attention.fc_q.bias       |    256     |
|      encoder.layers.0.self_attention.fc_k.weight      |   65536    |
|       encoder.layers.0.self_attention.fc_k.bias       |    256     |
|      encoder.layers.0.self_attention.fc_v.weight      |   65536    |
|       encoder.layers.0.self_attention.fc_v.bias       |    256     |
|      encoder.layers.0.self_attention.fc_o.weight      |   65536    |
|       encoder.layers.0.self_attention.fc_o.bias       |    256     |
| encoder.layers.0.positionwise_feedforward.fc_1.weight |   131072   |
|  encoder.layers.0.positionwise_feedforward.fc_1.bias  |    512     |
| encoder.layers.0.positionwise_feedforward.fc_2.weight |   131072   |
|  encoder.layers.0.positionwise_feedforward.fc_2.bias  |    256     |
|      encoder.layers.1.self_attn_layer_norm.weight     |    256     |
|       encoder.layers.1.self_attn_layer_norm.bias      |    256     |
|         encoder.layers.1.ff_layer_norm.weight         |    256     |
|          encoder.layers.1.ff_layer_norm.bias          |    256     |
|      encoder.layers.1.self_attention.fc_q.weight      |   65536    |
|       encoder.layers.1.self_attention.fc_q.bias       |    256     |
|      encoder.layers.1.self_attention.fc_k.weight      |   65536    |
|       encoder.layers.1.self_attention.fc_k.bias       |    256     |
|      encoder.layers.1.self_attention.fc_v.weight      |   65536    |
|       encoder.layers.1.self_attention.fc_v.bias       |    256     |
|      encoder.layers.1.self_attention.fc_o.weight      |   65536    |
|       encoder.layers.1.self_attention.fc_o.bias       |    256     |
| encoder.layers.1.positionwise_feedforward.fc_1.weight |   131072   |
|  encoder.layers.1.positionwise_feedforward.fc_1.bias  |    512     |
| encoder.layers.1.positionwise_feedforward.fc_2.weight |   131072   |
|  encoder.layers.1.positionwise_feedforward.fc_2.bias  |    256     |
|      encoder.layers.2.self_attn_layer_norm.weight     |    256     |
|       encoder.layers.2.self_attn_layer_norm.bias      |    256     |
|         encoder.layers.2.ff_layer_norm.weight         |    256     |
|          encoder.layers.2.ff_layer_norm.bias          |    256     |
|      encoder.layers.2.self_attention.fc_q.weight      |   65536    |
|       encoder.layers.2.self_attention.fc_q.bias       |    256     |
|      encoder.layers.2.self_attention.fc_k.weight      |   65536    |
|       encoder.layers.2.self_attention.fc_k.bias       |    256     |
|      encoder.layers.2.self_attention.fc_v.weight      |   65536    |
|       encoder.layers.2.self_attention.fc_v.bias       |    256     |
|      encoder.layers.2.self_attention.fc_o.weight      |   65536    |
|       encoder.layers.2.self_attention.fc_o.bias       |    256     |
| encoder.layers.2.positionwise_feedforward.fc_1.weight |   131072   |
|  encoder.layers.2.positionwise_feedforward.fc_1.bias  |    512     |
| encoder.layers.2.positionwise_feedforward.fc_2.weight |   131072   |
|  encoder.layers.2.positionwise_feedforward.fc_2.bias  |    256     |
|              decoder.tok_embedding.weight             |  3217664   |
|              decoder.pos_embedding.weight             |   25600    |
|      decoder.layers.0.self_attn_layer_norm.weight     |    256     |
|       decoder.layers.0.self_attn_layer_norm.bias      |    256     |
|      decoder.layers.0.enc_attn_layer_norm.weight      |    256     |
|       decoder.layers.0.enc_attn_layer_norm.bias       |    256     |
|         decoder.layers.0.ff_layer_norm.weight         |    256     |
|          decoder.layers.0.ff_layer_norm.bias          |    256     |
|      decoder.layers.0.self_attention.fc_q.weight      |   65536    |
|       decoder.layers.0.self_attention.fc_q.bias       |    256     |
|      decoder.layers.0.self_attention.fc_k.weight      |   65536    |
|       decoder.layers.0.self_attention.fc_k.bias       |    256     |
|      decoder.layers.0.self_attention.fc_v.weight      |   65536    |
|       decoder.layers.0.self_attention.fc_v.bias       |    256     |
|      decoder.layers.0.self_attention.fc_o.weight      |   65536    |
|       decoder.layers.0.self_attention.fc_o.bias       |    256     |
|     decoder.layers.0.encoder_attention.fc_q.weight    |   65536    |
|      decoder.layers.0.encoder_attention.fc_q.bias     |    256     |
|     decoder.layers.0.encoder_attention.fc_k.weight    |   65536    |
|      decoder.layers.0.encoder_attention.fc_k.bias     |    256     |
|     decoder.layers.0.encoder_attention.fc_v.weight    |   65536    |
|      decoder.layers.0.encoder_attention.fc_v.bias     |    256     |
|     decoder.layers.0.encoder_attention.fc_o.weight    |   65536    |
|      decoder.layers.0.encoder_attention.fc_o.bias     |    256     |
| decoder.layers.0.positionwise_feedforward.fc_1.weight |   131072   |
|  decoder.layers.0.positionwise_feedforward.fc_1.bias  |    512     |
| decoder.layers.0.positionwise_feedforward.fc_2.weight |   131072   |
|  decoder.layers.0.positionwise_feedforward.fc_2.bias  |    256     |
|      decoder.layers.1.self_attn_layer_norm.weight     |    256     |
|       decoder.layers.1.self_attn_layer_norm.bias      |    256     |
|      decoder.layers.1.enc_attn_layer_norm.weight      |    256     |
|       decoder.layers.1.enc_attn_layer_norm.bias       |    256     |
|         decoder.layers.1.ff_layer_norm.weight         |    256     |
|          decoder.layers.1.ff_layer_norm.bias          |    256     |
|      decoder.layers.1.self_attention.fc_q.weight      |   65536    |
|       decoder.layers.1.self_attention.fc_q.bias       |    256     |
|      decoder.layers.1.self_attention.fc_k.weight      |   65536    |
|       decoder.layers.1.self_attention.fc_k.bias       |    256     |
|      decoder.layers.1.self_attention.fc_v.weight      |   65536    |
|       decoder.layers.1.self_attention.fc_v.bias       |    256     |
|      decoder.layers.1.self_attention.fc_o.weight      |   65536    |
|       decoder.layers.1.self_attention.fc_o.bias       |    256     |
|     decoder.layers.1.encoder_attention.fc_q.weight    |   65536    |
|      decoder.layers.1.encoder_attention.fc_q.bias     |    256     |
|     decoder.layers.1.encoder_attention.fc_k.weight    |   65536    |
|      decoder.layers.1.encoder_attention.fc_k.bias     |    256     |
|     decoder.layers.1.encoder_attention.fc_v.weight    |   65536    |
|      decoder.layers.1.encoder_attention.fc_v.bias     |    256     |
|     decoder.layers.1.encoder_attention.fc_o.weight    |   65536    |
|      decoder.layers.1.encoder_attention.fc_o.bias     |    256     |
| decoder.layers.1.positionwise_feedforward.fc_1.weight |   131072   |
|  decoder.layers.1.positionwise_feedforward.fc_1.bias  |    512     |
| decoder.layers.1.positionwise_feedforward.fc_2.weight |   131072   |
|  decoder.layers.1.positionwise_feedforward.fc_2.bias  |    256     |
|      decoder.layers.2.self_attn_layer_norm.weight     |    256     |
|       decoder.layers.2.self_attn_layer_norm.bias      |    256     |
|      decoder.layers.2.enc_attn_layer_norm.weight      |    256     |
|       decoder.layers.2.enc_attn_layer_norm.bias       |    256     |
|         decoder.layers.2.ff_layer_norm.weight         |    256     |
|          decoder.layers.2.ff_layer_norm.bias          |    256     |
|      decoder.layers.2.self_attention.fc_q.weight      |   65536    |
|       decoder.layers.2.self_attention.fc_q.bias       |    256     |
|      decoder.layers.2.self_attention.fc_k.weight      |   65536    |
|       decoder.layers.2.self_attention.fc_k.bias       |    256     |
|      decoder.layers.2.self_attention.fc_v.weight      |   65536    |
|       decoder.layers.2.self_attention.fc_v.bias       |    256     |
|      decoder.layers.2.self_attention.fc_o.weight      |   65536    |
|       decoder.layers.2.self_attention.fc_o.bias       |    256     |
|     decoder.layers.2.encoder_attention.fc_q.weight    |   65536    |
|      decoder.layers.2.encoder_attention.fc_q.bias     |    256     |
|     decoder.layers.2.encoder_attention.fc_k.weight    |   65536    |
|      decoder.layers.2.encoder_attention.fc_k.bias     |    256     |
|     decoder.layers.2.encoder_attention.fc_v.weight    |   65536    |
|      decoder.layers.2.encoder_attention.fc_v.bias     |    256     |
|     decoder.layers.2.encoder_attention.fc_o.weight    |   65536    |
|      decoder.layers.2.encoder_attention.fc_o.bias     |    256     |
| decoder.layers.2.positionwise_feedforward.fc_1.weight |   131072   |
|  decoder.layers.2.positionwise_feedforward.fc_1.bias  |    512     |
| decoder.layers.2.positionwise_feedforward.fc_2.weight |   131072   |
|  decoder.layers.2.positionwise_feedforward.fc_2.bias  |    256     |
|                 decoder.fc_out.weight                 |  3217664   |
|                  decoder.fc_out.bias                  |   12569    |
+-------------------------------------------------------+------------+
Total Trainable Params: 12506137

So this is why I cannot load the model: the model has different parameters locally.

Even when I try to load the weights locally, it gives me an error:
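The two tables differ in only three rows: `decoder.tok_embedding.weight`, `decoder.fc_out.weight`, and `decoder.fc_out.bias`. The arithmetic below is a sketch that checks whether the gap between the two totals is fully explained by those rows, assuming a hidden size of 256 (the size of the layer-norm rows) and reading the vocabulary sizes from the `decoder.fc_out.bias` rows. The helper name `vocab_dependent_params` is made up for illustration.

```python
HID_DIM = 256  # assumed hidden size, matching the 256-element layer-norm rows

COLAB_VOCAB = 12538  # decoder.fc_out.bias in the Colab table
LOCAL_VOCAB = 12569  # decoder.fc_out.bias in the local table


def vocab_dependent_params(vocab_size: int, hid_dim: int = HID_DIM) -> int:
    """Count the decoder parameters whose shape depends on the vocabulary size."""
    tok_embedding = vocab_size * hid_dim   # decoder.tok_embedding.weight
    fc_out_weight = vocab_size * hid_dim   # decoder.fc_out.weight
    fc_out_bias = vocab_size               # decoder.fc_out.bias
    return tok_embedding + fc_out_weight + fc_out_bias


explained = vocab_dependent_params(LOCAL_VOCAB) - vocab_dependent_params(COLAB_VOCAB)
observed = 12_506_137 - 12_490_234  # difference between the two reported totals

print(LOCAL_VOCAB - COLAB_VOCAB)  # 31 extra output tokens locally
print(explained, observed)        # 15903 15903
```

Both numbers agree, which points at the size of the decoder (output) vocabulary rather than at the CPU/GPU switch. If the vocabulary is rebuilt from data on each machine, a difference in the data, tokenizer, or library versions would be one possible cause, but the question does not confirm this.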

model.load_state_dict(torch.load(f"{model_name}.pt", map_location=device))

Error:

---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
<ipython-input-24-f5baac4441a5> in <module>
----> 1 model.load_state_dict(torch.load(f"{model_name}_2.pt", map_location=device))

c:\anaconda\envs\lang_trans\lib\site-packages\torch\nn\modules\module.py in load_state_dict(self, state_dict, strict)
    845         if len(error_msgs) > 0:
    846             raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
--> 847                                self.__class__.__name__, "\n\t".join(error_msgs)))
    848         return _IncompatibleKeys(missing_keys, unexpected_keys)
    849 

RuntimeError: Error(s) in loading state_dict for Seq2Seq:
	size mismatch for decoder.tok_embedding.weight: copying a param with shape torch.Size([12538, 256]) from checkpoint, the shape in current model is torch.Size([12569, 256]).
	size mismatch for decoder.fc_out.weight: copying a param with shape torch.Size([12538, 256]) from checkpoint, the shape in current model is torch.Size([12569, 256]).
	size mismatch for decoder.fc_out.bias: copying a param with shape torch.Size([12538]) from checkpoint, the shape in current model is torch.Size([12569]).
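To see the same mismatches without triggering the exception, one could compare the shapes stored in the checkpoint with the shapes of the freshly built model. The helper below is a sketch, not part of PyTorch; `find_shape_mismatches` is a hypothetical name, and it works on plain dictionaries mapping parameter names to shape tuples so that it runs without a checkpoint file.

```python
def find_shape_mismatches(checkpoint_shapes, model_shapes):
    """Return {name: (checkpoint_shape, model_shape)} for parameters whose shapes differ."""
    mismatches = {}
    for name, ckpt_shape in checkpoint_shapes.items():
        model_shape = model_shapes.get(name)
        if model_shape is not None and tuple(model_shape) != tuple(ckpt_shape):
            mismatches[name] = (tuple(ckpt_shape), tuple(model_shape))
    return mismatches


# With PyTorch, the two dictionaries could be built roughly like this:
#   {k: tuple(v.shape) for k, v in torch.load(PATH, map_location="cpu").items()}
#   {k: tuple(v.shape) for k, v in model.state_dict().items()}
# Here the shapes are copied by hand from the error message above.
checkpoint_shapes = {
    "encoder.tok_embedding.weight": (8021, 256),
    "decoder.tok_embedding.weight": (12538, 256),
    "decoder.fc_out.weight": (12538, 256),
    "decoder.fc_out.bias": (12538,),
}
model_shapes = {
    "encoder.tok_embedding.weight": (8021, 256),
    "decoder.tok_embedding.weight": (12569, 256),
    "decoder.fc_out.weight": (12569, 256),
    "decoder.fc_out.bias": (12569,),
}

for name, (ckpt, current) in find_shape_mismatches(checkpoint_shapes, model_shapes).items():
    print(f"{name}: checkpoint {ckpt} vs current model {current}")
```

The encoder row (8021 = 2053376 / 256) matches on both sides, so only the three decoder entries that depend on the output vocabulary size are reported.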

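The local model parameters must be wrong, because in Colab (device = CPU, runtime = None) I can load the weights after defining the model class. On my local machine, however, the parameters change, so I cannot load the weights. I know this is strange; please help me find a solution.

You can see the full code of the model here: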

https://gist.github.com/Dipeshpal/90c715a7b7f00845e20ef998bda35835

After this, the model parameters change.
