I load the RoBERTa model with TFRobertaModel.from_pretrained('roberta-base') and train it with Keras. I have additional layers on top of RoBERTa, so I need the bare RoBERTa to be initialized with all of its pretrained parameters. I run my code on Colab. Until a few weeks ago, loading RoBERTa produced the warning below, but everything still worked: the model trained normally even though the 'lm_head' weights were not initialized:
Some weights of the model checkpoint at Roberta-base were not used when initializing ROBERTA: ['lm_head']
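For reference, my setup looks roughly like the sketch below. It is simplified: the single Dense head and the sequence length of 128 are hypothetical stand-ins for my actual layers and settings.

import tensorflow as tf
from transformers import TFRobertaModel

# Loading the bare RoBERTa encoder; this is the call that triggers the warning.
roberta = TFRobertaModel.from_pretrained('roberta-base')

input_ids = tf.keras.layers.Input(shape=(128,), dtype=tf.int32, name='input_ids')
attention_mask = tf.keras.layers.Input(shape=(128,), dtype=tf.int32, name='attention_mask')

# First element of the output is the last hidden state; the <s> token's
# embedding feeds the extra layers on top.
sequence_output = roberta(input_ids, attention_mask=attention_mask)[0]
cls_embedding = sequence_output[:, 0, :]
output = tf.keras.layers.Dense(1, activation='sigmoid')(cls_embedding)

model = tf.keras.Model(inputs=[input_ids, attention_mask], outputs=output)
model.compile(optimizer='adam', loss='binary_crossentropy')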
But now, I believe the transformers version on Colab has changed, because the same code produces a new warning saying that many more encoder and bias layers were not initialized, and as a result the loss no longer decreases:
Some layers from the model checkpoint at roberta-base were not used when initializing ROBERTA: ['lm_head', 'encoder/layer_._3/attention/self/value/bias:0', 'encoder/layer_._10/attention/self/value/bias:0', 'encoder/layer_._10/attention/self/key/kernel:0', 'pooler/dense/bias:0', 'encoder/layer_._9/attention/self/query/kernel:0', 'encoder/layer_._10/attention/self/query/kernel:0', 'encoder/layer_._7/attention/output/dense/bias:0', 'embeddings/position_embeddings/embeddings:0', 'encoder/layer_._6/intermediate/dense/kernel:0', 'encoder/layer_._11/intermediate/dense/kernel:0', 'encoder/layer_._8/intermediate/dense/bias:0', 'encoder/layer_._10/attention/self/value/kernel:0', 'encoder/layer_._7/output/dense/bias:0', 'encoder/layer_._6/attention/self/value/bias:0', 'encoder/layer_._8/attention/output/dense/kernel:0', 'encoder/layer_._10/intermediate/dense/kernel:0', 'encoder/layer_._5/attention/self/value/kernel:0', 'encoder/layer_._6/attention/output/LayerNorm/gamma:0', 'encoder/layer_._7/attention/self/query/kernel:0', 'encoder/layer_._6/attention/self/query/kernel:0', 'encoder/layer_._6/attention/self/key/bias:0', 'encoder/layer_._8/attention/output/LayerNorm/gamma:0', 'encoder/layer_._2/output/dense/kernel:0', 'encoder/layer_._11/intermediate/dense/bias:0', 'encoder/layer_._6/output/dense/kernel:0', 'encoder/layer_._2/intermediate/dense/kernel:0', 'encoder/layer_._3/intermediate/dense/kernel:0', 'encoder/layer_._10/output/LayerNorm/beta:0', 'encoder/layer_._6/attention/self/query/bias:0', 'encoder/layer_._6/attention/output/LayerNorm/beta:0', 'encoder/layer_._9/attention/self/value/bias:0', 'encoder/layer_._8/attention/self/query/kernel:0', 'encoder/layer_._0/output/LayerNorm/gamma:0', 'encoder/layer_._11/attention/output/dense/bias:0', 'encoder/layer_._7/attention/self/value/bias:0', 'encoder/layer_._0/attention/output/dense/kernel:0', 'encoder/layer_._9/intermediate/dense/bias:0', 'encoder/layer_._2/attention/self/query/kernel:0', 'encoder/layer_._0/attention/self/key/bias:0', 'encoder/layer_._8/attention/output/LayerNorm/beta:0', 'encoder/layer_._1/attention/self/value/kernel:0', 'encoder/layer_._6/output/LayerNorm/gamma:0', 'encoder/layer_._1/attention/output/dense/bias:0', 'encoder/layer_._3/attention/self/query/bias:0', 'encoder/layer_._3/output/dense/bias:0', 'encoder/layer_._1/attention/self/key/kernel:0', 'encoder/layer_._8/attention/self/key/kernel:0', 'encoder/layer_._9/intermediate/dense/kernel:0', 'encoder/layer_._3/output/dense/kernel:0', 'encoder/layer_._2/output/LayerNorm/beta:0', 'encoder/layer_._7/attention/self/key/bias:0', 'encoder/layer_._5/attention/self/key/kernel:0', 'encoder/layer_._5/attention/self/query/bias:0', 'encoder/layer_._2/attention/output/dense/bias:0', 'encoder/layer_._4/intermediate/dense/kernel:0', 'encoder/layer_._1/intermediate/dense/bias:0', 'encoder/layer_._4/attention/self/value/kernel:0', 'encoder/layer_._11/attention/self/key/bias:0', 'encoder/layer_._5/output/dense/kernel:0', 'encoder/layer_._1/output/dense/bias:0', 'encoder/layer_._0/attention/self/value/bias:0', 'encoder/layer_._6/attention/self/key/kernel:0', 'encoder/layer_._9/attention/self/key/bias:0', 'encoder/layer_._7/output/LayerNorm/gamma:0', 'encoder/layer_._8/attention/output/dense/bias:0', 'encoder/layer_._10/attention/output/dense/bias:0', 'encoder/layer_._0/intermediate/dense/kernel:0', 'encoder/layer_._5/intermediate/dense/kernel:0', 'encoder/layer_._11/attention/self/value/kernel:0', 'encoder/layer_._8/attention/self/key/bias:0', 'encoder/layer_._8/output/dense/bias:0', 
'encoder/layer_._8/intermediate/dense/kernel:0', 'encoder/layer_._7/attention/output/LayerNorm/beta:0', 'encoder/layer_._2/output/dense/bias:0', 'encoder/layer_._3/attention/output/dense/bias:0', 'encoder/layer_._0/output/dense/bias:0', 'encoder/layer_._9/attention/self/key/kernel:0', 'encoder/layer_._11/output/dense/bias:0', 'encoder/layer_._7/attention/self/query/bias:0', 'encoder/layer_._10/attention/self/key/bias:0', 'encoder/layer_._2/attention/output/dense/kernel:0', 'encoder/layer_._2/attention/self/query/bias:0', 'encoder/layer_._9/attention/output/dense/kernel:0', 'encoder/layer_._9/attention/output/LayerNorm/gamma:0', 'encoder/layer_._9/output/LayerNorm/gamma:0', 'encoder/layer_._0/attention/output/LayerNorm/beta:0', 'encoder/layer_._1/intermediate/dense/kernel:0', 'encoder/layer_._1/output/dense/kernel:0', 'encoder/layer_._1/attention/self/key/bias:0', 'encoder/layer_._2/attention/self/value/kernel:0', 'encoder/layer_._9/attention/self/value/kernel:0', 'encoder/layer_._10/intermediate/dense/bias:0', 'encoder/layer_._4/intermediate/dense/bias:0', 'encoder/layer_._6/output/LayerNorm/beta:0', 'encoder/layer_._7/output/LayerNorm/beta:0', 'encoder/layer_._11/attention/self/query/bias:0', 'encoder/layer_._0/intermediate/dense/bias:0', 'encoder/layer_._11/attention/output/dense/kernel:0', 'encoder/layer_._5/attention/self/query/kernel:0', 'encoder/layer_._8/attention/self/value/kernel:0', 'encoder/layer_._11/output/LayerNorm/beta:0', 'encoder/layer_._9/output/dense/bias:0', 'encoder/layer_._4/output/dense/bias:0', 'encoder/layer_._2/attention/self/key/bias:0', 'encoder/layer_._3/attention/self/query/kernel:0', 'encoder/layer_._4/attention/output/LayerNorm/gamma:0', 'encoder/layer_._1/attention/output/LayerNorm/beta:0', 'encoder/layer_._1/output/LayerNorm/beta:0', 'encoder/layer_._10/attention/output/LayerNorm/beta:0', 'encoder/layer_._3/attention/self/value/kernel:0', 'encoder/layer_._10/attention/self/query/bias:0', 'encoder/layer_._3/attention/self/key/bias:0', 'pooler/dense/kernel:0', 'encoder/layer_._1/attention/self/value/bias:0', 'encoder/layer_._7/attention/self/key/kernel:0', 'encoder/layer_._1/attention/output/dense/kernel:0', 'encoder/layer_._4/attention/self/key/kernel:0', 'encoder/layer_._8/output/dense/kernel:0', 'encoder/layer_._3/attention/output/LayerNorm/gamma:0', 'encoder/layer_._0/attention/self/value/kernel:0', 'encoder/layer_._3/attention/self/key/kernel:0', 'encoder/layer_._0/attention/self/query/kernel:0', 'encoder/layer_._3/intermediate/dense/bias:0', 'encoder/layer_._7/output/dense/kernel:0', 'encoder/layer_._10/output/dense/kernel:0', 'encoder/layer_._7/intermediate/dense/bias:0', 'embeddings/word_embeddings/weight:0', 'encoder/layer_._3/attention/output/LayerNorm/beta:0', 'encoder/layer_._0/attention/self/key/kernel:0', 'encoder/layer_._4/output/dense/kernel:0', 'encoder/layer_._5/output/LayerNorm/gamma:0', 'encoder/layer_._9/attention/output/dense/bias:0', 'encoder/layer_._0/attention/output/dense/bias:0', 'encoder/layer_._5/attention/output/LayerNorm/gamma:0', 'encoder/layer_._9/attention/output/LayerNorm/beta:0', 'encoder/layer_._11/output/LayerNorm/gamma:0', 'encoder/layer_._11/attention/output/LayerNorm/gamma:0', 'encoder/layer_._6/intermediate/dense/bias:0', 'encoder/layer_._2/attention/output/LayerNorm/gamma:0', 'encoder/layer_._5/output/dense/bias:0', 'encoder/layer_._0/output/dense/kernel:0', 'encoder/layer_._6/attention/output/dense/kernel:0', 'encoder/layer_._6/attention/output/dense/bias:0', 'encoder/layer_._1/attention/self/query/kernel:0', 
'encoder/layer_._0/attention/self/query/bias:0', 'encoder/layer_._11/attention/self/value/bias:0', 'encoder/layer_._2/intermediate/dense/bias:0', 'embeddings/LayerNorm/beta:0', 'encoder/layer_._4/attention/output/dense/kernel:0', 'encoder/layer_._3/output/LayerNorm/beta:0', 'encoder/layer_._8/output/LayerNorm/gamma:0', 'encoder/layer_._10/attention/output/dense/kernel:0', 'encoder/layer_._11/output/dense/kernel:0', 'encoder/layer_._2/attention/output/LayerNorm/beta:0', 'encoder/layer_._7/attention/output/dense/kernel:0', 'encoder/layer_._9/attention/self/query/bias:0', 'encoder/layer_._4/attention/self/key/bias:0', 'encoder/layer_._2/output/LayerNorm/gamma:0', 'encoder/layer_._0/attention/output/LayerNorm/gamma:0', 'encoder/layer_._1/attention/output/LayerNorm/gamma:0', 'encoder/layer_._1/attention/self/query/bias:0', 'encoder/layer_._5/attention/output/LayerNorm/beta:0', 'encoder/layer_._10/output/dense/bias:0', 'encoder/layer_._8/output/LayerNorm/beta:0', 'encoder/layer_._5/output/LayerNorm/beta:0', 'embeddings/token_type_embeddings/embeddings:0', 'encoder/layer_._5/attention/output/dense/bias:0', 'encoder/layer_._4/output/LayerNorm/beta:0', 'encoder/layer_._4/attention/self/query/kernel:0', 'encoder/layer_._5/attention/output/dense/kernel:0', 'encoder/layer_._7/attention/self/value/kernel:0', 'encoder/layer_._7/intermediate/dense/kernel:0', 'encoder/layer_._11/attention/self/key/kernel:0', 'encoder/layer_._3/output/LayerNorm/gamma:0', 'encoder/layer_._10/output/LayerNorm/gamma:0', 'encoder/layer_._8/attention/self/query/bias:0', 'encoder/layer_._3/attention/output/dense/kernel:0', 'encoder/layer_._4/output/LayerNorm/gamma:0', 'encoder/layer_._10/attention/output/LayerNorm/gamma:0', 'encoder/layer_._4/attention/self/value/bias:0', 'encoder/layer_._11/attention/self/query/kernel:0', 'encoder/layer_._4/attention/output/dense/bias:0', 'encoder/layer_._4/attention/output/LayerNorm/beta:0', 'encoder/layer_._5/attention/self/key/bias:0', 'encoder/layer_._6/attention/self/value/kernel:0', 'encoder/layer_._5/attention/self/value/bias:0', 'encoder/layer_._11/attention/output/LayerNorm/beta:0', 'encoder/layer_._1/output/LayerNorm/gamma:0', 'encoder/layer_._2/attention/self/value/bias:0', 'encoder/layer_._9/output/dense/kernel:0', 'encoder/layer_._2/attention/self/key/kernel:0', 'encoder/layer_._9/output/LayerNorm/beta:0', 'encoder/layer_._7/attention/output/LayerNorm/gamma:0', 'encoder/layer_._5/intermediate/dense/bias:0', 'embeddings/LayerNorm/gamma:0', 'encoder/layer_._0/output/LayerNorm/beta:0', 'encoder/layer_._6/output/dense/bias:0', 'encoder/layer_._8/attention/self/value/bias:0', 'encoder/layer_._4/attention/self/query/bias:0']
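Since I suspect the preinstalled transformers version on Colab changed underneath me, a first step is to check (and, if needed, pin) the installed version. The snippet below is a generic check, not something from my original notebook:

import transformers

# Print the transformers version currently installed on Colab; pinning a
# known-good release (e.g. pip install transformers==<version>) would make
# the notebook environment reproducible across sessions.
print(transformers.__version__)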
Can anyone help me with this: how do I load RoBERTa so that all of its weights are initialized correctly?