I am reading the code of the GPT-2 language model. The conversion from hidden states to a probability distribution over the vocabulary is done in the following line:
lm_logits = self.lm_head(hidden_states)
where
self.lm_head = nn.Linear(config.n_embd, config.vocab_size, bias=False)
However, in the original paper they suggest multiplying the hidden states by the token embedding matrix, while the Hugging Face implementation appears to use a separate matrix.
Is there some benefit to this? Am I missing something?
The two layers share the same weights. https://github.com/huggingface/transformers/issues/2824
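A minimal sketch of how this tying works, using toy sizes and standalone PyTorch modules rather than the actual Hugging Face classes (the variable names `wte` and `lm_head` mirror the GPT-2 code, but this is an illustration, not the library's implementation):

```python
import torch
import torch.nn as nn

n_embd, vocab_size = 16, 100  # toy sizes; GPT-2 small uses 768 and 50257

# Token embedding matrix of shape (vocab_size, n_embd), as in GPT-2's wte
wte = nn.Embedding(vocab_size, n_embd)

# Output projection to vocabulary logits, declared as a separate Linear layer;
# nn.Linear(n_embd, vocab_size) stores its weight as (vocab_size, n_embd),
# so it has exactly the same shape as the embedding matrix
lm_head = nn.Linear(n_embd, vocab_size, bias=False)

# Tie the weights: lm_head now points at the very same tensor as wte
lm_head.weight = wte.weight

# Projecting a hidden state through lm_head therefore multiplies it by the
# token embedding matrix, matching the paper's formulation
hidden = torch.randn(1, n_embd)
logits = lm_head(hidden)

# Same storage, not a copy: a gradient step on one updates the other
assert lm_head.weight.data_ptr() == wte.weight.data_ptr()
print(logits.shape)  # torch.Size([1, 100])
```

So although `lm_head` is declared as its own `nn.Linear`, after tying it is not "another matrix" at all; it reuses the embedding weights, which is why the Hugging Face code matches the paper.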