>>> from transformers import GPT2Tokenizer, GPT2Model
>>> model = GPT2Model.from_pretrained("gpt2", output_attentions=True)
>>> tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
>>> text = "a,b,c"
>>> inputs = tokenizer.encode_plus(text, return_tensors='pt', add_special_tokens=True)
>>> input_ids = inputs['input_ids']
>>> attention = model(input_ids)[-1]  # last element of the output tuple
>>> attention[0].shape
torch.Size([1, 12, 5, 5])
>>> import transformers
>>> m2 = transformers.AutoModelWithLMHead.from_pretrained("gpt2")  # no output_attentions here
>>> at2 = m2(input_ids)[-1]
>>> at2[0].shape
torch.Size([2, 1, 12, 5, 64])

For reference, attention is a tuple, and attention[0] is the attention for its first layer.
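
As a quick check (continuing the session above; gpt2 has 12 layers, so I would expect one tensor per layer, all with the same shape):

>>> len(attention)  # one tensor per layer
12
>>> attention[-1].shape  # last layer, same shape as the first
torch.Size([1, 12, 5, 5])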

I can map everything except the 2 in torch.Size([2, 1, 12, 5, 64]) vs. torch.Size([1, 12, 5, 5]). What does that 2 mean?

I got these definitions from the bertviz GitHub repo:

            attention: list of ``torch.FloatTensor`` (one for each layer) of shape
                ``(batch_size (must be 1), num_heads, sequence_length, sequence_length)``
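
These definitions line up with the session above: "a,b,c" is tokenized into 5 tokens, which is where sequence_length = 5 comes from (a quick check, assuming the same session):

>>> tokenizer.convert_ids_to_tokens(input_ids[0].tolist())
['a', ',', 'b', ',', 'c']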

1 Answer

>>> m3 = m2.from_pretrained("gpt2", output_attentions=True)  # from_pretrained called via the GPT2LMHeadModel instance
>>> m3(input_ids)[-1][0].shape
torch.Size([1, 12, 5, 5])
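
As for the 2 the question asked about: without output_attentions, the last element of the output tuple is not attention at all but the per-layer key/value cache. If I read the GPT-2 source of this transformers release correctly, each layer stacks its key and value tensors into one tensor, so the leading 2 indexes key vs. value, and the trailing 64 is the per-head dimension (hidden size 768 / 12 heads):

>>> key, value = at2[0][0], at2[0][1]  # the leading 2 stacks key and value
>>> key.shape  # (batch_size, num_heads, sequence_length, head_dim)
torch.Size([1, 12, 5, 64])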

Strangely, transformers.AutoModelWithLMHead.from_pretrained does not allow output_attentions as one of its args.

Editing the config at runtime does not work either (m2.config.output_attentions = True has no effect).
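
A workaround I would expect to behave the same way (an untested sketch; it relies on the config= keyword of from_pretrained, and m4 is just a fresh name):

>>> config = transformers.GPT2Config.from_pretrained("gpt2", output_attentions=True)
>>> m4 = transformers.AutoModelWithLMHead.from_pretrained("gpt2", config=config)
>>> m4(input_ids)[-1][0].shape  # expect torch.Size([1, 12, 5, 5])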

answered 2020-03-10T20:28:52.847