>>> from transformers import GPT2Tokenizer, GPT2Model
>>> model = GPT2Model.from_pretrained("gpt2",output_attentions=True)
>>> tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
>>> text = "a,b,c"
>>> inputs = tokenizer.encode_plus(text,return_tensors='pt',add_special_tokens=True)
>>> input_ids = inputs['input_ids']
>>> attention = model(input_ids)[-1]
>>> attention[0].shape
torch.Size([1, 12, 5, 5])
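Those dimensions line up with GPT-2 small and the input: batch size 1, 12 heads, and 5 tokens for "a,b,c". A quick sanity check (just a sketch, reusing the `model`, `tokenizer`, `input_ids`, and `attention` from above):

>>> tokenizer.convert_ids_to_tokens(input_ids[0].tolist())  # "a,b,c" splits into 5 BPE tokens
['a', ',', 'b', ',', 'c']
>>> model.config.n_layer, model.config.n_head  # gpt2-small: 12 layers, 12 heads per layer
(12, 12)
>>> len(attention)  # one attention tensor per layer
12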
>>> import transformers
>>> m2 = transformers.AutoModelWithLMHead.from_pretrained("gpt2")
>>> at2 = m2(input_ids)[-1]
>>> at2[0].shape
torch.Size([2, 1, 12, 5, 64])
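For an apples-to-apples comparison, one sketch (assuming the same `input_ids` as above) is to reload the LM-head model with output_attentions=True, the same flag the first model was loaded with, and check whether the last output element then has the earlier shape:

>>> m3 = transformers.AutoModelWithLMHead.from_pretrained("gpt2", output_attentions=True)
>>> at3 = m3(input_ids)[-1]
>>> at3[0].shape  # should be torch.Size([1, 12, 5, 5]) if this last element is the attentions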
For reference, `attention` is a tuple and `attention[0]` is the attention for its first layer.
I can account for every dimension except the leading 2 when comparing torch.Size([2, 1, 12, 5, 64]) with torch.Size([1, 12, 5, 5]). What does that 2 mean?
I got these definitions from the bertviz GitHub repo:
attention: list of ``torch.FloatTensor``(one for each layer) of shape
``(batch_size(must be 1), num_heads, sequence_length, sequence_length)``
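Going by that definition, `attention[0]` from the first snippet (torch.Size([1, 12, 5, 5])) already fits what bertviz expects. A minimal usage sketch, assuming bertviz's head_view entry point and a Jupyter notebook environment:

>>> from bertviz import head_view
>>> tokens = tokenizer.convert_ids_to_tokens(input_ids[0].tolist())
>>> head_view(attention, tokens)  # renders per-head attention for all layers in the notebook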