python - Mismatched tensor size error when generating text with beam_search (huggingface library)

Question

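I'm using the huggingface library to generate text with the pretrained distilgpt2 model. In particular, I'm using the beam_search function, because I want to include a LogitsProcessorList (which can't be used with the generate function).

The relevant part of my code looks like this: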

from transformers import BeamSearchScorer

beam_scorer = BeamSearchScorer(
    batch_size=btchsze,
    max_length=15,  # not sure why lengths under 20 fail
    num_beams=num_seq,
    device=model.device,
)
# repeat the prompt so there is one row per beam per batch entry
j = input_ids.tile((num_seq * btchsze, 1))
next_output = model.beam_search(
    j,
    beam_scorer,
    eos_token_id=tokenizer.encode('.')[0],
    logits_processor=logits_processor,
)

However, when I try to generate with a max_length below 20, the beam_search function raises this error:

~/anaconda3/envs/techtweets37/lib/python3.7/site-packages/transformers-4.4.2-py3.8.egg/transformers/generation_beam_search.py in finalize(self, input_ids, final_beam_scores, final_beam_tokens, final_beam_indices, pad_token_id, eos_token_id)
    326         # fill with hypotheses and eos_token_id if the latter fits in
    327         for i, hypo in enumerate(best):
--> 328             decoded[i, : sent_lengths[i]] = hypo
    329             if sent_lengths[i] < self.max_length:
    330                 decoded[i, sent_lengths[i]] = eos_token_id

RuntimeError: The expanded size of the tensor (15) must match the existing size (20) at non-singleton dimension 0.  Target sizes: [15].  Tensor sizes: [20]

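I can't figure out where the 20 comes from: it stays the same whether the input is longer or shorter, and whether I use a different batch size or number of beams. I haven't defined anything with length 20, and I can't find any default value. The maximum sequence length does affect the results of beam search, so I'd like to understand this and be able to set a shorter maximum length.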

score 1 · Accepted Answer

This is a known issue in the Hugging Face library:

https://github.com/huggingface/transformers/issues/11040

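Basically, the max_length passed to the beam scorer is not the one that bounds generation: beam_search uses the model's max_length (model.config.max_length), which defaults to 20. The scorer allocates its output for 15 tokens while the hypotheses grow to 20, hence the size mismatch in the traceback.

For now, the fix is to set model.config.max_length to the desired maximum length.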
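
A minimal sketch of that workaround, in case you don't want to change the config permanently. The helper name override_max_length is hypothetical (it is not part of transformers); it only assumes the model exposes a config object with a max_length attribute. A SimpleNamespace stands in for the real model here so the snippet runs without downloading anything, and the beam_search call from the question is shown only as a comment.

```python
from contextlib import contextmanager
from types import SimpleNamespace


@contextmanager
def override_max_length(model, max_length):
    """Temporarily set model.config.max_length (hypothetical helper)."""
    previous = model.config.max_length
    model.config.max_length = max_length
    try:
        yield max_length
    finally:
        # restore the old value even if generation raises
        model.config.max_length = previous


# Stand-in for a real model; 20 mirrors the length seen in the error message.
fake_model = SimpleNamespace(config=SimpleNamespace(max_length=20))

with override_max_length(fake_model, 15) as limit:
    # With a real model, build the scorer and call beam_search here, e.g.
    #   beam_scorer = BeamSearchScorer(batch_size=btchsze, max_length=limit,
    #                                  num_beams=num_seq, device=model.device)
    #   next_output = model.beam_search(j, beam_scorer, ...)
    print(fake_model.config.max_length)

print(fake_model.config.max_length)
```

Passing the same value (limit) to both the scorer and the config keeps the two lengths in agreement, which is the point of the fix. If you never need the old value again, plain model.config.max_length = 15 before building the scorer does the same job.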
