tokenize - AttributeError: 'GPT2TokenizerFast' object has no attribute 'max_len'

Question

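I am using the Hugging Face transformers library, and when I run run_lm_finetuning.py I get the following message: AttributeError: 'GPT2TokenizerFast' object has no attribute 'max_len'. Has anyone else run into this, or does anyone have an idea how to fix it? Thanks!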

My full experiment run:

    mkdir Experiments

    for epoch in 5
    do
        python run_lm_finetuning.py \
            --model_name_or_path distilgpt2 \
            --model_type gpt2 \
            --train_data_file small_dataset_train_preprocessed.txt \
            --output_dir Experiments/epochs_$epoch \
            --do_train \
            --overwrite_output_dir \
            --per_device_train_batch_size 4 \
            --num_train_epochs $epoch
    done

score 6 · Accepted Answer

The GitHub issue "AttributeError: 'BertTokenizerFast' object has no attribute 'max_len'" contains the fix:

The run_language_modeling.py script is deprecated in favor of language-modeling/run_{clm, plm, mlm}.py.

If you stay on the old script instead, the workaround is to change max_len to model_max_length in it.
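A minimal sketch of that workaround, assuming you want to patch a local copy of the old script rather than switch to run_clm.py. It rewrites every .max_len attribute access to .model_max_length and deliberately leaves the separate max_len_single_sentence attribute alone. The file name patch_max_len.py and the helper names patch_source and patch_file are hypothetical; keep a backup of the script before running it.

```python
import re
import sys
from pathlib import Path

# Matches the attribute access ".max_len" only. The trailing \b means
# ".max_len_single_sentence" and ".model_max_length" are not touched.
_MAX_LEN_ATTR = re.compile(r"\.max_len\b")


def patch_source(source: str) -> str:
    """Return source with every `.max_len` access renamed to `.model_max_length`."""
    return _MAX_LEN_ATTR.sub(".model_max_length", source)


def patch_file(path: str) -> int:
    """Rewrite the file in place and return how many occurrences were replaced."""
    file_path = Path(path)
    original = file_path.read_text(encoding="utf-8")
    patched, count = _MAX_LEN_ATTR.subn(".model_max_length", original)
    if count:
        file_path.write_text(patched, encoding="utf-8")
    return count


if __name__ == "__main__":
    # Hypothetical usage: python patch_max_len.py run_lm_finetuning.py
    for script in sys.argv[1:]:
        print(f"{script}: replaced {patch_file(script)} occurrence(s)")
```

Because the pattern no longer matches once the rename is done, running the patch twice is harmless.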
