I am fine-tuning the 355M GPT-2 model with aitextgen, using its train function. The dataset is a small txt file made up of lines like the following (these are texts encoded for keyword-based text generation, hence the "~^keywords~@" prefix):
<|startoftext|>~^~@"Yes, but one forgets that she is there--or anywhere. She seems as if she were an accident."<|endoftext|>
<|startoftext|>~^man~@"Then jump out and unharness this horse. A man will come for it to- morrow."<|endoftext|>
<|startoftext|>~^mind 's~@"It would upset the house terribly," said Nan; "but I don't mind that. I'm with you, Patty. Let's do it."<|endoftext|>
<|startoftext|>~^Booth sure say wish~@"I wish I were sure that I had," said Booth.<|endoftext|>
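For context, each line is produced by a small helper roughly like this (the function name is hypothetical; the `~^`/`~@` delimiters and start/end tokens are the ones shown above):

```python
START, END = "<|startoftext|>", "<|endoftext|>"

def encode_line(keywords, text):
    """Wrap keywords and text in the control format used in the dataset.
    `keywords` is a list of strings and may be empty."""
    return f"{START}~^{' '.join(keywords)}~@{text}{END}"

# encode_line(["man"], '"Then jump out and unharness this horse."')
```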
I call aitextgen's train function like this:
gpt2 = aitextgen(tf_gpt2="355M", to_gpu=True)
gpt2.train(dataset,
           line_by_line=True,
           batch_size=1,
           num_steps=50,
           save_every=10,
           generate_every=10,
           learning_rate=1e-3,
           fp16=False)
When I run this, I get the following output:
0%| | 0/10000 [00:00<?, ?it/s]
Windows does not support multi-GPU training. Setting to 1 GPU.
C:\Users\Josh\anaconda3\envs\gpt2_env\lib\site-packages\pytorch_lightning\trainer\connectors\callback_connector.py:147: LightningDeprecationWarning: Setting `Trainer(checkpoint_callback=False)` is deprecated in v1.5 and will be removed in v1.7. Please consider using `Trainer(enable_checkpointing=False)`.
rank_zero_deprecation(
C:\Users\Josh\anaconda3\envs\gpt2_env\lib\site-packages\pytorch_lightning\trainer\connectors\callback_connector.py:90: LightningDeprecationWarning: Setting `Trainer(progress_bar_refresh_rate=20)` is deprecated in v1.5 and will be removed in v1.7. Please pass `pytorch_lightning.callbacks.progress.TQDMProgressBar` with `refresh_rate` directly to the Trainer's `callbacks` argument instead. Or, to disable the progress bar pass `enable_progress_bar = False` to the Trainer.
rank_zero_deprecation(
C:\Users\Josh\anaconda3\envs\gpt2_env\lib\site-packages\pytorch_lightning\trainer\connectors\callback_connector.py:167: LightningDeprecationWarning: Setting `Trainer(weights_summary=None)` is deprecated in v1.5 and will be removed in v1.7. Please set `Trainer(enable_model_summary=False)` instead.
rank_zero_deprecation(
GPU available: True, used: True
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]
0%| | 0/50 [00:00<?, ?it/s]
Traceback (most recent call last):
File "C:\Users\Josh\anaconda3\envs\gpt2_env\lib\site-packages\transformers\modeling_utils.py", line 1364, in from_pretrained
state_dict = torch.load(resolved_archive_file, map_location="cpu")
File "C:\Users\Josh\anaconda3\envs\gpt2_env\lib\site-packages\torch\serialization.py", line 607, in load
return _load(opened_zipfile, map_location, pickle_module, **pickle_load_args)
File "C:\Users\Josh\anaconda3\envs\gpt2_env\lib\site-packages\torch\serialization.py", line 882, in _load
result = unpickler.load()
File "C:\Users\Josh\anaconda3\envs\gpt2_env\lib\site-packages\torch\serialization.py", line 857, in persistent_load
load_tensor(data_type, size, key, _maybe_decode_ascii(location))
File "C:\Users\Josh\anaconda3\envs\gpt2_env\lib\site-packages\torch\serialization.py", line 845, in load_tensor
storage = zip_file.get_storage_from_record(name, size, dtype).storage()
RuntimeError: [enforce fail at ..\c10\core\CPUAllocator.cpp:76] data. DefaultCPUAllocator: not enough memory: you tried to allocate 205852672 bytes.
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "<string>", line 1, in <module>
File "C:\Users\Josh\anaconda3\envs\gpt2_env\lib\multiprocessing\spawn.py", line 116, in spawn_main
exitcode = _main(fd, parent_sentinel)
File "C:\Users\Josh\anaconda3\envs\gpt2_env\lib\multiprocessing\spawn.py", line 125, in _main
prepare(preparation_data)
File "C:\Users\Josh\anaconda3\envs\gpt2_env\lib\multiprocessing\spawn.py", line 236, in prepare
_fixup_main_from_path(data['init_main_from_path'])
File "C:\Users\Josh\anaconda3\envs\gpt2_env\lib\multiprocessing\spawn.py", line 287, in _fixup_main_from_path
main_content = runpy.run_path(main_path,
File "C:\Users\Josh\anaconda3\envs\gpt2_env\lib\runpy.py", line 265, in run_path
return _run_module_code(code, init_globals, run_name,
File "C:\Users\Josh\anaconda3\envs\gpt2_env\lib\runpy.py", line 97, in _run_module_code
_run_code(code, mod_globals, init_globals,
File "C:\Users\Josh\anaconda3\envs\gpt2_env\lib\runpy.py", line 87, in _run_code
exec(code, run_globals)
File "C:\Users\Josh\Python Projects\FYP\src\[py file name].py", line 34, in <module>
gpt2 = aitextgen(tf_gpt2 = "355M", to_gpu= True)
File "C:\Users\Josh\anaconda3\envs\gpt2_env\lib\site-packages\aitextgen\aitextgen.py", line 166, in __init__
self.model = GPT2LMHeadModel.from_pretrained(model, config=config)
File "C:\Users\Josh\anaconda3\envs\gpt2_env\lib\site-packages\transformers\modeling_utils.py", line 1368, in from_pretrained
if f.read().startswith("version"):
MemoryError
I have tried many things, including clearing the CUDA cache with torch.cuda.empty_cache() and splitting the file into smaller files. None of them worked.
I am running this on my local machine (RTX 3070, 32 GB RAM), and I checked Task Manager: RAM usage barely reaches 50%. Is there something wrong with my code that is causing the memory error?
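One thing I notice in the traceback is that the failure happens inside multiprocessing's spawn machinery, which re-imports my script and hits the `gpt2 = aitextgen(...)` line again in a child process. To check my understanding of that mechanism, here is a minimal stdlib-only sketch (no aitextgen involved; `expensive_setup` is just a stand-in for loading the model) of why the usual `if __name__ == "__main__":` guard keeps module-level work out of spawned workers:

```python
import multiprocessing as mp

def expensive_setup():
    """Stand-in for aitextgen(tf_gpt2="355M", to_gpu=True) -- the call
    the traceback shows being re-executed in the spawned child."""
    return "model"

def worker(q):
    # The child re-imports this module, but the guarded block below
    # does not run there, so the "model" is only loaded once.
    q.put("worker ran without re-loading the model")

if __name__ == "__main__":
    mp.set_start_method("spawn", force=True)  # same start method Windows uses
    model = expensive_setup()                 # happens once, in the parent
    q = mp.Queue()
    p = mp.Process(target=worker, args=(q,))
    p.start()
    print(q.get())
    p.join()
```

Is this spawn re-import the mechanism behind the MemoryError here, i.e. would restructuring my script around such a guard be the right fix?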