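So I'm currently using GPT-2 running on TensorFlow for text generation, working specifically with this repo. I recently decided to install CUDA and cuDNN so I could use my GPU, and I installed them by following these instructions. I'm on Windows 10 x64 with an NVIDIA GeForce GTX 1650, and I'm using the Command Prompt terminal. I followed the instructions as best I could: I downloaded the correct GPU driver, set the environment variables, copied the cuDNN files to where they were supposed to go, and so on. After finishing the installation, I tried to generate unconditional samples with a model I had trained, and this happened: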
Microsoft Windows [Version 10.0.19043.1288]
(c) Microsoft Corporation. All rights reserved.
C:\Users\"username">cd C:\Users\"username"\Desktop\gpt-2-finetuning\src
C:\Users\"username"\Desktop\gpt-2-finetuning\src> python generate_unconditional_samples.py --model_name novel
2021-10-17 00:18:21.694165: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX AVX2
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2021-10-17 00:18:22.435510: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1510] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 2153 MB memory: -> device: 0, name: NVIDIA GeForce GTX 1650, pci bus id: 0000:01:00.0, compute capability: 7.5
WARNING:tensorflow:From C:\Users\"username"\Desktop\gpt-2-finetuning\src\sample.py:60: to_float (from tensorflow.python.ops.math_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.cast` instead.
WARNING:tensorflow:From C:\Users\"username"\AppData\Local\Programs\Python\Python39\lib\site-packages\tensorflow\python\util\dispatch.py:206: multinomial (from tensorflow.python.ops.random_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.random.categorical` instead.
2021-10-17 00:18:45.451534: W tensorflow/core/common_runtime/bfc_allocator.cc:457] Allocator (GPU_0_bfc) ran out of memory trying to allocate 196.32MiB (rounded to 205852672)requested by op sample_sequence/while/body/_1/model/MatMul/ReadVariableOp
If the cause is memory fragmentation maybe the environment variable 'TF_GPU_ALLOCATOR=cuda_malloc_async' will improve the situation.
Current allocation summary follows.
Current allocation summary follows.
2021-10-17 00:18:45.467103: I tensorflow/core/common_runtime/bfc_allocator.cc:1004] BFCAllocator dump for GPU_0_bfc
2021-10-17 00:18:45.474451: I tensorflow/core/common_runtime/bfc_allocator.cc:1011] Bin (256): Total Chunks: 15, Chunks in use: 15. 3.8KiB allocated for chunks. 3.8KiB in use in bin. 60B client-requested in use in bin.
2021-10-17 00:18:45.481771: I tensorflow/core/common_runtime/bfc_allocator.cc:1011] Bin (512): Total Chunks: 1, Chunks in use: 0. 512B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2021-10-17 00:18:45.489403: I tensorflow/core/common_runtime/bfc_allocator.cc:1011] Bin (1024): Total Chunks: 1, Chunks in use: 1. 1.2KiB allocated for chunks. 1.2KiB in use in bin. 1.0KiB client-requested in use in bin.
2021-10-17 00:18:45.498581: I tensorflow/core/common_runtime/bfc_allocator.cc:1011] Bin (2048): Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2021-10-17 00:18:45.509522: I tensorflow/core/common_runtime/bfc_allocator.cc:1011] Bin (4096): Total Chunks: 148, Chunks in use: 148. 592.0KiB allocated for chunks. 592.0KiB in use in bin. 592.0KiB client-requested in use in bin.
2021-10-17 00:18:45.517609: I tensorflow/core/common_runtime/bfc_allocator.cc:1011] Bin (8192): Total Chunks: 25, Chunks in use: 25. 300.0KiB allocated for chunks. 300.0KiB in use in bin. 300.0KiB client-requested in use in bin.
2021-10-17 00:18:45.526116: I tensorflow/core/common_runtime/bfc_allocator.cc:1011] Bin (16384): Total Chunks: 24, Chunks in use: 24. 384.0KiB allocated for chunks. 384.0KiB in use in bin. 384.0KiB client-requested in use in bin.
2021-10-17 00:18:45.536214: I tensorflow/core/common_runtime/bfc_allocator.cc:1011] Bin (32768): Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2021-10-17 00:18:45.548694: I tensorflow/core/common_runtime/bfc_allocator.cc:1011] Bin (65536): Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2021-10-17 00:18:45.563635: I tensorflow/core/common_runtime/bfc_allocator.cc:1011] Bin (131072): Total Chunks: 4, Chunks in use: 4. 786.0KiB allocated for chunks. 786.0KiB in use in bin. 785.3KiB client-requested in use in bin.
2021-10-17 00:18:45.578935: I tensorflow/core/common_runtime/bfc_allocator.cc:1011] Bin (262144): Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2021-10-17 00:18:45.594547: I tensorflow/core/common_runtime/bfc_allocator.cc:1011] Bin (524288): Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2021-10-17 00:18:45.601621: I tensorflow/core/common_runtime/bfc_allocator.cc:1011] Bin (1048576): Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2021-10-17 00:18:45.608788: I tensorflow/core/common_runtime/bfc_allocator.cc:1011] Bin (2097152): Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2021-10-17 00:18:45.619285: I tensorflow/core/common_runtime/bfc_allocator.cc:1011] Bin (4194304): Total Chunks: 25, Chunks in use: 25. 100.00MiB allocated for chunks. 100.00MiB in use in bin. 100.00MiB client-requested in use in bin.
2021-10-17 00:18:45.628480: I tensorflow/core/common_runtime/bfc_allocator.cc:1011] Bin (8388608): Total Chunks: 24, Chunks in use: 24. 288.00MiB allocated for chunks. 288.00MiB in use in bin. 288.00MiB client-requested in use in bin.
2021-10-17 00:18:45.637872: I tensorflow/core/common_runtime/bfc_allocator.cc:1011] Bin (16777216): Total Chunks: 48, Chunks in use: 48. 768.00MiB allocated for chunks. 768.00MiB in use in bin. 768.00MiB client-requested in use in bin.
2021-10-17 00:18:45.651217: I tensorflow/core/common_runtime/bfc_allocator.cc:1011] Bin (33554432): Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2021-10-17 00:18:45.663622: I tensorflow/core/common_runtime/bfc_allocator.cc:1011] Bin (67108864): Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2021-10-17 00:18:45.677210: I tensorflow/core/common_runtime/bfc_allocator.cc:1011] Bin (134217728): Total Chunks: 5, Chunks in use: 5. 995.43MiB allocated for chunks. 995.43MiB in use in bin. 981.58MiB client-requested in use in bin.
2021-10-17 00:18:45.686363: I tensorflow/core/common_runtime/bfc_allocator.cc:1011] Bin (268435456): Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2021-10-17 00:18:45.701152: I tensorflow/core/common_runtime/bfc_allocator.cc:1027] Bin for 196.32MiB was 128.00MiB, Chunk State:
2021-10-17 00:18:45.710829: I tensorflow/core/common_runtime/bfc_allocator.cc:1040] Next region of size 2258055936
2021-10-17 00:18:45.715322: I tensorflow/core/common_runtime/bfc_allocator.cc:1060] InUse at b0a600000 of size 1280 next 1
2021-10-17 00:18:45.727700: I tensorflow/core/common_runtime/bfc_allocator.cc:1060] InUse at b0a600500 of size 12582912 next 2
2021-10-17 00:18:45.735730: I tensorflow/core/common_runtime/bfc_allocator.cc:1060] InUse at b0b200500 of size 12288 next 3
2021-10-17 00:18:45.745330: I tensorflow/core/common_runtime/bfc_allocator.cc:1060] InUse at b0b203500 of size 16384 next 4
2021-10-17 00:18:45.757304: I tensorflow/core/common_runtime/bfc_allocator.cc:1060] InUse at b0b207500 of size 4096 next 5
2021-10-17 00:18:45.777662: I tensorflow/core/common_runtime/bfc_allocator.cc:1060] InUse at b0b208500 of size 16777216 next 6
...goes on for a while like this
2021-10-17 00:18:49.046582: I tensorflow/core/common_runtime/bfc_allocator.cc:1060] InUse at b6b4a3e00 of size 12288 next 318
2021-10-17 00:18:49.056312: I tensorflow/core/common_runtime/bfc_allocator.cc:1060] InUse at b6b4a6e00 of size 205852672 next 313
2021-10-17 00:18:49.063244: I tensorflow/core/common_runtime/bfc_allocator.cc:1060] InUse at b778f7e00 of size 205852672 next 319
2021-10-17 00:18:49.069964: I tensorflow/core/common_runtime/bfc_allocator.cc:1060] InUse at b83d48e00 of size 220374272 next 18446744073709551615
2021-10-17 00:18:49.076724: I tensorflow/core/common_runtime/bfc_allocator.cc:1065] Summary of in-use Chunks by size:
2021-10-17 00:18:49.085663: I tensorflow/core/common_runtime/bfc_allocator.cc:1068] 15 Chunks of size 256 totalling 3.8KiB
2021-10-17 00:18:49.092613: I tensorflow/core/common_runtime/bfc_allocator.cc:1068] 1 Chunks of size 1280 totalling 1.2KiB
2021-10-17 00:18:49.101615: I tensorflow/core/common_runtime/bfc_allocator.cc:1068] 148 Chunks of size 4096 totalling 592.0KiB
2021-10-17 00:18:49.109453: I tensorflow/core/common_runtime/bfc_allocator.cc:1068] 25 Chunks of size 12288 totalling 300.0KiB
2021-10-17 00:18:49.118227: I tensorflow/core/common_runtime/bfc_allocator.cc:1068] 24 Chunks of size 16384 totalling 384.0KiB
2021-10-17 00:18:49.125224: I tensorflow/core/common_runtime/bfc_allocator.cc:1068] 4 Chunks of size 201216 totalling 786.0KiB
2021-10-17 00:18:49.134291: I tensorflow/core/common_runtime/bfc_allocator.cc:1068] 25 Chunks of size 4194304 totalling 100.00MiB
2021-10-17 00:18:49.142594: I tensorflow/core/common_runtime/bfc_allocator.cc:1068] 24 Chunks of size 12582912 totalling 288.00MiB
2021-10-17 00:18:49.150332: I tensorflow/core/common_runtime/bfc_allocator.cc:1068] 48 Chunks of size 16777216 totalling 768.00MiB
2021-10-17 00:18:49.159611: I tensorflow/core/common_runtime/bfc_allocator.cc:1068] 4 Chunks of size 205852672 totalling 785.27MiB
2021-10-17 00:18:49.166664: I tensorflow/core/common_runtime/bfc_allocator.cc:1068] 1 Chunks of size 220374272 totalling 210.17MiB
2021-10-17 00:18:49.175719: I tensorflow/core/common_runtime/bfc_allocator.cc:1072] Sum Total of in-use chunks: 2.10GiB
2021-10-17 00:18:49.179917: I tensorflow/core/common_runtime/bfc_allocator.cc:1074] total_region_allocated_bytes_: 2258055936 memory_limit_: 2258055988 available bytes: 52 curr_region_allocation_bytes_: 4516112384
2021-10-17 00:18:49.186738: I tensorflow/core/common_runtime/bfc_allocator.cc:1080] Stats:
Limit: 2258055988
InUse: 2258055424
MaxInUse: 2258055424
NumAllocs: 326
MaxAllocSize: 220374272
Reserved: 0
PeakReserved: 0
LargestFreeBlock: 0
2021-10-17 00:18:49.214161: W tensorflow/core/common_runtime/bfc_allocator.cc:468] ****************************************************************************************************
2021-10-17 00:18:49.224793: W tensorflow/core/framework/op_kernel.cc:1692] OP_REQUIRES failed at resource_variable_ops.cc:158 : Resource exhausted: OOM when allocating tensor with shape[50257,1024] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
2021-10-17 00:18:49.234240: W tensorflow/core/common_runtime/bfc_allocator.cc:457] Allocator (GPU_0_bfc) ran out of memory trying to allocate 4.0KiB (rounded to 4096)requested by op sample_sequence/model/h0/attn/split
If the cause is memory fragmentation maybe the environment variable 'TF_GPU_ALLOCATOR=cuda_malloc_async' will improve the situation.
Current allocation summary follows.
Current allocation summary follows.
2021-10-17 00:18:49.253961: I tensorflow/core/common_runtime/bfc_allocator.cc:1004] BFCAllocator dump for GPU_0_bfc
2021-10-17 00:18:49.260477: I tensorflow/core/common_runtime/bfc_allocator.cc:1011] Bin (256): Total Chunks: 15, Chunks in use: 15. 3.8KiB allocated for chunks. 3.8KiB in use in bin. 60B client-requested in use in bin.
2021-10-17 00:18:49.267677: I tensorflow/core/common_runtime/bfc_allocator.cc:1011] Bin (512): Total Chunks: 1, Chunks in use: 0. 512B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2021-10-17 00:18:49.274584: I tensorflow/core/common_runtime/bfc_allocator.cc:1011] Bin (1024): Total Chunks: 1, Chunks in use: 1. 1.2KiB allocated for chunks. 1.2KiB in use in bin. 1.0KiB client-requested in use in bin.
2021-10-17 00:18:49.282179: I tensorflow/core/common_runtime/bfc_allocator.cc:1011] Bin (2048): Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2021-10-17 00:18:49.291707: I tensorflow/core/common_runtime/bfc_allocator.cc:1011] Bin (4096): Total Chunks: 148, Chunks in use: 148. 592.0KiB allocated for chunks. 592.0KiB in use in bin. 592.0KiB client-requested in use in bin.
2021-10-17 00:18:49.299699: I tensorflow/core/common_runtime/bfc_allocator.cc:1011] Bin (8192): Total Chunks: 25, Chunks in use: 25. 300.0KiB allocated for chunks. 300.0KiB in use in bin. 300.0KiB client-requested in use in bin.
2021-10-17 00:18:49.309406: I tensorflow/core/common_runtime/bfc_allocator.cc:1011] Bin (16384): Total Chunks: 24, Chunks in use: 24. 384.0KiB allocated for chunks. 384.0KiB in use in bin. 384.0KiB client-requested in use in bin.
2021-10-17 00:18:49.316823: I tensorflow/core/common_runtime/bfc_allocator.cc:1011] Bin (32768): Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2021-10-17 00:18:49.323705: I tensorflow/core/common_runtime/bfc_allocator.cc:1011] Bin (65536): Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2021-10-17 00:18:49.330699: I tensorflow/core/common_runtime/bfc_allocator.cc:1011] Bin (131072): Total Chunks: 4, Chunks in use: 4. 786.0KiB allocated for chunks. 786.0KiB in use in bin. 785.3KiB client-requested in use in bin.
2021-10-17 00:18:49.341079: I tensorflow/core/common_runtime/bfc_allocator.cc:1011] Bin (262144): Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2021-10-17 00:18:49.347442: I tensorflow/core/common_runtime/bfc_allocator.cc:1011] Bin (524288): Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2021-10-17 00:18:49.355050: I tensorflow/core/common_runtime/bfc_allocator.cc:1011] Bin (1048576): Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2021-10-17 00:18:49.362441: I tensorflow/core/common_runtime/bfc_allocator.cc:1011] Bin (2097152): Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2021-10-17 00:18:49.373022: I tensorflow/core/common_runtime/bfc_allocator.cc:1011] Bin (4194304): Total Chunks: 25, Chunks in use: 25. 100.00MiB allocated for chunks. 100.00MiB in use in bin. 100.00MiB client-requested in use in bin.
2021-10-17 00:18:49.379516: I tensorflow/core/common_runtime/bfc_allocator.cc:1011] Bin (8388608): Total Chunks: 24, Chunks in use: 24. 288.00MiB allocated for chunks. 288.00MiB in use in bin. 288.00MiB client-requested in use in bin.
2021-10-17 00:18:49.386849: I tensorflow/core/common_runtime/bfc_allocator.cc:1011] Bin (16777216): Total Chunks: 48, Chunks in use: 48. 768.00MiB allocated for chunks. 768.00MiB in use in bin. 768.00MiB client-requested in use in bin.
2021-10-17 00:18:49.394833: I tensorflow/core/common_runtime/bfc_allocator.cc:1011] Bin (33554432): Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2021-10-17 00:18:49.406519: I tensorflow/core/common_runtime/bfc_allocator.cc:1011] Bin (67108864): Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2021-10-17 00:18:49.413489: I tensorflow/core/common_runtime/bfc_allocator.cc:1011] Bin (134217728): Total Chunks: 5, Chunks in use: 5. 995.43MiB allocated for chunks. 995.43MiB in use in bin. 981.58MiB client-requested in use in bin.
2021-10-17 00:18:49.423166: I tensorflow/core/common_runtime/bfc_allocator.cc:1011] Bin (268435456): Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2021-10-17 00:18:49.433375: I tensorflow/core/common_runtime/bfc_allocator.cc:1027] Bin for 4.0KiB was 4.0KiB, Chunk State:
2021-10-17 00:18:49.439983: I tensorflow/core/common_runtime/bfc_allocator.cc:1040] Next region of size 2258055936
2021-10-17 00:18:49.446385: I tensorflow/core/common_runtime/bfc_allocator.cc:1060] InUse at b0a600000 of size 1280 next 1
2021-10-17 00:18:49.453157: I tensorflow/core/common_runtime/bfc_allocator.cc:1060] InUse at b0a600500 of size 12582912 next 2
...etc, etc...
2021-10-17 00:18:52.034032: I tensorflow/core/common_runtime/bfc_allocator.cc:1060] InUse at b6b4a3e00 of size 12288 next 318
2021-10-17 00:18:52.041039: I tensorflow/core/common_runtime/bfc_allocator.cc:1060] InUse at b6b4a6e00 of size 205852672 next 313
2021-10-17 00:18:52.050136: I tensorflow/core/common_runtime/bfc_allocator.cc:1060] InUse at b778f7e00 of size 205852672 next 319
2021-10-17 00:18:52.057217: I tensorflow/core/common_runtime/bfc_allocator.cc:1060] InUse at b83d48e00 of size 220374272 next 18446744073709551615
2021-10-17 00:18:52.066414: I tensorflow/core/common_runtime/bfc_allocator.cc:1065] Summary of in-use Chunks by size:
2021-10-17 00:18:52.074512: I tensorflow/core/common_runtime/bfc_allocator.cc:1068] 15 Chunks of size 256 totalling 3.8KiB
2021-10-17 00:18:52.083562: I tensorflow/core/common_runtime/bfc_allocator.cc:1068] 1 Chunks of size 1280 totalling 1.2KiB
2021-10-17 00:18:52.091067: I tensorflow/core/common_runtime/bfc_allocator.cc:1068] 148 Chunks of size 4096 totalling 592.0KiB
2021-10-17 00:18:52.097600: I tensorflow/core/common_runtime/bfc_allocator.cc:1068] 25 Chunks of size 12288 totalling 300.0KiB
2021-10-17 00:18:52.105189: I tensorflow/core/common_runtime/bfc_allocator.cc:1068] 24 Chunks of size 16384 totalling 384.0KiB
2021-10-17 00:18:52.114193: I tensorflow/core/common_runtime/bfc_allocator.cc:1068] 4 Chunks of size 201216 totalling 786.0KiB
2021-10-17 00:18:52.121798: I tensorflow/core/common_runtime/bfc_allocator.cc:1068] 25 Chunks of size 4194304 totalling 100.00MiB
2021-10-17 00:18:52.131072: I tensorflow/core/common_runtime/bfc_allocator.cc:1068] 24 Chunks of size 12582912 totalling 288.00MiB
2021-10-17 00:18:52.138520: I tensorflow/core/common_runtime/bfc_allocator.cc:1068] 48 Chunks of size 16777216 totalling 768.00MiB
2021-10-17 00:18:52.145005: I tensorflow/core/common_runtime/bfc_allocator.cc:1068] 4 Chunks of size 205852672 totalling 785.27MiB
2021-10-17 00:18:52.151508: I tensorflow/core/common_runtime/bfc_allocator.cc:1068] 1 Chunks of size 220374272 totalling 210.17MiB
2021-10-17 00:18:52.160622: I tensorflow/core/common_runtime/bfc_allocator.cc:1072] Sum Total of in-use chunks: 2.10GiB
2021-10-17 00:18:52.165037: I tensorflow/core/common_runtime/bfc_allocator.cc:1074] total_region_allocated_bytes_: 2258055936 memory_limit_: 2258055988 available bytes: 52 curr_region_allocation_bytes_: 4516112384
2021-10-17 00:18:52.174756: I tensorflow/core/common_runtime/bfc_allocator.cc:1080] Stats:
Limit: 2258055988
InUse: 2258055424
MaxInUse: 2258055424
NumAllocs: 326
MaxAllocSize: 220374272
Reserved: 0
PeakReserved: 0
LargestFreeBlock: 0
2021-10-17 00:18:52.197768: W tensorflow/core/common_runtime/bfc_allocator.cc:468] ****************************************************************************************************
2021-10-17 00:18:52.207819: W tensorflow/core/framework/op_kernel.cc:1692] OP_REQUIRES failed at split_op.cc:308 : Resource exhausted: OOM when allocating tensor with shape[1,1,1024] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
Traceback (most recent call last):
File "C:\Users\"username"\AppData\Local\Programs\Python\Python39\lib\site-packages\tensorflow\python\client\session.py", line 1375, in _do_call
return fn(*args)
File "C:\Users\"username"\AppData\Local\Programs\Python\Python39\lib\site-packages\tensorflow\python\client\session.py", line 1359, in _run_fn
return self._call_tf_sessionrun(options, feed_dict, fetch_list,
File "C:\Users\"username"\AppData\Local\Programs\Python\Python39\lib\site-packages\tensorflow\python\client\session.py", line 1451, in _call_tf_sessionrun
return tf_session.TF_SessionRun_wrapper(self._session, options, feed_dict,
tensorflow.python.framework.errors_impl.ResourceExhaustedError: 2 root error(s) found.
(0) Resource exhausted: OOM when allocating tensor with shape[50257,1024] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
[[{{node model/MatMul/ReadVariableOp}}]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info. This isn't available when running in Eager mode.
[[strided_slice/_645]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info. This isn't available when running in Eager mode.
(1) Resource exhausted: OOM when allocating tensor with shape[50257,1024] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
[[{{node model/MatMul/ReadVariableOp}}]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info. This isn't available when running in Eager mode.
0 successful operations.
0 derived errors ignored.
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "C:\Users\"username"\Desktop\gpt-2-finetuning\src\generate_unconditional_samples.py", line 79, in <module>
fire.Fire(sample_model)
File "C:\Users\"username"\AppData\Local\Programs\Python\Python39\lib\site-packages\fire\core.py", line 141, in Fire
component_trace = _Fire(component, args, parsed_flag_args, context, name)
File "C:\Users\"username"\AppData\Local\Programs\Python\Python39\lib\site-packages\fire\core.py", line 466, in _Fire
component, remaining_args = _CallAndUpdateTrace(
File "C:\Users\"username"\AppData\Local\Programs\Python\Python39\lib\site-packages\fire\core.py", line 681, in _CallAndUpdateTrace
component = fn(*varargs, **kwargs)
File "C:\Users\"username"\Desktop\gpt-2-finetuning\src\generate_unconditional_samples.py", line 71, in sample_model
out = sess.run(output)
File "C:\Users\"username"\AppData\Local\Programs\Python\Python39\lib\site-packages\tensorflow\python\client\session.py", line 967, in run
result = self._run(None, fetches, feed_dict, options_ptr,
File "C:\Users\"username"\AppData\Local\Programs\Python\Python39\lib\site-packages\tensorflow\python\client\session.py", line 1190, in _run
results = self._do_run(handle, final_targets, final_fetches,
File "C:\Users\"username"\AppData\Local\Programs\Python\Python39\lib\site-packages\tensorflow\python\client\session.py", line 1368, in _do_run
return self._do_call(_run_fn, feeds, fetches, targets, options,
File "C:\Users\"username"\AppData\Local\Programs\Python\Python39\lib\site-packages\tensorflow\python\client\session.py", line 1394, in _do_call
raise type(e)(node_def, op, message) # pylint: disable=no-value-for-parameter
tensorflow.python.framework.errors_impl.ResourceExhaustedError: 2 root error(s) found.
(0) Resource exhausted: OOM when allocating tensor with shape[50257,1024] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
[[{{node model/MatMul/ReadVariableOp}}]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info. This isn't available when running in Eager mode.
[[strided_slice/_645]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info. This isn't available when running in Eager mode.
(1) Resource exhausted: OOM when allocating tensor with shape[50257,1024] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
[[{{node model/MatMul/ReadVariableOp}}]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info. This isn't available when running in Eager mode.
0 successful operations.
0 derived errors ignored.
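The sizes in the failing allocations above can be checked by hand: a float32 tensor takes 4 bytes per element, so the shape [50257, 1024] from the OOM message is exactly the 205852672-byte (196.32 MiB) request, and the shape [1, 1, 1024] is the 4.0 KiB (4096-byte) request from the second failure. The sketch below only reproduces that arithmetic; `tensor_bytes` is a hypothetical helper. The width 1024 is also the embedding size of GPT-2's 345M ("medium") checkpoint, so the `novel` model is presumably fine-tuned from that size (an inference from the shape, not something stated above).

```python
# Back-of-the-envelope check of the byte counts printed by TensorFlow's BFC allocator.
MIB = 1024 ** 2
FLOAT32_BYTES = 4  # the OOM messages say "type float", i.e. 32-bit floats

# memory_limit_ reported in the allocator dump
LIMIT_BYTES = 2258055988


def tensor_bytes(shape, dtype_bytes=FLOAT32_BYTES):
    """Return the number of bytes a dense tensor of this shape occupies."""
    elements = 1
    for dim in shape:
        elements *= dim
    return elements * dtype_bytes


embedding = tensor_bytes([50257, 1024])  # vocab size x embedding width
print(embedding, round(embedding / MIB, 2))  # 205852672 196.32
print(tensor_bytes([1, 1, 1024]))            # 4096
print(LIMIT_BYTES // MIB)                    # 2153, matching "2153 MB memory" at startup
```

In other words, the allocator had about 2153 MiB to work with on this card, and the dump shows essentially all of it (InUse 2258055424 of the 2258055988-byte limit) already occupied when the 196 MiB request arrived.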
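I wasn't sure why this was happening and assumed I had installed the cuDNN files incorrectly. After experimenting for a while, I found that if I deleted cudnn64_8.dll from C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.4\bin (where the instructions told me to copy it) and then ran the unconditional sample script, GPT-2 worked fine and was able to generate some text. All the other cuDNN files are still in their CUDA directories. I don't understand why having cudnn64_8.dll there breaks things. Did I install the wrong version of CUDA? What exactly is going on here?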
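One way to answer the version question is to ask TensorFlow itself which CUDA and cuDNN versions its binary was built against, and whether it can see the GPU at all. A minimal sketch, assuming TensorFlow 2.3 or later, where `tf.sysconfig.get_build_info()` exists (older builds fall back to an empty dict here); `describe_tf_gpu_build` is a hypothetical helper name. The reported versions can then be compared with the installed v11.4 toolkit and cudnn64_8.dll (cuDNN 8).

```python
try:
    import tensorflow as tf
except ImportError:
    tf = None  # TensorFlow not installed in this interpreter


def describe_tf_gpu_build():
    """Summarize the CUDA/cuDNN versions this TensorFlow build expects and the GPUs it sees."""
    if tf is None:
        return {"tensorflow_installed": False}
    # get_build_info() only exists in newer releases; fall back to an empty dict otherwise.
    get_info = getattr(tf.sysconfig, "get_build_info", lambda: {})
    info = get_info()
    return {
        "tensorflow_installed": True,
        "tensorflow_version": tf.__version__,
        "built_with_cuda": tf.test.is_built_with_cuda(),
        "expected_cuda": info.get("cuda_version"),    # None on CPU-only builds
        "expected_cudnn": info.get("cudnn_version"),  # None on CPU-only builds
        "visible_gpus": [d.name for d in tf.config.list_physical_devices("GPU")],
    }


print(describe_tf_gpu_build())
```

If `visible_gpus` is empty after cudnn64_8.dll is removed, that would suggest TensorFlow is not using the GPU in that configuration, which is worth confirming before comparing the two runs.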
Edit:
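So I decided to add TF_GPU_ALLOCATOR=cuda_malloc_async to my environment variables, as the terminal output above suggested. This time I didn't get the OOM error like last time, but the program still terminated. Here is the output: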
Microsoft Windows [Version 10.0.19043.1288]
(c) Microsoft Corporation. All rights reserved.
C:\Users\"username">cd C:\Users\"username"\Desktop\gpt-2-finetuning\src
C:\Users\"username"\Desktop\gpt-2-finetuning\src>python generate_unconditional_samples.py --model_name novel
2021-10-17 15:20:12.172740: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX AVX2
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2021-10-17 15:20:12.681534: I tensorflow/core/common_runtime/gpu/gpu_process_state.cc:215] Using CUDA malloc Async allocator for GPU: 0
C:\Users\"username"\Desktop\gpt-2-finetuning\src>
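Instead of setting the variable system-wide, it can be set for a single process, which makes it easy to switch between the async allocator and the default BFC allocator (the "GPU_0_bfc" in the first log) while experimenting. A minimal sketch, assuming the variable only needs to be present before TensorFlow creates its GPU allocator, so it is set before `import tensorflow`; `configure_gpu_allocator` is a hypothetical helper name. In Command Prompt, `set TF_GPU_ALLOCATOR=cuda_malloc_async` has the same per-session effect.

```python
import os


def configure_gpu_allocator(allocator="cuda_malloc_async"):
    """Set (or, with None, clear) TF_GPU_ALLOCATOR for this process only.

    Call this before TensorFlow is imported, because TensorFlow reads the
    variable when it sets up its GPU allocator.
    """
    if allocator is None:
        # Without the variable, TensorFlow uses its default BFC allocator.
        os.environ.pop("TF_GPU_ALLOCATOR", None)
    else:
        os.environ["TF_GPU_ALLOCATOR"] = allocator


configure_gpu_allocator("cuda_malloc_async")

try:
    import tensorflow as tf  # imported only after the variable is set
except ImportError:
    tf = None  # TensorFlow not installed; nothing more to do in this sketch

if tf is not None:
    print(tf.config.list_physical_devices("GPU"))
```

Keeping the setting inside the script means a failed experiment does not affect other TensorFlow programs on the machine.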
What exactly am I doing wrong here? Why is my GPU running out of memory?
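The OOM hint in the first log suggests adding report_tensor_allocations_upon_oom to RunOptions to see which tensors are live when the allocation fails. Because the traceback shows the script calling `sess.run(output)` at line 71 of generate_unconditional_samples.py, that option could be passed there. A minimal sketch, assuming TensorFlow 2.x with the `tf.compat.v1` graph-mode API; `oom_debug_run_options` is a hypothetical helper, and the modified `sess.run` call is a hypothetical edit, not part of the repo.

```python
try:
    import tensorflow as tf
except ImportError:
    tf = None  # TensorFlow not installed in this interpreter


def oom_debug_run_options():
    """Return RunOptions asking TensorFlow to list live tensor allocations on OOM.

    As the log hint notes, this only applies to graph-mode Session.run calls,
    not eager execution. Returns None when TensorFlow is not installed.
    """
    if tf is None:
        return None
    return tf.compat.v1.RunOptions(report_tensor_allocations_upon_oom=True)


# Hypothetical edit to line 71 of generate_unconditional_samples.py:
#     out = sess.run(output, options=oom_debug_run_options())
```

With this option set, the ResourceExhaustedError message should include the list of allocated tensors, which is more targeted than the allocator's chunk dump.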