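I am trying to run inference with the stock resnet18 model available in torchvision.models.
The model was trained in plain FP32 only, without any mixed-precision training. However, I want faster results at inference time, so I enabled torch.cuda.amp.autocast() only while running the test inference case.
The code for both cases is given below -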
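import torch
import torchvision

device = torch.device('cuda')  # assumed: the original snippet does not show how device is defined
model = torchvision.models.resnet18()
model = model.to(device)  # push the model to the GPU
# Train the model normally (FP32, no mixed precision)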
Without amp -
tensor = torch.rand(1, 3, 32, 32).to(device)  # random tensor for testing
with torch.no_grad():
    model.eval()
    start = torch.cuda.Event(enable_timing=True)
    end = torch.cuda.Event(enable_timing=True)
    model(tensor)  # warmup
    model(tensor)  # warmup
    start.record()
    for i in range(20):  # total time over 20 iterations
        model(tensor)
    end.record()
    torch.cuda.synchronize()  # wait for the GPU before reading the timer
print('execution time in milliseconds: {}'.format(start.elapsed_time(end) / 20))
execution time in milliseconds: 5.264944076538086
With amp -
tensor = torch.rand(1, 3, 32, 32).to(device)  # random tensor for testing
with torch.no_grad():
    model.eval()
    start = torch.cuda.Event(enable_timing=True)
    end = torch.cuda.Event(enable_timing=True)
    model(tensor)  # warmup (runs outside autocast)
    model(tensor)  # warmup (runs outside autocast)
    start.record()
    with torch.cuda.amp.autocast():  # autocast enabled for the timed loop only
        for i in range(20):  # total time over 20 iterations
            model(tensor)
    end.record()
    torch.cuda.synchronize()  # wait for the GPU before reading the timer
print('execution time in milliseconds: {}'.format(start.elapsed_time(end) / 20))
execution time in milliseconds: 10.619884490966797
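Clearly, the autocast()-enabled code takes about twice as long (10.62 ms versus 5.26 ms per iteration). Even for larger models such as resnet50, the difference in timing is roughly the same.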
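In case the measurement itself is part of the problem: in the amp run above, the warm-up calls happen outside autocast, so any one-off setup cost of the autocast path may land inside the timed region. Below is a minimal sketch of a harness that runs warm-up and timed calls inside the same context. The names `benchmark`, `context_factory`, and `sync` are made up for illustration and are not part of PyTorch, and the sketch uses wall-clock timing with explicit synchronization rather than CUDA events. The PyTorch part only runs if torch, torchvision, and a CUDA device are available.

```python
import contextlib
import time


def benchmark(fn, context_factory=contextlib.nullcontext, sync=lambda: None,
              warmup=5, iters=20):
    """Return the mean time per call of fn() in milliseconds.

    Warm-up and timed calls run inside the same context, so one-off
    setup costs are paid before the clock starts.
    """
    with context_factory():
        for _ in range(warmup):
            fn()
        sync()  # drain queued work before starting the clock
        start = time.perf_counter()
        for _ in range(iters):
            fn()
        sync()  # wait for the timed calls to finish
        elapsed = time.perf_counter() - start
    return elapsed * 1000.0 / iters


if __name__ == "__main__":
    try:
        import torch
        import torchvision
    except ImportError:
        torch = None  # PyTorch not installed: skip the GPU comparison

    if torch is not None and torch.cuda.is_available():
        device = torch.device("cuda")
        model = torchvision.models.resnet18().to(device).eval()
        tensor = torch.rand(1, 3, 32, 32, device=device)

        def forward():
            with torch.no_grad():
                model(tensor)

        fp32_ms = benchmark(forward, sync=torch.cuda.synchronize)
        amp_ms = benchmark(forward,
                           context_factory=torch.cuda.amp.autocast,
                           sync=torch.cuda.synchronize)
        print("fp32: {:.3f} ms, amp: {:.3f} ms".format(fp32_ms, amp_ms))
        print("compute capability:", torch.cuda.get_device_capability(0))
```

If the gap persists with this harness, the printed compute capability may be relevant: FP16 Tensor Cores were introduced with compute capability 7.0 (Volta), so it is possible that autocast simply has less to gain on older GPUs, although I have not confirmed that this explains the numbers above.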
Can someone help me with this? I am running this example on Google Colab, and the GPU specifications are below:
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 465.27       Driver Version: 460.32.03    CUDA Version: 11.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Tesla P100-PCIE...  Off  | 00000000:00:04.0 Off |                    0 |
| N/A   43C    P0    28W / 250W |      0MiB / 16280MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
torch.version.cuda == 10.1
torch.__version__ == 1.8.1+cu101