python - How to fix GPU OOM errors when evaluating ResNet, FCN, and DeepLab image segmentation?

Question

I'm following a YouTube tutorial on image segmentation in Python: link

That tutorial is based on other tutorials, which I also consulted to refine my code, in particular this one: OpenCV Pytorch Segmentation

I'm using an NVIDIA 2070 graphics card with 8 GB of GPU memory.

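My problem is this: the original tutorial teaches a basic CPU implementation of semantic segmentation with a ResNet backbone via FCN. I wanted to build on it to take advantage of the GPU, so I found the second tutorial. I don't have any experience in this area, but I worked out how to run it on the GPU and immediately ran into GPU out-of-memory (OOM) errors: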

RuntimeError: CUDA out of memory. Tried to allocate 184.00 MiB (GPU 0; 8.00 GiB total capacity; 5.85 GiB already allocated; 26.97 MiB free; 5.88 GiB reserved in total by PyTorch)

When I run this program on small images, or downscale HD images to 50% resolution, I don't get the OOM error.
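Some rough arithmetic shows why resolution matters so much. The sketch below assumes that the torchvision FCN and DeepLabV3 ResNet-101 backbones run at an output stride of 8 (layer3 and layer4 are dilated instead of strided) and that activations are float32; `feature_map_hw` and `tensor_mib` are hypothetical helpers for illustration. A single 2048-channel feature map for a 1920×1080 input is already about 253 MiB, and the deep layers produce many maps at that spatial size, so memory grows with the pixel count: halving both sides cuts each map to roughly a quarter.

```python
from __future__ import annotations

import math


def feature_map_hw(height: int, width: int, stride: int = 8) -> tuple[int, int]:
    """Spatial size of a backbone feature map at the given output stride.

    Assumes every stride-2 layer pads so that it rounds up (ceil), which is
    how the ResNet stem and its stride-2 convolutions behave.
    """
    return math.ceil(height / stride), math.ceil(width / stride)


def tensor_mib(channels: int, height: int, width: int, bytes_per_elem: int = 4) -> float:
    """Size in MiB of one (1, C, H, W) tensor; 4 bytes per element means float32."""
    return channels * height * width * bytes_per_elem / 2**20


full_h, full_w = feature_map_hw(1080, 1920)   # 1920x1080 input -> (135, 240)
half_h, half_w = feature_map_hw(540, 960)     # 50% resolution  -> (68, 120)

full = tensor_mib(2048, full_h, full_w)       # 253.125 MiB
half = tensor_mib(2048, half_h, half_w)       # 63.75 MiB
print(f"2048-channel map: {full:.1f} MiB at full size, {half:.1f} MiB at 50% ({half / full:.0%})")
```

These figures are per tensor; when autograd is enabled, the intermediate maps of every layer are additionally kept alive for a backward pass, which multiplies the total.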

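My poking and prodding led me to believe that the OOM is a result of how memory is allocated across this task. So I then tried implementing the alternative DeepLab solution, hoping it would allocate memory more efficiently, but it didn't.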
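Rather than guessing which model allocates memory more efficiently, the peak can be measured with PyTorch's `torch.cuda.reset_peak_memory_stats` and `torch.cuda.max_memory_allocated`. A minimal sketch follows; `peak_forward_mib` is a hypothetical helper, and it returns `None` on a machine without a CUDA device. Since `fcn_resnet101` and `deeplabv3_resnet101` share the same dilated ResNet-101 backbone, similar peaks for the two would not be surprising.

```python
from __future__ import annotations

import torch
import torch.nn as nn


def peak_forward_mib(model: nn.Module, inp: torch.Tensor) -> float | None:
    """Peak CUDA memory (MiB) allocated during one inference forward pass.

    The figure includes the model weights already resident on the GPU.
    Returns None when no CUDA device is available.
    """
    if not torch.cuda.is_available():
        return None
    dev = torch.device("cuda")
    model = model.to(dev).eval()
    inp = inp.to(dev)
    torch.cuda.synchronize(dev)
    torch.cuda.reset_peak_memory_stats(dev)   # start counting from the current usage
    with torch.no_grad():                     # inference only: no autograd graph
        model(inp)
    torch.cuda.synchronize(dev)               # make sure the kernels have finished
    return torch.cuda.max_memory_allocated(dev) / 2**20
```

Calling it once with `fcn` and once with `dlab` on the same `inp` gives a like-for-like comparison.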

Here is my code:

from PIL import Image
import torch
import torchvision.transforms as T
from torchvision import models
import numpy as np
import imghdr

fcn = None
dlab = None

def getRotoModel():
    global fcn
    global dlab
    fcn = models.segmentation.fcn_resnet101(pretrained=True).eval()
    dlab = models.segmentation.deeplabv3_resnet101(pretrained=1).eval()

# Define the helper function
def decode_segmap(image, nc=21):

    label_colors = np.array([(0, 0, 0),  # 0=background
                           # 1=aeroplane, 2=bicycle, 3=bird, 4=boat, 5=bottle
               (128, 0, 0), (0, 128, 0), (128, 128, 0), (0, 0, 128), (128, 0, 128),
               # 6=bus, 7=car, 8=cat, 9=chair, 10=cow
               (0, 128, 128), (128, 128, 128), (64, 0, 0), (192, 0, 0), (64, 128, 0),
               # 11=dining table, 12=dog, 13=horse, 14=motorbike, 15=person
               (192, 128, 0), (64, 0, 128), (192, 0, 128), (64, 128, 128), (192, 128, 128),
               # 16=potted plant, 17=sheep, 18=sofa, 19=train, 20=tv/monitor
               (0, 64, 0), (128, 64, 0), (0, 192, 0), (128, 192, 0), (0, 64, 128)])

    r = np.zeros_like(image).astype(np.uint8)
    g = np.zeros_like(image).astype(np.uint8)
    b = np.zeros_like(image).astype(np.uint8)

    for l in range(0, nc):
        idx = image == l
        r[idx] = label_colors[l, 0]
        g[idx] = label_colors[l, 1]
        b[idx] = label_colors[l, 2]

    rgb = np.stack([r, g, b], axis=2)
    return rgb

valid_images = ['jpg','png', 'rgb', 'pbm', 'ppm', 'tiff', 'rast', 'xbm', 'bmp', 'exr', 'jpeg'] #Valid image formats
dev = torch.device('cuda')
def createMatte(filename, matteName, factor):
    if imghdr.what(filename) in valid_images:
        img = Image.open(filename).convert('RGB')
        
        size = img.size
        w, h = size
        modifiedSize = h * factor
        print('Image original size is ', size)
        print('Modified size is ', modifiedSize)
        trf = T.Compose([T.Resize(int(modifiedSize)),
                     T.ToTensor(), 
                     T.Normalize(mean = [0.485, 0.456, 0.406], 
                                 std = [0.229, 0.224, 0.225])])
        inp = trf(img).unsqueeze(0)
        #inp = trf(img).unsqueeze(0).to(dev)
        
        if (fcn == None): getRotoModel()
        
        if torch.cuda.is_available():
            torch.cuda.empty_cache()
            inp = inp.to(dev)
            fcn.to(dev)
            out = fcn.to(dev)(inp)['out'][0]
        
        with torch.no_grad():
            out = fcn(inp)['out'][0]
        
        #out = fcn(inp)['out']
        #out = fcn.to(dev)(inp)['out']
        om = torch.argmax(out.squeeze(), dim=0).detach().cpu().numpy()  
        rgb = decode_segmap(om)
        im = Image.fromarray(rgb)
        im.save(matteName)
    else:
        print('File type is not supported for file ' + filename)
        print(imghdr.what(filename))
        
def createDLMatte(filename, matteName, factor):
    if imghdr.what(filename) in valid_images:
        img = Image.open(filename).convert('RGB')
            
        size = img.size
        w, h = size
        modifiedSize = h * factor
        print('Image original size is ', size)
        print('Modified size is ', modifiedSize)
        trf = T.Compose([T.Resize(int(modifiedSize)),
            T.ToTensor(), 
            T.Normalize(mean = [0.485, 0.456, 0.406], 
                        std = [0.229, 0.224, 0.225])])
        inp = trf(img).unsqueeze(0)
        #inp = trf(img).unsqueeze(0).to(dev)
            
        if (dlab == None): getRotoModel()
            
        if torch.cuda.is_available():
            torch.cuda.empty_cache()
            inp = inp.to(dev)
            dlab.to(dev)
            out = dlab.to(dev)(inp)['out'][0]
            
        with torch.no_grad():
            out = dlab(inp)['out'][0]
            
        #out = fcn(inp)['out']
        #out = fcn.to(dev)(inp)['out']
        om = torch.argmax(out.squeeze(), dim=0).detach().cpu().numpy()  
        rgb = decode_segmap(om)
        im = Image.fromarray(rgb)
        im.save(matteName)
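One likely contributor in the listing above: on the CUDA branch, `fcn.to(dev)(inp)` (and `dlab.to(dev)(inp)`) runs a full forward pass outside `torch.no_grad()`. Calling `.eval()` does not disable autograd, so that pass records a graph and keeps every intermediate activation alive, and `out` still references it while the second, `no_grad` pass runs. A minimal sketch of the inference step with a single pass inside `torch.no_grad()` follows, with optional float16 autocast on CUDA, which stores most activations in half precision. `predict_labels` and its `half_precision` flag are hypothetical; it assumes a torchvision segmentation model whose output is a dict with an `'out'` tensor of shape (N, classes, H, W).

```python
import contextlib

import numpy as np
import torch
import torch.nn as nn


def predict_labels(model: nn.Module, inp: torch.Tensor, device: torch.device,
                   half_precision: bool = False) -> np.ndarray:
    """Run ONE forward pass without autograd and return an (H, W) array of class ids.

    half_precision=True enables float16 autocast, and only on CUDA devices.
    """
    model = model.to(device).eval()            # move the weights once
    amp = (torch.autocast(device_type="cuda", dtype=torch.float16)
           if half_precision and device.type == "cuda" else contextlib.nullcontext())
    with torch.no_grad(), amp:                 # no graph: activations are not kept for backward
        out = model(inp.to(device))["out"][0]  # (num_classes, H, W)
        labels = out.argmax(dim=0)             # (H, W), still on `device`
    return labels.cpu().numpy()
```

In `createMatte`, this would replace both forward passes and the `argmax` line, e.g. `om = predict_labels(fcn, inp, dev)` followed by `rgb = decode_segmap(om)`. Note also that `getRotoModel` builds both networks even when only one is used; loading only the one you need keeps the other's weights out of memory. Half precision can flip a few borderline pixels, so it is worth comparing its masks against a float32 run.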

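What I'd like to know is whether there is a way around the GPU problem. I don't want to limit myself to CPU rendering, at roughly a minute per image, when I have a reasonably powerful GPU. As I said, I'm new to most of this, but I'm hoping there is a way to allocate memory better during this process.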

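I have a few potential solutions, but I haven't been able to find resources on how to implement them:

(Bad solution) Throttle the computation when the GPU is close to running out of memory and switch the rest of the task to the CPU. Not only does this feel like a bad approach, I also don't really see how to switch between GPU and CPU in the middle of a task.
(Better) Fix the memory allocation by splitting the image into manageable pieces, saving those pieces to temporary files, and combining them at the end.
Some combination of the two.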
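Switching devices part-way through a single forward pass is not something PyTorch does for you, but a coarser version of the first idea is straightforward: try the whole image on the GPU and, if CUDA reports out-of-memory, release the cached memory and redo that image on the CPU. A minimal sketch, assuming a torchvision-style model that returns a dict with `'out'`; `predict_with_fallback` and `_labels` are hypothetical helpers. Recent PyTorch versions raise `torch.cuda.OutOfMemoryError`, a subclass of `RuntimeError`, so catching `RuntimeError` and checking the message also covers older versions.

```python
from __future__ import annotations

import numpy as np
import torch
import torch.nn as nn


def _labels(model: nn.Module, inp: torch.Tensor, device: torch.device) -> np.ndarray:
    model.to(device)                           # nn.Module.to moves the weights in place
    with torch.no_grad():
        out = model(inp.to(device))["out"][0]
        return out.argmax(dim=0).cpu().numpy()


def predict_with_fallback(model: nn.Module, inp: torch.Tensor) -> tuple[np.ndarray, str]:
    """Try the GPU first; on CUDA out-of-memory, retry the same image on the CPU."""
    model.eval()
    if torch.cuda.is_available():
        try:
            return _labels(model, inp, torch.device("cuda")), "cuda"
        except RuntimeError as err:            # torch.cuda.OutOfMemoryError subclasses RuntimeError
            if "out of memory" not in str(err):
                raise                          # a different error: don't hide it
        # Outside the except block the traceback (and the tensors it references)
        # has been released, so the cached blocks can actually be returned.
        torch.cuda.empty_cache()
    return _labels(model, inp, torch.device("cpu")), "cpu"
```

The second return value reports which device produced the mask, which makes it easy to log how often the slow path is taken.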

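My concern now is that splitting the image will reduce the quality of the result, because each piece won't be seen in context, and I'd need some kind of smart stitching, which is above my pay grade.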
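A common way to handle the stitching is sliding-window inference. Instead of saving pieces to temporary files, run overlapping tiles through the model one at a time, add each tile's class scores (logits) into a full-size buffer on the CPU, divide by the number of tiles that covered each pixel, and take the argmax once at the end. Averaging the overlaps smooths the seams, but each tile still only sees its own neighbourhood, so large objects can still be split; bigger tiles and overlaps trade memory for context. A minimal sketch; `tile_starts` and `tiled_logits` are hypothetical helpers, and `tile=512`, `overlap=64` are arbitrary defaults, not tuned values.

```python
from __future__ import annotations

import torch
import torch.nn as nn


def tile_starts(length: int, tile: int, overlap: int) -> list[int]:
    """Start offsets of overlapping windows that together cover [0, length)."""
    if overlap >= tile:
        raise ValueError("overlap must be smaller than tile")
    if length <= tile:
        return [0]
    starts = list(range(0, length - tile, tile - overlap))
    starts.append(length - tile)               # last window ends exactly at the edge
    return starts


def tiled_logits(model: nn.Module, inp: torch.Tensor, num_classes: int,
                 device: torch.device, tile: int = 512, overlap: int = 64) -> torch.Tensor:
    """Average per-pixel logits over overlapping tiles; returns (num_classes, H, W) on the CPU."""
    _, _, h, w = inp.shape
    logits = torch.zeros(num_classes, h, w)    # full-size accumulators live on the CPU
    counts = torch.zeros(1, h, w)
    model = model.to(device).eval()
    with torch.no_grad():
        for top in tile_starts(h, tile, overlap):
            for left in tile_starts(w, tile, overlap):
                patch = inp[:, :, top:top + tile, left:left + tile].to(device)
                out = model(patch)["out"][0].float().cpu()   # (num_classes, th, tw)
                th, tw = out.shape[1:]
                logits[:, top:top + th, left:left + tw] += out
                counts[:, top:top + th, left:left + tw] += 1
    return logits / counts
```

For the 21-class models in the question, the label map would then be `tiled_logits(fcn, inp, 21, dev).argmax(dim=0).numpy()`, which can go straight into `decode_segmap`. Only one tile's activations are on the GPU at any time.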

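So, in general, I'm asking whether there are resources covering these possible solutions, or whether there is a better solution altogether.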

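Finally, is there something wrong with my implementation that causes the GPU OOM error? I can't tell whether my code is just not optimized, or whether DeepLab and FCN are both simply very memory-intensive and can't be optimized from my end. Any help would be greatly appreciated! Thanks!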
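Whether a forward pass is holding an autograd graph (the pattern on the CUDA branch of the listing above) can be checked by inspecting its output: a tensor produced with gradients enabled has `requires_grad=True` and a `grad_fn`, which means the intermediate activations it depends on are being kept alive. A small sketch with a stand-in convolutional model (an assumption for illustration, not the segmentation network); note that `.eval()` only changes layer behaviour such as dropout and batch norm, it does not turn autograd off.

```python
import torch
import torch.nn as nn

# Stand-in model; any module with trainable parameters behaves the same way.
model = nn.Sequential(nn.Conv2d(3, 8, 3, padding=1), nn.ReLU(), nn.Conv2d(8, 21, 1)).eval()
x = torch.randn(1, 3, 32, 32)

out_graph = model(x)          # like fcn.to(dev)(inp) outside no_grad: a graph is recorded
with torch.no_grad():
    out_plain = model(x)      # like the call inside `with torch.no_grad():`: no graph

print(out_graph.requires_grad, out_graph.grad_fn is not None)   # True True
print(out_plain.requires_grad, out_plain.grad_fn is None)       # False True
```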
