deep-learning - nnGraph 多 GPU 火炬

Question

这个问题是关于让任何 nnGraph 网络在多个 GPU 上运行，而不是特定于以下网络实例

我正在尝试训练一个用 nnGraph 构建的网络。附后图。我正在尝试在多 GPU 设置中运行 parallelModel（参见代码或图节点 9）。如果我将并行模型附加到一个 nn.Sequential 容器，然后创建一个 DataParallelTable，它可以在多 GPU 设置中工作（没有 nnGraph）。但是，将其附加到 nnGraph 后，我收到错误消息。如果我在单个 GPU 上训练（在 if 语句中将 true 设置为 false），则后向传递有效，但在多 GPU 设置中，我收到错误“gmodule.lua:418：尝试索引本地'gradInput'（无价值）”。我认为反向传递中的 Node 9 应该在多个 GPU 上运行，但是这并没有发生。在 nnGraph 上创建 DataParallelTable 对我不起作用，但是我认为至少将内部顺序网络放在 DataParallelTable 中会起作用。是否有其他方法可以拆分传递给 nnGraph 的初始数据，以便它在多个 GPU 上运行？

require 'torch'
require 'nn'
require 'cudnn'
require 'cunn'
require 'cutorch'
require 'nngraph'

data1 = torch.ones(4,20):cuda()
data2 = torch.ones(4,10):cuda()

tmodel = nn.Sequential()
tmodel:add(nn.Linear(20,10))
tmodel:add(nn.Linear(10,10))
parallelModel = nn.ParallelTable()
parallelModel:add(tmodel)
parallelModel:add(nn.Identity())
parallelModel:add(nn.Identity())

model = parallelModel
if true then
  local function sharingKey(m)
     local key = torch.type(m)
     if m.__shareGradInputKey then
        key = key .. ':' .. m.__shareGradInputKey
     end
     return key
  end

  -- Share gradInput for memory efficient backprop
  local cache = {}
  model:apply(function(m)
     local moduleType = torch.type(m)
     if torch.isTensor(m.gradInput) and moduleType ~= 'nn.ConcatTable' then
        local key = sharingKey(m)
        if cache[key] == nil then
           cache[key] = torch.CudaStorage(1)
        end
        m.gradInput = torch.CudaTensor(cache[key], 1, 0)
     end
  end)
end

if true then
  cudnn.fastest = true
  cudnn.benchmark = true

  -- Wrap the model with DataParallelTable, if using more than one GPU
  local gpus = torch.range(1, 2):totable()
  local fastest, benchmark = cudnn.fastest, cudnn.benchmark

  local dpt = nn.DataParallelTable(1, true, true)
     :add(model, gpus)
     :threads(function()
        local cudnn = require 'cudnn'
        cudnn.fastest, cudnn.benchmark = fastest, benchmark
     end)
  dpt.gradInput = nil

  model = dpt:cuda()
end


newmodel = nn.Sequential()
newmodel:add(model)

input1 = nn.Identity()()
input2 = nn.Identity()()
input3 = nn.Identity()()

out = newmodel({input1,input2,input3})

r1 = nn.NarrowTable(1,2)(out)
r2 = nn.NarrowTable(2,2)(out)

f1 = nn.JoinTable(2)(r1)
f2 = nn.JoinTable(2)(r2)

n1 = nn.Sequential()
n1:add(nn.Linear(20,5))

n2 = nn.Sequential()
n2:add(nn.Linear(20,5))  

f11 = n1(f1)
f12 = n2(f2)

foutput = nn.JoinTable(2)({f11,f12})

g = nn.gModule({input1,input2,input3},{foutput})
g = g:cuda()


g:forward({data1, data2, data2})
g:backward({data1, data2, data2}, torch.rand(4,10):cuda())

“if”语句中的代码取自Facebook 的 ResNet 实现

deep-learning - nnGraph 多 GPU 火炬

0 回答 0

Related

Reference