I want to run my Theano code on a Colab GPU, so I am trying to set the Theano flags accordingly. I tried
import os
os.environ['THEANO_FLAGS'] = """ device=cuda0,force_device=True,blas.ldflags="-L/usr/lib/ -lblas", floatX=float32, mode=FAST_RUN, lib.cnmem=.5, profile=True, CUDA_LAUNCH_BLOCKING=1 """
import theano
and
!printf """[global]\\ndevice = cuda\\nfloatX = float32\\nforce_device=True\\nmode=FAST_RUN\\nlib.cnmem=.5\\nprofile=True\\nCUDA_LAUNCH_BLOCKING=1""" > ~/.theanorc
!cat ~/.theanorc
but neither of them seems to work: according to the profiler, all ops are CPU-specific (Elemwise rather than GpuElemwise, no GpuFromHost, etc.).
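As a sanity check, I also confirmed that the Colab instance has a GPU at all and looked at which device Theano reports after import. My understanding is that THEANO_FLAGS is only read the first time theano is imported in a process, so the flags must be set before that first import; the snippet below assumes a GPU runtime is enabled under Runtime > Change runtime type.
# Verify the Colab runtime actually has a GPU attached;
# this fails on a CPU-only runtime.
!nvidia-smi

import os
# Flags must be set before theano is imported for the first time
# in this process; setting them after import has no effect.
os.environ['THEANO_FLAGS'] = 'device=cuda,floatX=float32,force_device=True'

import theano
# Should print 'cuda' (or 'cuda0') if the gpuarray backend initialized;
# 'cpu' suggests the flags did not take effect or pygpu/libgpuarray is missing.
print(theano.config.device)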
I tried this code:
import numpy
import theano
import theano.tensor as T

input_data = numpy.matrix([[28, 1], [35, 2], [18, 1], [56, 2], [80, 3]])
output_data = numpy.matrix([1600, 2100, 1400, 2500, 3200])
TS = theano.shared(input_data.astype('float32'), "training-set")
E = theano.shared(output_data.astype('float32'), "expected")
W1 = theano.shared(numpy.zeros((1, 2), dtype='float32'))
O = T.dot(TS, W1.T)
cost = T.mean(T.sqr(E - O.T)).astype('float32')
gradient = T.grad(cost=cost, wrt=W1).astype('float32')
update = [[W1, W1 - gradient * numpy.float32(0.0001)]]
train = theano.function([], cost, updates=update, allow_input_downcast=True, profile=True)

for i in range(1000):
    train()

train.profile.summary()
and got the following output:
Function profiling
==================
Message: <ipython-input-20-49bdedf42dbb>:27
Time in 1000 calls to Function.__call__: 1.391292e-02s
Time in Function.fn.__call__: 7.742643e-03s (55.651%)
Time in thunks: 3.543854e-03s (25.472%)
Total compile time: 5.829549e-02s
Number of Apply nodes: 16
Theano Optimizer time: 4.293251e-02s
Theano validate time: 7.207394e-04s
Theano Linker time (includes C, CUDA code generation/compiling): 1.048517e-02s
Import time 0.000000e+00s
Node make_thunk time 9.668112e-03s
Node InplaceDimShuffle{x,x}(Subtensor{int64}.0) time 1.002550e-03s
Node InplaceDimShuffle{1,0}(training-set) time 9.713173e-04s
Node InplaceDimShuffle{x,x}(Subtensor{int64}.0) time 9.384155e-04s
Node Gemm{inplace}(<TensorType(float32, matrix)>, TensorConstant{-1e-04}, Elemwise{Composite{((i0 * i1) / i2)}}.0, training-set, TensorConstant{1.0}) time 7.627010e-04s
Node Gemm{no_inplace}(expected, TensorConstant{-1.0}, <TensorType(float32, matrix)>, training-set.T, TensorConstant{1.0}) time 7.226467e-04s
Time in all call to theano.grad() 2.316711e-01s
Time since theano import 1824.793s
Class
---
<% time> <sum %> <apply time> <time per call> <type> <#call> <#apply> <Class name>
28.5% 28.5% 0.001s 5.05e-07s C 2000 2 theano.tensor.blas.Gemm
20.6% 49.1% 0.001s 1.46e-07s C 5000 5 theano.tensor.elemwise.Elemwise
18.5% 67.6% 0.001s 2.18e-07s C 3000 3 theano.tensor.elemwise.DimShuffle
12.8% 80.4% 0.000s 4.54e-07s C 1000 1 theano.tensor.elemwise.Sum
9.3% 89.7% 0.000s 1.65e-07s C 2000 2 theano.tensor.subtensor.Subtensor
6.1% 95.8% 0.000s 1.08e-07s C 2000 2 theano.compile.ops.Shape_i
4.2% 100.0% 0.000s 1.50e-07s C 1000 1 theano.tensor.opt.MakeVector
... (remaining 0 Classes account for 0.00%(0.00s) of the runtime)
Ops
---
<% time> <sum %> <apply time> <time per call> <type> <#call> <#apply> <Op name>
15.9% 15.9% 0.001s 5.65e-07s C 1000 1 Gemm{no_inplace}
12.8% 28.8% 0.000s 4.54e-07s C 1000 1 Sum{acc_dtype=float64}
12.6% 41.3% 0.000s 4.45e-07s C 1000 1 Gemm{inplace}
11.0% 52.3% 0.000s 1.94e-07s C 2000 2 InplaceDimShuffle{x,x}
9.3% 61.6% 0.000s 1.65e-07s C 2000 2 Subtensor{int64}
7.5% 69.1% 0.000s 2.66e-07s C 1000 1 InplaceDimShuffle{1,0}
5.8% 74.9% 0.000s 2.05e-07s C 1000 1 Elemwise{mul,no_inplace}
5.4% 80.2% 0.000s 1.91e-07s C 1000 1 Elemwise{Composite{((i0 * i1) / i2)}}
5.0% 85.3% 0.000s 1.78e-07s C 1000 1 Elemwise{Cast{float32}}
4.2% 89.5% 0.000s 1.50e-07s C 1000 1 MakeVector{dtype='int64'}
3.1% 92.6% 0.000s 1.08e-07s C 1000 1 Shape_i{0}
3.0% 95.6% 0.000s 1.08e-07s C 1000 1 Shape_i{1}
2.2% 97.8% 0.000s 7.94e-08s C 1000 1 Elemwise{Composite{((i0 / i1) / i2)}}[(0, 0)]
2.2% 100.0% 0.000s 7.68e-08s C 1000 1 Elemwise{Sqr}[(0, 0)]
... (remaining 0 Ops account for 0.00%(0.00s) of the runtime)
Apply
------
<% time> <sum %> <apply time> <time per call> <#call> <id> <Apply name>
15.9% 15.9% 0.001s 5.65e-07s 1000 3 Gemm{no_inplace}(expected, TensorConstant{-1.0}, <TensorType(float32, matrix)>, training-set.T, TensorConstant{1.0})
12.8% 28.8% 0.000s 4.54e-07s 1000 14 Sum{acc_dtype=float64}(Elemwise{Sqr}[(0, 0)].0)
12.6% 41.3% 0.000s 4.45e-07s 1000 13 Gemm{inplace}(<TensorType(float32, matrix)>, TensorConstant{-1e-04}, Elemwise{Composite{((i0 * i1) / i2)}}.0, training-set, TensorConstant{1.0})
7.5% 48.8% 0.000s 2.66e-07s 1000 0 InplaceDimShuffle{1,0}(training-set)
6.1% 54.9% 0.000s 2.15e-07s 1000 7 Subtensor{int64}(Elemwise{Cast{float32}}.0, Constant{1})
5.8% 60.6% 0.000s 2.05e-07s 1000 10 Elemwise{mul,no_inplace}(InplaceDimShuffle{x,x}.0, InplaceDimShuffle{x,x}.0)
5.6% 66.2% 0.000s 1.97e-07s 1000 8 InplaceDimShuffle{x,x}(Subtensor{int64}.0)
5.4% 71.6% 0.000s 1.92e-07s 1000 9 InplaceDimShuffle{x,x}(Subtensor{int64}.0)
5.4% 77.0% 0.000s 1.91e-07s 1000 11 Elemwise{Composite{((i0 * i1) / i2)}}(TensorConstant{(1, 1) of -2.0}, Gemm{no_inplace}.0, Elemwise{mul,no_inplace}.0)
5.0% 82.0% 0.000s 1.78e-07s 1000 5 Elemwise{Cast{float32}}(MakeVector{dtype='int64'}.0)
4.2% 86.3% 0.000s 1.50e-07s 1000 4 MakeVector{dtype='int64'}(Shape_i{0}.0, Shape_i{1}.0)
3.2% 89.5% 0.000s 1.15e-07s 1000 6 Subtensor{int64}(Elemwise{Cast{float32}}.0, Constant{0})
3.1% 92.6% 0.000s 1.08e-07s 1000 2 Shape_i{0}(expected)
3.0% 95.6% 0.000s 1.08e-07s 1000 1 Shape_i{1}(expected)
2.2% 97.8% 0.000s 7.94e-08s 1000 15 Elemwise{Composite{((i0 / i1) / i2)}}[(0, 0)](Sum{acc_dtype=float64}.0, Subtensor{int64}.0, Subtensor{int64}.0)
2.2% 100.0% 0.000s 7.68e-08s 1000 12 Elemwise{Sqr}[(0, 0)](Gemm{no_inplace}.0)
... (remaining 0 Apply instances account for 0.00%(0.00s) of the runtime)
Here are tips to potentially make your code run faster
(if you think of new ones, suggest them on the mailing list).
Test them first, as they are not guaranteed to always provide a speedup.
- Try the Theano flag floatX=float32
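For completeness, here is how I understand one can check whether a compiled function actually ended up on the GPU; it is adapted from the GPU test in the Theano tutorial and simply scans the compiled graph for Gpu ops. If the backend were active, I would expect GpuElemwise and HostFromGpu nodes in the toposort:
import numpy
import time
from theano import function, config, shared, tensor

vlen = 10 * 30 * 768  # 10 x #cores x #threads per core
iters = 1000

rng = numpy.random.RandomState(22)
x = shared(numpy.asarray(rng.rand(vlen), config.floatX))
f = function([], tensor.exp(x))

t0 = time.time()
for i in range(iters):
    r = f()
print('Looping %d times took %f seconds' % (iters, time.time() - t0))

# Plain Elemwise nodes mean the function runs on the CPU;
# on the GPU they would be GpuElemwise instead.
if numpy.any([isinstance(node.op, tensor.Elemwise) and
              ('Gpu' not in type(node.op).__name__)
              for node in f.maker.fgraph.toposort()]):
    print('Used the cpu')
else:
    print('Used the gpu')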
Thanks in advance for your help.