I wrote a simple Python script (using Theano) that performs a linear regression and is supposed to run on the GPU. When the script starts it prints "Using gpu device", but (according to the profiler) all of the operations are CPU-specific ones (Elemwise instead of GpuElemwise, no GpuFromHost, etc.).
I checked the THEANO_FLAGS variable and everything seems to be set correctly, so I can't see what the problem is (especially since the Theano tutorials run on the GPU just fine with the same settings :)).
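A minimal way to double-check what Theano actually picked up from the flags (theano.config exposes the effective values after import):

import theano
# effective configuration as Theano sees it
print(theano.config.device)   # should print: gpu
print(theano.config.floatX)   # should print: float32
print(theano.config.profile)  # should print: True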
Here is the code:
# linear regression
import numpy
import theano
import theano.tensor as T
input_data = numpy.matrix([[28, 1], [35, 2], [18, 1], [56, 2], [80, 3]])
output_data = numpy.matrix([1600, 2100, 1400, 2500, 3200])
TS = theano.shared(input_data, "training-set")
E = theano.shared(output_data, "expected")
W1 = theano.shared(numpy.zeros((1, 2)))
O = T.dot(TS, W1.T)
cost = T.mean(T.sqr(E - O.T))
gradient = T.grad(cost=cost, wrt=W1)
update = [[W1, W1 - gradient * 0.0001]]
train = theano.function([], cost, updates=update, allow_input_downcast=True)
for i in range(1000):
    train()
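The optimized graph of the compiled function can also be inspected directly; a minimal sketch using the train function from the script above (if the graph had really been moved to the GPU, ops such as GpuElemwise, GpuDot22 and GpuFromHost/HostFromGpu transfers would show up here):

import theano.printing
# print the optimized graph of the compiled Theano function;
# GPU ops (GpuElemwise, GpuFromHost, ...) would be listed here if the
# computation had been transferred to the GPU
theano.printing.debugprint(train)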
My environment settings:
- THEANO_FLAGS=cuda.root=/usr/local/cuda
- device=gpu
- floatX=float32
- lib.cnmem=.5
- profile=True
- CUDA_LAUNCH_BLOCKING=1 (a plain CUDA environment variable; see the sketch below)
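For reference, a rough Python equivalent of these settings (assuming the Theano options are joined into a single THEANO_FLAGS string; it has to be set before theano is imported):

import os
# THEANO_FLAGS is only read when theano is imported, so set it first
os.environ["THEANO_FLAGS"] = (
    "cuda.root=/usr/local/cuda,device=gpu,floatX=float32,"
    "lib.cnmem=.5,profile=True"
)
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"  # plain CUDA environment variable
import theano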
Output:
Using gpu device 0: GeForce GT 650M (CNMeM is enabled)
Function profiling
==================
Message: /home/mw/Documents/LiClipse Workspace/theano1/test2.py:18
Time in 1000 calls to Function.__call__: 3.348637e-02s
Time in Function.fn.__call__: 2.419019e-02s (72.239%)
Time in thunks: 1.839781e-02s (54.941%)
Total compile time: 1.350801e-01s
Number of Apply nodes: 18
Theano Optimizer time: 1.101730e-01s
Theano validate time: 2.029657e-03s
Theano Linker time (includes C, CUDA code generation/compiling): 1.491690e-02s
Import time 2.320528e-03s
Time in all call to theano.grad() 8.740902e-03s
Time since theano import 0.881s
Class
---
<% time> <sum %> <apply time> <time per call> <type> <#call> <#apply> <Class name>
71.7% 71.7% 0.013s 6.59e-06s Py 2000 2 theano.tensor.basic.Dot
12.3% 83.9% 0.002s 3.22e-07s C 7000 7 theano.tensor.elemwise.Elemwise
5.7% 89.6% 0.001s 3.50e-07s C 3000 3 theano.tensor.elemwise.DimShuffle
4.0% 93.6% 0.001s 3.65e-07s C 2000 2 theano.tensor.subtensor.Subtensor
3.6% 97.2% 0.001s 3.31e-07s C 2000 2 theano.compile.ops.Shape_i
1.7% 98.9% 0.000s 3.06e-07s C 1000 1 theano.tensor.opt.MakeVector
1.1% 100.0% 0.000s 2.10e-07s C 1000 1 theano.tensor.elemwise.Sum
... (remaining 0 Classes account for 0.00%(0.00s) of the runtime)
Ops
---
<% time> <sum %> <apply time> <time per call> <type> <#call> <#apply> <Op name>
71.7% 71.7% 0.013s 6.59e-06s Py 2000 2 dot
4.0% 75.6% 0.001s 3.65e-07s C 2000 2 Subtensor{int64}
3.5% 79.1% 0.001s 6.35e-07s C 1000 1 InplaceDimShuffle{1,0}
3.3% 82.4% 0.001s 6.06e-07s C 1000 1 Elemwise{mul,no_inplace}
2.4% 84.8% 0.000s 4.38e-07s C 1000 1 Shape_i{0}
2.3% 87.1% 0.000s 4.29e-07s C 1000 1 Elemwise{Composite{((i0 * i1) / i2)}}
2.3% 89.3% 0.000s 2.08e-07s C 2000 2 InplaceDimShuffle{x,x}
1.8% 91.1% 0.000s 3.25e-07s C 1000 1 Elemwise{Cast{float64}}
1.7% 92.8% 0.000s 3.06e-07s C 1000 1 MakeVector{dtype='int64'}
1.5% 94.3% 0.000s 2.78e-07s C 1000 1 Elemwise{Composite{(i0 - (i1 * i2))}}[(0, 0)]
1.4% 95.7% 0.000s 2.53e-07s C 1000 1 Elemwise{Sub}[(0, 1)]
1.2% 96.9% 0.000s 2.24e-07s C 1000 1 Shape_i{1}
1.1% 98.0% 0.000s 2.10e-07s C 1000 1 Sum{acc_dtype=float64}
1.1% 99.1% 0.000s 1.98e-07s C 1000 1 Elemwise{Sqr}[(0, 0)]
0.9% 100.0% 0.000s 1.66e-07s C 1000 1 Elemwise{Composite{((i0 / i1) / i2)}}[(0, 0)]
... (remaining 0 Ops account for 0.00%(0.00s) of the runtime)
Apply
------
<% time> <sum %> <apply time> <time per call> <#call> <id> <Apply name>
37.8% 37.8% 0.007s 6.95e-06s 1000 3 dot(<TensorType(float64, matrix)>, training-set.T)
33.9% 71.7% 0.006s 6.24e-06s 1000 14 dot(Elemwise{Composite{((i0 * i1) / i2)}}.0, training-set)
3.5% 75.1% 0.001s 6.35e-07s 1000 0 InplaceDimShuffle{1,0}(training-set)
3.3% 78.4% 0.001s 6.06e-07s 1000 11 Elemwise{mul,no_inplace}(InplaceDimShuffle{x,x}.0, InplaceDimShuffle{x,x}.0)
3.0% 81.4% 0.001s 5.58e-07s 1000 8 Subtensor{int64}(Elemwise{Cast{float64}}.0, Constant{1})
2.4% 83.8% 0.000s 4.38e-07s 1000 2 Shape_i{0}(expected)
2.3% 86.2% 0.000s 4.29e-07s 1000 12 Elemwise{Composite{((i0 * i1) / i2)}}(TensorConstant{(1, 1) of -2.0}, Elemwise{Sub}[(0, 1)].0, Elemwise{mul,no_inplace}.0)
1.8% 87.9% 0.000s 3.25e-07s 1000 6 Elemwise{Cast{float64}}(MakeVector{dtype='int64'}.0)
1.7% 89.6% 0.000s 3.06e-07s 1000 4 MakeVector{dtype='int64'}(Shape_i{0}.0, Shape_i{1}.0)
1.6% 91.2% 0.000s 3.03e-07s 1000 10 InplaceDimShuffle{x,x}(Subtensor{int64}.0)
1.5% 92.7% 0.000s 2.78e-07s 1000 16 Elemwise{Composite{(i0 - (i1 * i2))}}[(0, 0)](<TensorType(float64, matrix)>, TensorConstant{(1, 1) of ..974738e-05}, dot.0)
1.4% 94.1% 0.000s 2.53e-07s 1000 5 Elemwise{Sub}[(0, 1)](expected, dot.0)
1.2% 95.3% 0.000s 2.24e-07s 1000 1 Shape_i{1}(expected)
1.1% 96.5% 0.000s 2.10e-07s 1000 15 Sum{acc_dtype=float64}(Elemwise{Sqr}[(0, 0)].0)
1.1% 97.6% 0.000s 1.98e-07s 1000 13 Elemwise{Sqr}[(0, 0)](Elemwise{Sub}[(0, 1)].0)
0.9% 98.5% 0.000s 1.72e-07s 1000 7 Subtensor{int64}(Elemwise{Cast{float64}}.0, Constant{0})
0.9% 99.4% 0.000s 1.66e-07s 1000 17 Elemwise{Composite{((i0 / i1) / i2)}}[(0, 0)](Sum{acc_dtype=float64}.0, Subtensor{int64}.0, Subtensor{int64}.0)
0.6% 100.0% 0.000s 1.13e-07s 1000 9 InplaceDimShuffle{x,x}(Subtensor{int64}.0)
... (remaining 0 Apply instances account for 0.00%(0.00s) of the runtime)