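To speed up my code, I converted a multidimensional sumproduct function from Python to Theano. My Theano code produces the same result, but it computes the result for only one dimension at a time, so I have to use a Python for loop to get the final result. I assume this slows the code down, because Theano cannot optimize memory usage and transfers (for the GPU) across multiple function calls. Or is that a wrong assumption?
So how can I change the Theano code so that the sumprod is calculated in one function call?
The original Python function: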
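import numpy

def sumprod(a1, a2):
    """Sum the element-wise products of the rows of `a1` and `a2`."""
    # Convert to float arrays so that plain lists/tuples multiply element-wise.
    a1 = numpy.asarray(a1, dtype='float64')
    a2 = numpy.asarray(a2, dtype='float64')
    result = numpy.zeros_like(a1[0])
    for i, j in zip(a1, a2):
        result += i * j
    return result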
For the following input:
a1 = ([1, 2, 4], [5, 6, 7])
a2 = ([1, 2, 4], [5, 6, 7])
the output will be: [ 26. 40. 65.]
that is, 1*1 + 5*5, 2*2 + 6*6 and 4*4 + 7*7
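For reference, the arithmetic above can also be written as a single vectorized NumPy expression. This is only a minimal sketch, assuming both inputs convert to 2-D arrays of the same shape; the name `sumprod_vectorized` is hypothetical and not part of my original code.

```python
import numpy

def sumprod_vectorized(a1, a2):
    """Column-wise sum of element-wise products (hypothetical helper)."""
    a1 = numpy.asarray(a1, dtype='float64')
    a2 = numpy.asarray(a2, dtype='float64')
    # Multiply element-wise, then sum over the first axis (the rows).
    return (a1 * a2).sum(axis=0)

a1 = ([1, 2, 4], [5, 6, 7])
a2 = ([1, 2, 4], [5, 6, 7])
expected = sumprod_vectorized(a1, a2)
print(expected)  # values 26, 40 and 65
```

The point is that the whole computation is one multiply followed by one reduction over the first axis, which is the shape of computation I would like to express in a single Theano call.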
The Theano version of the code:
import theano
import theano.tensor as T
import numpy
a1 = ([1, 2, 4], [5, 6, 7])
a2 = ([1, 2, 4], [5, 6, 7])
# wanted result: [ 26. 40. 65.]
# that is 1*1 + 5*5, 2*2 + 6*6 and 4*4 + 7*7
Tk = T.iscalar('Tk')
Ta1_shared = theano.shared(numpy.array(a1).T)
Ta2_shared = theano.shared(numpy.array(a2).T)
outputs_info = T.as_tensor_variable(numpy.asarray(0, 'float64'))
Tsumprod_result, updates = theano.scan(fn=lambda Ta1_shared, Ta2_shared, prior_value:
                                       prior_value + Ta1_shared * Ta2_shared,
                                       outputs_info=outputs_info,
                                       sequences=[Ta1_shared[Tk], Ta2_shared[Tk]])
Tsumprod_result = Tsumprod_result[-1]
Tsumprod = theano.function([Tk], outputs=Tsumprod_result)
result = numpy.zeros_like(a1[0])
for i in range(len(a1[0])):
    result[i] = Tsumprod(i)
print(result)