我正在尝试模拟一个脉动阵列结构——所有这些都是我从这些幻灯片中学到的:http : //web.cecs.pdx.edu/~mperkows/temp/May22/0020.Matrix-multiplication-systolic.pdf——用于 Python 环境中的矩阵乘法。脉动阵列的一个组成部分是 PE 之间的数据流与发生在任何一个节点上的任何乘法或加法并行。我很难推测如何在 Python 中准确地实现这样的并发过程。具体来说,我希望更好地理解一种计算方法,以级联方式将要相乘的矩阵元素馈送到收缩阵列中,同时允许这些元素以并发方式通过阵列传播。
我已经开始在 python 中编写一些代码到多个两个 3 x 3 数组,但最终,我想模拟任何大小的脉动数组以处理任何大小的 a 和 b 矩阵。
from threading import Thread
from collections import deque
vals_deque = deque(maxlen=9*2)#will hold the interconnections between nodes of the systolicarray
dump=deque(maxlen=9) # will be the output of the SystolicArray
prev_size = 0
def setupSystolicArray():
global SystolicArray
SystolicArray = [NodeSystolic(i,j) for i in range(3), for i in range(3)]
def spreadInputs(a,b):
#needs some way to initially propagate the elements of a and b through the top and leftmost parts of the systolic array
new = map(lambda x: x.start() , SystolicArray) #start all the nodes of the systolic array, they are waiting for an input
#even if i found a way to put these inputs into the array at the start, I am not sure how to coordinate future inputs into the array in the cascading fashion described in the slides
while(len(dump) != 9):
if(len(vals_deque) != prev_size):
vals = vals_deque[-1]
row = vals['t'][0]
col = vals['l'][0]
a= vals['t'][1]
b = vals['l'][1]
# these if elif statements track whether the outputs are at the edge of the systolic array and can be removed
if(row >= 3):
dump.append(a)
elif(col >= 3):
dump.append(b)
else:
#something is wrong with the logic here
SystolicArray[t][l-1].update(a,b)
SystolicArray[t-1][l].update(a,b)
class NodeSystolic:
def __init__(self,row, col):
self.row = row
self.col = col
self.currval = 0
self.up = False
self.ain = 0#coming in from the top
self.bin = 0#coming in from the side
def start(self):
Thread(target=self.continuous, args = ()).start()
def continuous(self):
while True:
if(self.up = True):
self.currval = self.ain*self.bin
self.up = False
self.passon(self.ain, self.bin)
else:
pass
def update(self, left, top):
self.up = True
self.ain = top
self.bin = left
def passon(self, t,l):
#this will passon the inputs this node has received onto the next nodes
vals_deque.append([{'t': [self.row+ 1, self.ain], 'l': [self.col + 1, self.bin]}])
def returnValue(self):
return self.currval
def main():
a = np.array([
[1,2,3],
[4,5,6],
[7,8,9],
])
b = np.array([
[1,2,3],
[4,5,6],
[7,8,9]
])
setupSystolicArray()
spreadInputs(a,b)
上面的代码无法运行,仍然有很多错误。我希望有人能给我指点如何改进代码,或者是否有更简单的方法来模拟具有 Python 中的异步属性的脉动数组的并行过程,所以对于非常大的脉动数组大小,我会的不必担心创建太多线程(节点)。