我在 numpy 中有以下数据集
indices | real data (X) |targets (y)
| |
0 0 | 43.25 665.32 ... |2.4 } 1st block
0 0 | 11.234 |-4.5 }
0 1 ... ... } 2nd block
0 1 }
0 2 } 3rd block
0 2 }
1 0 } 4th block
1 0 }
1 0 }
1 1 ...
1 1
1 2
1 2
2 0
2 0
2 1
2 1
2 1
...
论文是我的变量
idx1 = data[:,0]
idx2 = data[:,1]
X = data[:,2:-1]
y = data[:,-1]
我还有一个变量W
,它是一个 3D 数组。
我需要在代码中做的是遍历数据集中的所有块,并在一些计算后为每个块返回一个标量数,然后对所有标量求和,并将其存储在一个名为cost
. 问题是循环实现非常慢,所以如果可能的话,我会尝试将它矢量化。这是我当前的代码。在numpy中没有for循环是否可以做到这一点?
IDX1 = 0
IDX2 = 1
# get unique indices
idx1s = np.arange(len(np.unique(data[:,IDX1])))
idx2s = np.arange(len(np.unique(data[:,IDX2])))
# initialize global sum variable to 0
cost = 0
for i1 in idx1s:
for i2 in idx2:
# for each block in the dataset
mask = np.nonzero((data[:,IDX1] == i1) & (data[:,IDX2] == i2))
# get variables for that block
curr_X = X[mask,:]
curr_y = y[mask]
curr_W = W[:,i2,i1]
# calculate a scalar
pred = np.dot(curr_X,curr_W)
sigm = 1.0 / (1.0 + np.exp(-pred))
loss = np.sum((sigm- (0.5)) * curr_y)
# add result to global cost
cost += loss
这是一些示例数据
data = np.array([[0,0,5,5,7],
[0,0,5,5,7],
[0,1,5,5,7],
[0,1,5,5,7],
[1,0,5,5,7],
[1,1,5,5,7]])
W = np.zeros((2,2,2))
idx1 = data[:,0]
idx2 = data[:,1]
X = data[:,2:-1]
y = data[:,-1]