I'm training a mini-batch gradient descent model and trying to converge to the RMSE of the closed-form solution, roughly 0.00016. The validation-set RMSE (rmse_valid_array in the function) looks fine in the first epoch, but after a few epochs it starts to explode. I've been struggling with this for days, and the algorithm seems fine to me, so where could the problem be?
PS: x_train has shape (11000, 41) and y_train has shape (11000, 1); the batch size here is 1 and the learning rate is 0.001. I initialize the weights to very small values (divided by 1000). I've checked that X_mini and y_mini look normal, and the gradient starts to explode after a few epochs.
When I change the normalization in the gradient computation from 1/len(y) (the size of each batch) to 1/m (the size of the whole training set), the per-epoch RMSE does get smaller, but it doesn't follow the trend Andrew Ng shows in his mini-batch lecture.
[0.003352938483114684,
0.014898628026733278,
0.015708125817549583,
0.15904084037991562,
0.9772361042313762,
17.776216375980052,
187.04333942512542,
978.648663972064,
17383.631549616875,
103997.59758713894,
2222088.2561604036,
23334640.70860544,
118182306.23839562,
2606049599.35717,
18920677325.736164,
261342486636.4693,
1738434547629.957,
10577420781634.316,
164217272049684.75,
1131726496072944.8,
1.6219370161174172e+16,
2.4623815536311107e+17,
...
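As a side note, the two scalings differ only by a constant factor, so with batch size 1 switching from 1/len(y) to 1/m is equivalent to dividing the effective learning rate by m. A minimal sketch on toy data (the variable names here are mine, not from the original code) illustrating the relationship:

```python
import numpy as np

# Toy example: the two gradient normalizations point in the same
# direction and differ only by the constant factor batch_size / m.
rng = np.random.default_rng(0)
X_mini = rng.standard_normal((1, 41))     # a batch of size 1
y_mini = rng.standard_normal((1, 1))
w = rng.standard_normal((1, 41)) / 1000
m = 11000                                  # full training-set size

y_pred = X_mini @ w.T
grad_batch = (1 / len(y_mini)) * (X_mini.T @ (y_pred - y_mini))  # 1/len(y)
grad_full = (1 / m) * (X_mini.T @ (y_pred - y_mini))             # 1/m

# Same direction, magnitude scaled by batch_size / m:
assert np.allclose(grad_full * m, grad_batch * len(y_mini))
```

In other words, the 1/m version takes steps 11000 times smaller here, which is why the RMSE curve changes shape rather than the algorithm becoming correct.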
Here is the main function that performs the mini-batch gradient descent:
import time
import numpy as np
from sklearn.metrics import mean_squared_error  # used by the rmse helper below

def mini_batch_GD(X_train, X_valid, y_train, y_valid, batch_size, lr, CT):
    m = len(y_train)
    n = X_train.shape[1]
    # initialize weights with small values
    w = (np.random.random(n)).reshape(1, -1) / 1000
    rmse_train_array = []
    rmse_valid_array = []
    time_epoch = []
    for epoch in range(0, 100):
        start_time = time.time()
        # shuffle and split into batches
        mini_batches = create_minibatches(X_train, y_train, batch_size)
        for mini_batch in mini_batches:
            X_mini, y_mini = mini_batch
            y_pred = np.dot(X_mini, w.T).reshape(-1, 1)
            gradient = (1 / len(y_pred) * np.dot(X_mini.T, y_pred - y_mini)).reshape(1, -1)
            w = w - lr * gradient
        # training rmse
        y_pred_train = np.dot(X_train, w.T).reshape(-1, 1)
        rmse_train_array.append(rmse(y_pred_train, y_train))
        # validation rmse
        y_pred_valid = np.dot(X_valid, w.T).reshape(-1, 1)
        rmse_valid_array.append(rmse(y_pred_valid, y_valid))
        # time per epoch
        time_epoch.append(time.time() - start_time)
        # check for convergence
        if rmse(y_pred_valid, y_valid) <= CT:
            break
    return w, rmse_train_array, rmse_valid_array, time_epoch
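One way to narrow down where things blow up is to log the gradient norm at every update. This is a minimal diagnostic sketch under my own assumptions (the function sgd_with_norm_log and the synthetic data are hypothetical, not part of the original code); a norm that grows across epochs usually points at a step size too large for the feature scale rather than at the update rule itself.

```python
import numpy as np

def sgd_with_norm_log(X, y, lr=0.001, epochs=3, seed=0):
    """Plain SGD (batch size 1) that records the gradient norm per update."""
    rng = np.random.default_rng(seed)
    w = rng.random((1, X.shape[1])) / 1000
    norms = []
    for _ in range(epochs):
        idx = rng.permutation(len(y))
        for i in idx:
            x_i = X[i:i + 1]                     # a single-row batch
            err = x_i @ w.T - y[i:i + 1]         # (1, 1) residual
            grad = (x_i.T @ err).reshape(1, -1)  # 1/len(batch) == 1 here
            norms.append(np.linalg.norm(grad))
            w = w - lr * grad
    return w, norms

# Usage on well-scaled synthetic data; norms should stay finite here.
rng = np.random.default_rng(1)
X = rng.standard_normal((50, 5))
true_w = rng.standard_normal((1, 5))
y = X @ true_w.T
w, norms = sgd_with_norm_log(X, y, lr=0.01, epochs=5, seed=0)
```

Running the same logging on the real X_train would show whether the norms trend upward epoch over epoch, and from which update onward.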
The helper functions that create the mini-batches and compute the RMSE are below:
def create_minibatches(X, y, batch_size):
    data = np.hstack((X, y))
    np.random.shuffle(data)
    n_minibatches = data.shape[0]
    i = 0
    batch_size = 2
    mini_batches = []
    for i in range(n_minibatches // batch_size):
        mini_batch = data[i * batch_size:(i + 1) * batch_size, :]
        X_mini = mini_batch[:, :-1]
        y_mini = mini_batch[:, -1].reshape((-1, 1))
        mini_batches.append((X_mini, y_mini))
    if data.shape[0] % batch_size != 0:
        mini_batch = data[i * batch_size:data.shape[0]]
        X_mini = mini_batch[:, :-1]
        y_mini = mini_batch[:, -1].reshape((-1, 1))
        mini_batches.append((X_mini, y_mini))
    return mini_batches
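For comparison, here is a hypothetical index-based batching sketch (the name create_minibatches_split is mine, not from the original): it shuffles row indices instead of stacking and shuffling the data array, and slices the leftover partial batch directly.

```python
import numpy as np

def create_minibatches_split(X, y, batch_size, seed=None):
    """Shuffle indices, then slice them into batches of at most batch_size."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(y))
    return [(X[idx[i:i + batch_size]], y[idx[i:i + batch_size]])
            for i in range(0, len(y), batch_size)]
```

Every row lands in exactly one batch, and the final batch simply holds whatever remainder is left.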
def rmse(yPred, y):
    return np.sqrt(mean_squared_error(yPred, y))