I am trying to implement linear regression of the function y = sin(x) using mini-batch gradient descent (MBGD). However, instead of decreasing as expected, the loss quickly blows up and becomes nan, so the regression fails.
Below is my code, with explanations.
Import the libraries:
import torch
import math
import torch.nn as nn
import random
Define the polynomial basis function:
def linear_basis_func(x, power):
    return x ** power
Predict from x and the coefficients, where M is the number of basis dimensions used:
def predict(coef, x, M):
    X = torch.cat([linear_basis_func(x, i) for i in range(M)], 1)
    return X, X @ coef
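To make concrete what predict builds (this snippet is just an illustration, not part of my training code): for M = 3 the design matrix stacks the columns x**0, x**1, x**2.

# Illustration only: the polynomial design matrix for M = 3
x_demo = torch.tensor([[1.0], [2.0], [3.0]])
X_demo, _ = predict(torch.zeros(3, 1), x_demo, 3)
print(X_demo)
# tensor([[1., 1., 1.],
#         [1., 2., 4.],
#         [1., 3., 9.]])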
Compute the error between the predicted and expected values:
def calculate_error(predicted, expected):
    loss = nn.MSELoss()
    error = loss(predicted, expected)
    return error.item()
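As a quick sanity check (illustration only), this helper returns the mean squared error as a plain Python float, which is what the training loop below compares against accept_loss.

# Illustration only: MSE of two small column vectors, returned as a float
print(calculate_error(torch.tensor([[1.0], [2.0]]), torch.tensor([[1.5], [2.5]])))  # 0.25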
Define a helper that randomly selects data, and implement the MBGD method:
def get_random_batch_index(data, batch_size=32):
    return random.sample(range(data.shape[0]), batch_size)
def MBGD(x, y, dimension, accept_loss=0.001, learn_rate=0.01, max_iter=100000):
    coef = torch.rand(dimension, 1)
    count = 0
    loss_list = []
    for _ in range(max_iter):
        random_index = get_random_batch_index(x)
        batch_x = torch.index_select(x, 0, torch.IntTensor(random_index))
        batch_y = torch.index_select(y, 0, torch.IntTensor(random_index))
        X, predicted = predict(coef, batch_x, dimension)
        loss = calculate_error(predicted, batch_y)
        if count % 5 == 0:
            loss_list.append(loss)
        if loss <= accept_loss:
            break
        # Adjust coefficients
        gradient = learn_rate * (torch.transpose(X, 0, 1) @ (X @ coef - batch_y))
        coef = coef - gradient
        count += 1
    return coef, loss, loss_list, count, batch_x, predicted
Generate the training data and apply the MBGD method:
# Initialize the training data
dataset_size = 1000
x = torch.linspace(0, 2 * math.pi, dataset_size).unsqueeze(1)
y = torch.sin(x) + torch.rand(dataset_size, 1) / 20 # y = sin(x) with random noise
dimension = 10
coef, loss, loss_list, count, batch_x, predicted = MBGD(x, y, dimension)
print(f"{count} iterations performed")
print(f"loss: {loss}")
print(f"Coefficients: {coef.tolist()}")
However, when I run the code above, the output is a list of nan:
100000 iterations performed
loss: nan
Coefficients: [[nan], [nan], [nan], [nan], [nan], [nan], [nan], [nan], [nan], [nan]]
If I print out the first few values of gradient during the iterations, they look something like this:
tensor([[1.7028e+05],
        [9.8436e+05],
        [5.7417e+06],
        [3.3730e+07],
        [1.9930e+08],
        [1.1833e+09],
        [7.0541e+09],
        [4.2204e+10],
        [2.5330e+11],
        [1.5245e+12]])
tensor([[-4.4531e+17],
        [-2.4941e+18],
        [-1.4216e+19],
        [-8.2263e+19],
        [-4.8228e+20],
        [-2.8590e+21],
        [-1.7107e+22],
        [-1.0317e+23],
        [-6.2626e+23],
        [-3.8221e+24]])
tensor([[2.8374e+30],
        [1.6791e+31],
        [1.0017e+32],
        [6.0166e+32],
        [3.6344e+33],
        [2.2060e+34],
        [1.3445e+35],
        [8.2240e+35],
        [       inf],
        [       inf]])
tensor([[-inf],
        [-inf],
        [-inf],
        [-inf],
        [-inf],
        [-inf],
        [-inf],
        [-inf],
        [-inf],
        [-inf]])
tensor([[nan],
        [nan],
        [nan],
        [nan],
        [nan],
        [nan],
        [nan],
        [nan],
        [nan],
        [nan]])
This suggests the gradients are extremely unstable, swinging wildly between huge positive and negative values while growing in magnitude.
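As a rough check (this diagnostic is not part of my code above), the per-column scale of the polynomial design matrix over the full input range can be inspected like this; the highest power reaches roughly 10^7, which matches the enormous entries in the gradients above.

# Rough diagnostic only: largest absolute value of each polynomial feature
# column over the full x range [0, 2*pi]
X_full, _ = predict(torch.zeros(dimension, 1), x, dimension)
print(X_full.abs().max(dim=0).values)
# roughly: 1.0, 6.3, 39.5, 248.1, ..., 1.5e+07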
This result puzzles me, because with a different basis function, such as a Gaussian or sigmoidal basis, and all other hyperparameters unchanged, the same MBGD method produces good results. Only a small change to predict() is needed (a sketch of the sigmoidal variant is included after the results below):
def gaussian_basis_func(x, mu, sigma=1):
    return torch.exp(-((x - mu) ** 2) / (2 * (sigma ** 2)))

def predict(coef, x, M):
    width = 2 * (x.max() - x.min()) / M
    mu_vector = torch.linspace(x.min(), x.max(), M)
    X = torch.cat([gaussian_basis_func(x, mu_vector[i].item(), sigma=width) for i in range(M)], 1)
    return X, X @ coef
The results are much better:
2424 iterations performed
loss: 0.0009117514709942043
Coefficients: [[-0.7537871599197388], [0.8054635524749756], [0.30866697430610657], [0.4661775231361389], [0.15132419764995575], [-0.23501889407634735], [-0.15086951851844788], [-0.5047774314880371], [-0.7870860695838928], [0.7650274634361267]]
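For reference, the sigmoidal variant mentioned above follows the same pattern as the Gaussian one; the centre and width choices below are just one plausible option (a sketch, not necessarily the exact settings I ran):

def sigmoidal_basis_func(x, mu, sigma=1):
    # Standard sigmoidal basis: a logistic sigmoid centred at mu with slope 1/sigma
    return torch.sigmoid((x - mu) / sigma)

def predict(coef, x, M):
    width = 2 * (x.max() - x.min()) / M
    mu_vector = torch.linspace(x.min(), x.max(), M)
    X = torch.cat([sigmoidal_basis_func(x, mu_vector[i].item(), sigma=width) for i in range(M)], 1)
    return X, X @ coef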
I have tried things such as lowering the learning rate, but it made no difference.
Can anyone offer any suggestions? I would like to know whether this is a problem with my implementation, a problem with the basis function, or something else.