I am trying to implement linear regression of the function y = sin(x) using mini-batch gradient descent (MBGD). However, instead of decreasing as expected, the loss quickly blows up and becomes nan, so the regression fails.
Below is my code, with explanations.
Import the libraries:
import torch
import math
import torch.nn as nn
import random
Define the polynomial basis function:
def linear_basis_func(x, power):
    return x ** power
Predict from x and the coefficients, where M is the number of basis dimensions used:
def predict(coef, x, M):
    X = torch.cat([linear_basis_func(x, i) for i in range(M)], 1)
    return X, X @ coef
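To make concrete what predict builds (this snippet is just an illustration, not part of my training code): for M = 3 the design matrix stacks the columns x**0, x**1, x**2.

# Illustration only: the polynomial design matrix for M = 3
x_demo = torch.tensor([[1.0], [2.0], [3.0]])
X_demo, _ = predict(torch.zeros(3, 1), x_demo, 3)
print(X_demo)
# tensor([[1., 1., 1.],
#         [1., 2., 4.],
#         [1., 3., 9.]])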
Compute the error between the predicted and expected values:
def calculate_error(predicted, expected):
    loss = nn.MSELoss()
    error = loss(predicted, expected)
    return error.item()
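As a quick sanity check (illustration only), this helper returns the mean squared error as a plain Python float, which is what the training loop below compares against accept_loss.

# Illustration only: MSE of two small column vectors, returned as a float
print(calculate_error(torch.tensor([[1.0], [2.0]]), torch.tensor([[1.5], [2.5]])))  # 0.25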
Define a helper that randomly selects data, and implement the MBGD method:
def get_random_batch_index(data, batch_size=32):
    return random.sample(range(data.shape[0]), batch_size)
def MBGD(x, y, dimension, accept_loss=0.001, learn_rate=0.01, max_iter=100000):
    coef = torch.rand(dimension, 1)
    count = 0
    loss_list = []
    for _ in range(max_iter):
        random_index = get_random_batch_index(x)
        batch_x = torch.index_select(x, 0, torch.IntTensor(random_index))
        batch_y = torch.index_select(y, 0, torch.IntTensor(random_index))
        X, predicted = predict(coef, batch_x, dimension)
        loss = calculate_error(predicted, batch_y)
        if count % 5 == 0:
            loss_list.append(loss)
        if loss <= accept_loss:
            break
        # Adjust coefficients
        gradient = learn_rate * (torch.transpose(X, 0, 1) @ (X @ coef - batch_y))
        coef = coef - gradient
        count += 1
    return coef, loss, loss_list, count, batch_x, predicted
Generate the training data and apply the MBGD method:
# Initialize the training data
dataset_size = 1000
x = torch.linspace(0, 2 * math.pi, dataset_size).unsqueeze(1)
y = torch.sin(x) + torch.rand(dataset_size, 1) / 20 # y = sin(x) with random noise
dimension = 10
coef, loss, loss_list, count, batch_x, predicted = MBGD(x, y, dimension)
print(f"{count} iterations performed")
print(f"loss: {loss}")
print(f"Coefficients: {coef.tolist()}")
However, when I run the code above, the output is a list of nan:
100000 iterations performed
loss: nan
Coefficients: [[nan], [nan], [nan], [nan], [nan], [nan], [nan], [nan], [nan], [nan]]
If I print out the first few values of gradient during the iterations, they look something like this:
tensor([[1.7028e+05],
        [9.8436e+05],
        [5.7417e+06],
        [3.3730e+07],
        [1.9930e+08],
        [1.1833e+09],
        [7.0541e+09],
        [4.2204e+10],
        [2.5330e+11],
        [1.5245e+12]])
tensor([[-4.4531e+17],
        [-2.4941e+18],
        [-1.4216e+19],
        [-8.2263e+19],
        [-4.8228e+20],
        [-2.8590e+21],
        [-1.7107e+22],
        [-1.0317e+23],
        [-6.2626e+23],
        [-3.8221e+24]])
tensor([[2.8374e+30],
        [1.6791e+31],
        [1.0017e+32],
        [6.0166e+32],
        [3.6344e+33],
        [2.2060e+34],
        [1.3445e+35],
        [8.2240e+35],
        [       inf],
        [       inf]])
tensor([[-inf],
        [-inf],
        [-inf],
        [-inf],
        [-inf],
        [-inf],
        [-inf],
        [-inf],
        [-inf],
        [-inf]])
tensor([[nan],
        [nan],
        [nan],
        [nan],
        [nan],
        [nan],
        [nan],
        [nan],
        [nan],
        [nan]])
This suggests the gradients are extremely unstable, swinging wildly between huge positive and negative values while growing in magnitude.
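As a rough check (this diagnostic is not part of my code above), the per-column scale of the polynomial design matrix over the full input range can be inspected like this; the highest power reaches roughly 10^7, which matches the enormous entries in the gradients above.

# Rough diagnostic only: largest absolute value of each polynomial feature
# column over the full x range [0, 2*pi]
X_full, _ = predict(torch.zeros(dimension, 1), x, dimension)
print(X_full.abs().max(dim=0).values)
# roughly: 1.0, 6.3, 39.5, 248.1, ..., 1.5e+07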
This result puzzles me, because with a different basis function, such as a Gaussian or sigmoidal basis, and all other hyperparameters unchanged, the same MBGD method produces good results. Only a small change to predict() is needed (a sketch of the sigmoidal variant is included after the results below):
def gaussian_basis_func(x, mu, sigma=1):
    return torch.exp(-((x - mu) ** 2) / (2 * (sigma ** 2)))

def predict(coef, x, M):
    width = 2 * (x.max() - x.min()) / M
    mu_vector = torch.linspace(x.min(), x.max(), M)
    X = torch.cat([gaussian_basis_func(x, mu_vector[i].item(), sigma=width) for i in range(M)], 1)
    return X, X @ coef
The results are much better:
2424 iterations performed
loss: 0.0009117514709942043
Coefficients: [[-0.7537871599197388], [0.8054635524749756], [0.30866697430610657], [0.4661775231361389], [0.15132419764995575], [-0.23501889407634735], [-0.15086951851844788], [-0.5047774314880371], [-0.7870860695838928], [0.7650274634361267]]
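For reference, the sigmoidal variant mentioned above follows the same pattern as the Gaussian one; the centre and width choices below are just one plausible option (a sketch, not necessarily the exact settings I ran):

def sigmoidal_basis_func(x, mu, sigma=1):
    # Standard sigmoidal basis: a logistic sigmoid centred at mu with slope 1/sigma
    return torch.sigmoid((x - mu) / sigma)

def predict(coef, x, M):
    width = 2 * (x.max() - x.min()) / M
    mu_vector = torch.linspace(x.min(), x.max(), M)
    X = torch.cat([sigmoidal_basis_func(x, mu_vector[i].item(), sigma=width) for i in range(M)], 1)
    return X, X @ coef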
I have tried things such as lowering the learning rate, but it made no difference.
Can anyone offer any suggestions? I would like to know whether this is a problem with my implementation, a problem with the basis function, or something else.