I am trying to replicate the solution of an sklearn GP regression implementation with a GPyTorch version. Unfortunately, I cannot give an example with the original dataset, as it is proprietary. The test MSE of the sklearn solution is consistently about 40% lower than that of the PyTorch version, but I am also not very familiar with Torch. There is another question dealing with a similar problem, but it seems to be specific to multi-output GP regression, which does not appear to apply here.
I tried to reproduce the problem with an artificial dataset that is completely different from the original data:
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error as mse
import numpy as np
import matplotlib.pyplot as plt
train_size = 1000
test_size = 200
X_train = np.random.normal(loc=0, scale=5, size=(train_size, 6))
y_train = (X_train[:, 0] + 2*X_train[:, 1] - 3*X_train[:, 2] + 4*X_train[:, 3]
           - 5*X_train[:, 4] + 6*X_train[:, 5]
           + np.random.normal(loc=0, scale=0.3, size=train_size))
X_test = np.random.normal(loc=0, scale=5, size=(test_size, 6))
y_test = (X_test[:, 0] + 2*X_test[:, 1] - 3*X_test[:, 2] + 4*X_test[:, 3]
          - 5*X_test[:, 4] + 6*X_test[:, 5]
          + np.random.normal(loc=0, scale=0.3, size=test_size))
naive_prediction = y_train.mean()
basic_mse = mse(y_test, np.repeat(naive_prediction, test_size) )
# Sklearn GP regression
import sklearn.gaussian_process as gp
kernel = gp.kernels.ConstantKernel(1.0, (1e-1, 1e3)) * gp.kernels.RBF(10.0, (1e-3, 1e3))
model = gp.GaussianProcessRegressor(kernel=kernel, n_restarts_optimizer=10, alpha=0.1, normalize_y=False)
model.fit(X_train, y_train)
params = model.kernel_.get_params()
sk_pred, std = model.predict(X_test, return_std=True)
sk_mse = mse(sk_pred, y_test)
print('Sklearn MSE reduction : ', sk_mse/basic_mse)
print('Sklearn Lengthscale: ', params['k2__length_scale'])
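As a side note (this is my addition, not part of the original setup): sklearn's RBF kernel supports one lengthscale per input dimension (ARD) when length_scale is passed as an array, which makes per-dimension lengthscale comparisons across implementations more meaningful. A hypothetical ARD variant of the kernel above:

```python
import numpy as np
import sklearn.gaussian_process as gp

# Hypothetical ARD variant: one lengthscale per feature instead of a shared one
ard_kernel = gp.kernels.ConstantKernel(1.0, (1e-1, 1e3)) * gp.kernels.RBF(
    length_scale=np.ones(6), length_scale_bounds=(1e-3, 1e3))

# Before fitting, each of the 6 dimensions starts at lengthscale 1.0
print(ard_kernel.k2.length_scale)
```

After fitting, `model.kernel_.get_params()['k2__length_scale']` then returns an array of six values rather than a scalar.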
# GPytorch Regression
import torch
import gpytorch
X_train_tensor = torch.Tensor(X_train)
y_train_tensor = torch.Tensor(y_train)
X_test_tensor = torch.Tensor(X_test)
y_test_tensor = torch.Tensor(y_test)
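One detail worth flagging (my observation, not in the original post): `torch.Tensor()` silently downcasts the float64 NumPy arrays to float32, while exact GP inference is numerically sensitive and is often run in double precision. A sketch of the difference:

```python
import numpy as np
import torch

X_train = np.random.normal(loc=0, scale=5, size=(1000, 6))  # stand-in data

# torch.Tensor() casts float64 -> float32
X_f32 = torch.Tensor(X_train)
# torch.from_numpy() keeps the original float64 dtype
X_f64 = torch.from_numpy(X_train)

print(X_f32.dtype, X_f64.dtype)
```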
class ExactGPModel(gpytorch.models.ExactGP):
    def __init__(self, train_x, train_y, likelihood):
        super(ExactGPModel, self).__init__(train_x, train_y, likelihood)
        self.mean_module = gpytorch.means.ConstantMean()
        self.covar_module = gpytorch.kernels.ScaleKernel(gpytorch.kernels.RBFKernel())

    def forward(self, x):
        mean_x = self.mean_module(x)
        covar_x = self.covar_module(x)
        return gpytorch.distributions.MultivariateNormal(mean_x, covar_x)
# initialize likelihood and model
likelihood = gpytorch.likelihoods.GaussianLikelihood()
model = ExactGPModel(X_train_tensor, y_train_tensor, likelihood)
training_iter = 1000
model.train()
likelihood.train()
# Use the adam optimizer
optimizer = torch.optim.Adam([
    {'params': model.parameters()},  # includes GaussianLikelihood parameters
], lr=0.1)
# "Loss" for GPs - the marginal log likelihood
mll = gpytorch.mlls.ExactMarginalLogLikelihood(likelihood, model)
all_losses = []
for i in range(training_iter):
    # Zero gradients from previous iteration
    optimizer.zero_grad()
    # Output from model
    output = model(X_train_tensor)
    # Calc loss and backprop gradients
    loss = -mll(output, y_train_tensor)
    loss.backward()
    all_losses.append(loss.item())
    optimizer.step()
plt.plot(all_losses)
model.eval()
likelihood.eval()
with torch.no_grad(), gpytorch.settings.fast_pred_var():
    prediction = likelihood(model(X_test_tensor))
    mean = prediction.mean
    # Get lower and upper predictive bounds
    lower, upper = prediction.confidence_region()
noise = model.likelihood.noise.item()
length_scale = model.covar_module.base_kernel.lengthscale.item()
torch_mse = mse(mean.numpy(), y_test)
print('Torch MSE reduction : ', torch_mse/basic_mse)
print('Torch Lengthscale: ', length_scale)
print('Torch Noise: ', noise)
Incidentally, GPyTorch performs better on this example. More specifically, I am trying to understand the difference in the noise variance. Through experimentation, I found that the alpha parameter, which essentially sets the noise level on the diagonal of the covariance matrix, has a large effect on how good the test MSE is. If I understand correctly, this is the parameter that the torch version infers via backpropagation. Apart from that, the two kernels look identical, provided the torch variant is allowed to relax fully to a good solution.
My questions are:
How can I set and fix the noise covariance in the GPyTorch model?
The inferred lengthscales seem to differ considerably between the two implementations, especially so on the proprietary dataset of the original implementation. How can this be explained?
Are there other factors that could explain the performance difference? Which aspects of the data would amplify such a difference?