The dataset is generated as follows:
import numpy as np
from numpy.random import normal

def make_labels(X, f, noise=0):
    # Apply f to every instance and add Gaussian noise when noise > 0.
    return map(lambda x: f(x) + (normal(0, noise) if noise > 0 else 0), X)

def make_instances(x1, x2, N):
    # N evenly spaced points in [x1, x2], as a column of shape (N, 1).
    return np.array([np.array([x]) for x in np.linspace(x1, x2, N)])

def f(x):
    return 5 + x - 2 * x**2 - 5 * x**3

X = make_instances(-5, 5, 50)
y_map = make_labels(X, f, 200)
y = np.array(list(y_map))
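My task is to write a training function and then train the model on both the original dataset and a split dataset. Here is my train function: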
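import sklearn.preprocessing

def train(X, y, d):
    # Expand X into polynomial features up to degree d.
    poly = sklearn.preprocessing.PolynomialFeatures(d)
    phi = poly.fit_transform(X)
    # Least-squares weights via the pseudo-inverse.
    w = np.matmul(np.linalg.pinv(phi), y)
    # Fitted values for the training inputs.
    h = np.matmul(phi, w)
    return h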
First, I train the model on the original dataset like this (for example, with polynomial features of degree = 3):
h = train(X, y, 3)
The result looks like this:
import matplotlib.pyplot as plt

plt.grid()
plt.scatter(X, y)
plt.plot(X, h, 'r')
However, when I split the data with train_test_split like this and then train the model on the training set:
import sklearn.model_selection

X_train, X_test, y_train, y_test = sklearn.model_selection.train_test_split(X, y, test_size=0.5)
h_train = train(X_train, y_train, 3)
The result looks strange:
plt.grid()
plt.scatter(X, y)
plt.plot(X_train, h_train, 'r')
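
One possible explanation for the odd plot: train_test_split shuffles the rows by default, so X_train is no longer in ascending order, and plt.plot connects the points in that shuffled order. Below is a minimal sketch of sorting before plotting. The helper name sort_for_plot and the small demo arrays are assumptions made for illustration; they are not part of the original code.

```python
import numpy as np

def sort_for_plot(X_part, h_part):
    """Hypothetical helper: reorder X_part (shape (n, 1)) and h_part by ascending x."""
    # argsort on the single feature column gives the ascending order of x
    order = np.argsort(X_part[:, 0])
    return X_part[order], h_part[order]

# Tiny stand-in for a shuffled training split (illustrative values only).
X_demo = np.array([[3.0], [-1.0], [2.0], [0.0]])
h_demo = np.array([30.0, -10.0, 20.0, 0.0])

X_sorted, h_sorted = sort_for_plot(X_demo, h_demo)
print(X_sorted.ravel())
print(h_sorted)
```

With the variables from the question, this would presumably be used as X_s, h_s = sort_for_plot(X_train, h_train) followed by plt.plot(X_s, h_s, 'r'), so that the line is drawn from left to right.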