python - Sklearn custom kernel gives wrong decision function

Question

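I have successfully implemented my own custom linear kernel, and it works fine with `clf.predict`. However, when I use `clf.decision_function`, it returns a constant value for all points.

Here is the code for the custom kernel: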

```
import numpy as np

def linear_basis(x, y):
    return np.dot(x.T, y)

def linear_kernel(X, Y, K=linear_basis):
    gram_matrix = np.zeros((X.shape[0], Y.shape[0]))
    for i, x in enumerate(X):
        for j, y in enumerate(Y):
            gram_matrix[i,j] = K(x,y)
        return gram_matrix
```

Now I use this kernel on a small, linearly separable training set.

```
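import random

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from sklearn import svm

# creating random 2D points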
sample_size = 100
dat = {
    'x': [random.uniform(-2,2) for i in range(sample_size)],
    'y': [random.uniform(-2,2) for i in range(sample_size)]
}

data = pd.DataFrame(dat)

# giving the random points a linear structure
f_lin = np.vectorize(lambda x, y: 1 if x > y else 0)
data['z_lin'] = f_lin(data['x'].values, data['y'].values)
data_pos = data[data.z_lin == 1.]
data_neg = data[data.z_lin == 0.]

X_train = data[['x', 'y']]
y_train = data[['z_lin']]

clf_custom_lin = svm.SVC(kernel=linear_kernel) # using my custom kernel here
clf_custom_lin.fit(X_train.values, y_train.values.ravel())  # ravel: fit expects 1-D labels

# creating a 100x100 grid to manually predict each point in 2D
gridpoints = np.array([[i,j] for i in np.linspace(-2,2,100) for j in np.linspace(-2,2,100)])
gridresults = np.array([clf_custom_lin.predict([gridpoints[k]]) for k in range(len(gridpoints))])

# now plotting each point and the training samples
plt.scatter(gridpoints[:,0], gridpoints[:,1], c=gridresults, cmap='RdYlGn')
plt.scatter(data_pos['x'], data_pos['y'], color='green', marker='o', edgecolors='black')
plt.scatter(data_neg['x'], data_neg['y'], color='red', marker='o', edgecolors='black')
plt.show()
```

This gives the following result:

Now I want to reproduce the plot using `clf.decision_function`:

(Note: I accidentally swapped the colors here!)

```
h = .02
xx, yy = np.meshgrid(np.arange(-2 - .5, 2 + .5, h),
    np.arange(-2 - .5, 2 + .5, h))

# using the .decision_function here
Z = clf_custom_lin.decision_function(np.c_[xx.ravel(), yy.ravel()]) 

Z = Z.reshape(xx.shape)
plt.contourf(xx, yy, Z, cmap=plt.cm.RdBu, alpha=.8)

plt.scatter(data_pos['x'], data_pos['y'], color='blue', marker='o', edgecolors='black')
plt.scatter(data_neg['x'], data_neg['y'], color='red', marker='o', edgecolors='black')
plt.show()
```

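This gives the following plot:

Here is the same data plotted using the built-in linear kernel (`kernel="linear"`):

Since the custom kernel's predict function works, the decision function should produce the same working plot, right? I don't understand why this works with the built-in linear kernel but not with my custom one, especially since the custom kernel does work when I only predict points without the decision function. I hope someone can help.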

score 1 · Accepted Answer

The actual problem is really silly, but since it took quite a while to track down, I'll share my debugging outline.

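First, instead of plotting, just print the output of `decision_function`: you'll find that the first value is unique, but everything after it is constant. Running the same on different slices of the dataset shows the same pattern. So I thought maybe some values were being overwritten, and I dug into the `SVC` code. That led to some useful internal functions and attributes, such as `._BaseLibSVM__Xfit`, which contains the training data, as well as `_decision_function`, `_dense_decision_function`, and `_compute_kernel`. But nothing in the code indicated a problem, and running them just showed the same issue. Running `_compute_kernel` gave a result that was all zeros after the first row, and going back to your code, running `linear_kernel` on its own already does that. So, finally, it comes back to your `linear_kernel` function.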
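That last check can be sketched without scikit-learn at all, by calling the kernel from the question directly and inspecting the Gram matrix. The snippet below copies `linear_basis` and `linear_kernel` verbatim from the question; the toy arrays `X_demo` and `Y_demo` are made up for illustration and are not the question's data.

```python
import numpy as np

# Copied verbatim from the question, including its indentation.
def linear_basis(x, y):
    return np.dot(x.T, y)

def linear_kernel(X, Y, K=linear_basis):
    gram_matrix = np.zeros((X.shape[0], Y.shape[0]))
    for i, x in enumerate(X):
        for j, y in enumerate(Y):
            gram_matrix[i,j] = K(x,y)
        return gram_matrix

# Hypothetical toy inputs, chosen only to make the pattern easy to see.
X_demo = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
Y_demo = np.array([[1.0, 0.0], [0.0, 1.0]])

gram = linear_kernel(X_demo, Y_demo)
print(gram)

# Indices of rows that are entirely zero, i.e. rows that were never filled in.
zero_rows = np.flatnonzero(~gram.any(axis=1))
print("all-zero rows:", zero_rows)
```

Only the first row holds real kernel values. Every later test point gets an all-zero kernel row, which would explain why its decision value collapses to the same constant.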

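You return inside the outer `for` loop, so only the first row of `X` is ever used and the rest of the matrix is never computed. (This raises a surprise: why did the predictions look fine? It seems to be a fluke. Changing the definition of `f_lin` to change the classes, the model still learns the slope-1 line.)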
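A minimal sketch of the fix, assuming nothing else in the kernel needs to change: dedent the `return` so that it runs only after both loops have finished. The helper `linear_kernel_vectorized` and the arrays `X_demo` and `Y_demo` are illustrative additions, not part of the original code; for a plain dot-product kernel the whole Gram matrix should equal `X @ Y.T`, which gives a quick cross-check.

```python
import numpy as np

def linear_basis(x, y):
    return np.dot(x.T, y)

def linear_kernel(X, Y, K=linear_basis):
    gram_matrix = np.zeros((X.shape[0], Y.shape[0]))
    for i, x in enumerate(X):
        for j, y in enumerate(Y):
            gram_matrix[i, j] = K(x, y)
    # Return only after every row has been filled in.
    return gram_matrix

def linear_kernel_vectorized(X, Y):
    # Hypothetical helper: same Gram matrix for the dot-product kernel, no Python loops.
    return X @ Y.T

# Made-up random inputs in the same range as the question's data.
rng = np.random.default_rng(0)
X_demo = rng.uniform(-2, 2, size=(5, 2))
Y_demo = rng.uniform(-2, 2, size=(4, 2))

gram_loop = linear_kernel(X_demo, Y_demo)
gram_fast = linear_kernel_vectorized(X_demo, Y_demo)
print(np.allclose(gram_loop, gram_fast))
```

Either function can then be passed as `kernel=` to `svm.SVC`, as in the question, and `decision_function` should no longer be constant.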
