When using something like this:
clf = KNeighborsClassifier(n_neighbors=3)
clf.fit(X,y)
predictions = clf.predict_proba(X_test)
how can I restrict the prediction to a single class? This is needed for performance reasons, e.g. when I have thousands of classes but am only interested in whether one particular class has a high probability.
Sklearn does not implement this; you will have to write some kind of wrapper yourself. For example, you could extend the KNeighborsClassifier class and override its predict_proba method (a sketch is given at the end of this answer).
According to the source code:
def predict_proba(self, X):
    """Return probability estimates for the test data X.

    Parameters
    ----------
    X : array, shape = (n_samples, n_features)
        A 2-D array representing the test points.

    Returns
    -------
    p : array of shape = [n_samples, n_classes], or a list of n_outputs
        of such arrays if n_outputs > 1.
        The class probabilities of the input samples. Classes are ordered
        by lexicographic order.
    """
    X = atleast2d_or_csr(X)

    neigh_dist, neigh_ind = self.kneighbors(X)

    classes_ = self.classes_
    _y = self._y
    if not self.outputs_2d_:
        _y = self._y.reshape((-1, 1))
        classes_ = [self.classes_]

    n_samples = X.shape[0]

    weights = _get_weights(neigh_dist, self.weights)
    if weights is None:
        weights = np.ones_like(neigh_ind)

    all_rows = np.arange(X.shape[0])
    probabilities = []
    for k, classes_k in enumerate(classes_):
        pred_labels = _y[:, k][neigh_ind]
        proba_k = np.zeros((n_samples, classes_k.size))

        # a simple ':' index doesn't work right
        for i, idx in enumerate(pred_labels.T):  # loop is O(n_neighbors)
            proba_k[all_rows, idx] += weights[:, i]

        # normalize 'votes' into real [0,1] probabilities
        normalizer = proba_k.sum(axis=1)[:, np.newaxis]
        normalizer[normalizer == 0.0] = 1.0
        proba_k /= normalizer

        probabilities.append(proba_k)

    if not self.outputs_2d_:
        probabilities = probabilities[0]

    return probabilities
Just modify that code: change the for k, classes_k in enumerate(classes_): loop so it only scores the one particular class you need. A hacky way to do this is to overwrite the classes_ attribute so it becomes a singleton holding only the class under consideration, and to restore it once you are done.
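For illustration, here is a minimal sketch of such a wrapper, assuming a single-output classifier fitted in the usual way. The method name predict_proba_one is made up for this example; instead of temporarily swapping classes_, it mirrors the vote counting in the loop above but only accumulates votes for one target class, and it reimplements the 'uniform' and 'distance' cases of _get_weights inline so no private helper has to be imported. Like the quoted source, it relies on the private _y attribute.

import numpy as np
from sklearn.neighbors import KNeighborsClassifier


class SingleClassKNN(KNeighborsClassifier):
    """KNeighborsClassifier that can score a single class only (sketch)."""

    def predict_proba_one(self, X, target_class):
        # Nearest-neighbor search, exactly as in predict_proba.
        neigh_dist, neigh_ind = self.kneighbors(X)

        # Neighbor vote weights: uniform, or inverse distance
        # (mirrors what _get_weights does for these two options).
        if self.weights == 'distance':
            with np.errstate(divide='ignore'):
                weights = 1.0 / neigh_dist
            inf_mask = np.isinf(weights)
            inf_rows = inf_mask.any(axis=1)
            # Exact matches (distance 0) take all the weight in their row.
            weights[inf_rows] = inf_mask[inf_rows]
        else:
            weights = np.ones_like(neigh_ind, dtype=float)

        # self.classes_ is sorted, so the encoded label of target_class is
        # its position in that array (target_class must be a fitted class).
        class_idx = np.searchsorted(self.classes_, target_class)

        # Weighted votes for the target class divided by all votes, per sample.
        neigh_labels = self._y[neigh_ind]  # encoded labels of the k neighbors
        target_votes = (weights * (neigh_labels == class_idx)).sum(axis=1)
        total_votes = weights.sum(axis=1)
        total_votes[total_votes == 0.0] = 1.0  # avoid division by zero

        return target_votes / total_votes

Usage looks the same as in your snippet:

clf = SingleClassKNN(n_neighbors=3)
clf.fit(X, y)
scores = clf.predict_proba_one(X_test, target_class=some_label)  # some_label is one of the labels in y

Note that the kneighbors search is still performed for every test point; the saving is that no (n_samples, n_classes) probability matrix is built, which is where the cost goes when you have thousands of classes.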