
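I have a KNeighborsClassifier that classifies data based on 4 attributes. I want to weight those 4 attributes manually, but I keep running into "operands could not be broadcast together with shapes (1,5) (4)".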

The documentation says very little about the weights option. This is all it gives for the callable form: "weights : [callable] : a user-defined function which accepts an array of distances, and returns an array of the same shape containing the weights."

This is what I have right now:

    for v in result:
        params = [v['a_one'], v['a_two'], v['a_three'], v['a_four']]
        self.training_data['data'].append(params)
        self.training_data['target'].append(v['answer'])

    def get_weights(array_weights):
        return [1,1,2,1]

    classifier = neighbors.KNeighborsClassifier(weights=get_weights)

2 Answers


An explanation of the sklearn weights callable

    import numpy as np
    import pandas as pd
    from sklearn.neighbors import KNeighborsClassifier

Create sample data for training the model:

    df = pd.DataFrame({'feature1': [1, 3, 3, 4, 5], 'response': [1, 1, 1, 2, 2]})

    y = df.response
    # [1,1,1,2,2]

    X_train = df[['feature1']]
    # [1,3,3,4,5]

Define a custom weight function (it prints the array it receives, so you can see the structure of the input):

    def my_distance(weights):
        print(weights)
        return weights

Define the model, passing my_distance as the callable for weights:

    knn = KNeighborsClassifier(n_neighbors=3, weights=my_distance)

    knn.fit(X_train, y)

    knn.predict([[1]])
    # [[ 0.  2.  2.]]
    # array([1])

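Explanation: the printed array holds the distances from the query value 1 to its 3 nearest neighbors (n_neighbors=3).

The three neighbors in X_train closest to 1: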

1, 3, 3 

Distances:

[[ 0.  2.  2.]]

1 - 1 = 0 
3 - 1 = 2
3 - 1 = 2

Predicted class:

array([1])
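To tie this back to the question: the callable receives neighbor distances of shape (n_queries, n_neighbors), not the feature columns. Returning a fixed 4-element list therefore cannot broadcast against a (1, 5) array, which is 1 query times the default 5 neighbors. The sketch below is only an illustration under assumptions: the toy data, `feature_scale`, and `inspect_weights` are made up here, and scaling the columns before fitting is one common way to make a feature count more in a Euclidean distance, not something this answer or the scikit-learn docs prescribe.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# Hypothetical toy data: 6 samples with 4 features each
X = np.array([
    [0.0, 0.0, 0.0, 0.0],
    [0.1, 0.2, 0.0, 0.1],
    [0.2, 0.1, 0.1, 0.0],
    [1.0, 1.0, 1.0, 1.0],
    [0.9, 1.1, 1.0, 0.8],
    [1.1, 0.9, 0.9, 1.0],
])
y = np.array([0, 0, 0, 1, 1, 1])

seen_shapes = []

def inspect_weights(distances):
    # Record the shape of what scikit-learn passes in, then weight
    # every neighbor equally by returning an array of the same shape.
    seen_shapes.append(distances.shape)
    return np.ones_like(distances)

# Assumed per-feature factors: the third feature counts double.
# Multiplying a column by s multiplies its contribution to the
# squared Euclidean distance by s**2.
feature_scale = np.array([1.0, 1.0, 2.0, 1.0])

knn = KNeighborsClassifier(n_neighbors=5, weights=inspect_weights)
knn.fit(X * feature_scale, y)

# Queries must be scaled with the same factors as the training data
queries = np.array([[0.1, 0.1, 0.1, 0.1], [1.0, 1.0, 1.0, 1.0]])
pred = knn.predict(queries * feature_scale)

print(seen_shapes)  # each shape is (n_queries, n_neighbors)
print(pred)
```

Keeping the per-feature factors outside the classifier leaves the weights callable free to do what it is meant for, which is weighting neighbors by their distance.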
Answered 2017-10-06T04:38:54.653

For a Gaussian kernel, gamma is a hyperparameter here; we need to choose the value that fits best.

    import numpy as np

    gamma = 1.0  # placeholder value; tune this hyperparameter for your data
    def gaussian_kernel(distance):
        weights = np.exp(-gamma * (distance ** 2))
        return weights / np.sum(weights)
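A minimal sketch of how such a kernel might be plugged into the classifier, reusing the one-feature sample data from the other answer. `make_gaussian_kernel` is a hypothetical helper introduced here so that gamma is bound explicitly instead of read from a global, and gamma=0.5 is an arbitrary illustrative value. This variant normalizes per query row rather than over the whole array, so each query's weights sum to 1 on their own.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

def make_gaussian_kernel(gamma):
    """Return a weights callable with gamma fixed (hypothetical helper)."""
    def gaussian_kernel(distances):
        # distances has shape (n_queries, n_neighbors)
        weights = np.exp(-gamma * distances ** 2)
        # Normalize each query's row separately
        return weights / weights.sum(axis=1, keepdims=True)
    return gaussian_kernel

X_train = np.array([[1.0], [3.0], [3.0], [4.0], [5.0]])
y_train = np.array([1, 1, 1, 2, 2])

knn = KNeighborsClassifier(n_neighbors=3, weights=make_gaussian_kernel(gamma=0.5))
knn.fit(X_train, y_train)

pred = knn.predict(np.array([[1.0], [4.6]]))
print(pred)
```

Since gamma is an ordinary argument of the factory, different values can be compared with whatever validation scheme you already use.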
Answered 2020-11-21T03:57:57.377