python - Sklearn GridSearchCV, class_weight not working for unknown reason :(

Question

I'm trying to get class_weight going. I know the rest of the code works; it's only class_weight that gives me this error:

    parameters_to_tune = ['min_samples_split':[2,4,6,10,15,25], 'min_samples_leaf':[1,2,4,10],'max_depth':[None,4,10,15],
                                             ^
SyntaxError: invalid syntax

Here is my code:

clf1 = tree.DecisionTreeClassifier()
 parameters_to_tune = ['min_samples_split':[2,4,6,10,15,25], 'min_samples_leaf':[1,2,4,10],'max_depth':[None,4,10,15],
 'splitter' : ('best','random'),'max_features':[None,2,4,6,8,10,12,14],'class_weight':{1:10}]
clf=grid_search.GridSearchCV(clf1,parameters_to_tune)
clf.fit(features,labels)
print clf.best_params_

Can anyone spot the mistake I'm making?

score 6 · Accepted Answer

~~I assumed you wanted to grid search over different class_weight values for the 'salary' class.~~

The value of class_weight should be a list:

'class_weight':[{'salary':1}, {'salary':2}, {'salary':4}, {'salary':6}, {'salary':10}]

You can simplify it with a list comprehension:

'class_weight':[{'salary': w} for w in [1, 2, 4, 6, 10]]

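The first problem is that each parameter value in the dict parameters_to_tune should be a list, but you passed a dict. You can fix this by passing a list of dicts as the value of class_weight, where each dict is one candidate class_weight setting for DecisionTreeClassifier.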
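To make the "list of candidates" idea concrete, here is a minimal sketch using `ParameterGrid`, which expands a grid into its individual candidate settings. It assumes a recent scikit-learn where this lives in `sklearn.model_selection`; the integer class label `0` and the reduced grid are illustrative assumptions, not taken from the question.

```python
from sklearn.model_selection import ParameterGrid

# Every value in the grid is a list of candidates. For class_weight, each
# candidate is itself a dict of {class_label: weight}.
# The class label 0 is an assumption made for illustration.
grid = {
    'max_depth': [None, 4],
    'class_weight': [{0: w} for w in [1, 2, 4]],
}

# ParameterGrid yields one dict per combination of candidate values.
candidates = list(ParameterGrid(grid))
print(len(candidates))  # 2 max_depth values x 3 class_weight dicts
for params in candidates:
    print(params)
```

Each printed dict is one setting that a grid search would try, which is why a bare dict such as `{1: 10}` has to be wrapped in a list.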

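The more serious problem is that class_weight holds weights associated with classes, but in your case 'salary' is the name of a feature. You can't assign weights to features. I misunderstood your intent at first, and now I'm not sure what you actually want.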

class_weight has the form {class_label: weight}. If you really do want to set class_weight in your case, class_label should be a value such as 0.0 or 1.0, and the syntax looks like this:

'class_weight':[{0: w} for w in [1, 2, 4, 6, 10]]

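If a class has a large weight, the classifier is more likely to predict that data belongs to that class. A typical use case for class_weight is imbalanced data.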
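A tiny sketch of that effect, assuming a recent scikit-learn: the toy data below is made up so that the tree cannot split at all, which leaves the class weights as the only thing deciding the prediction.

```python
from sklearn.tree import DecisionTreeClassifier

# Ten indistinguishable samples (hypothetical toy data): nine of class 0,
# one of class 1. With a constant feature the tree cannot split, so it
# predicts whichever class has the larger total weight.
X = [[0.0]] * 10
y = [0] * 9 + [1]

plain = DecisionTreeClassifier(random_state=0).fit(X, y)
weighted = DecisionTreeClassifier(class_weight={1: 10}, random_state=0).fit(X, y)

# Unweighted: class 0 wins 9 to 1. Weighted: class 1 counts as 10 against 9.
plain_pred = int(plain.predict([[0.0]])[0])
weighted_pred = int(weighted.predict([[0.0]])[0])
print(plain_pred, weighted_pred)
```

On real data the shift is less clear-cut, because the weights also change which splits the tree chooses.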

Here is an example, although the classifier there is an SVM.

Update:

The full parameters_to_tune should look like this:

parameters_to_tune = {'min_samples_split': [2, 4, 6, 10, 15, 25],
                      'min_samples_leaf': [1, 2, 4, 10],
                      'max_depth': [None, 4, 10, 15],
                      'splitter' : ('best', 'random'),
                      'max_features':[None, 2, 4, 6, 8, 10, 12, 14],
                      'class_weight':[{0: w} for w in [1, 2, 4, 6, 10]]}
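Note that the SyntaxError in the question comes from writing key: value pairs inside square brackets; a dict literal needs curly braces, as above. For readers on a current setup, here is a hedged end-to-end sketch. It assumes Python 3 and a recent scikit-learn, where GridSearchCV is imported from `sklearn.model_selection` (the `grid_search` module used in the question was removed in later releases). The synthetic `features` and `labels` and the reduced grid are stand-ins, since the asker's data is not shown.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the asker's features/labels (assumption):
# 14 features, binary labels 0 and 1, roughly 80/20 imbalance.
features, labels = make_classification(
    n_samples=300, n_features=14, weights=[0.8, 0.2], random_state=0)

# A reduced version of the grid above, kept small so the search runs quickly.
parameters_to_tune = {
    'min_samples_split': [2, 10],
    'max_depth': [None, 4],
    'class_weight': [{0: w} for w in [1, 4, 10]],
}

clf = GridSearchCV(DecisionTreeClassifier(random_state=0),
                   parameters_to_tune, cv=3)
clf.fit(features, labels)
print(clf.best_params_)
```

The full grid from the answer can be substituted for the reduced one; it simply takes longer because every combination is fitted once per fold.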

score 0 · Accepted Answer

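The link below covers the use of different class_weight values. Just Ctrl+F for "class_weight" to jump to the relevant section. It uses GridSearchCV to find the best class_weight for different optimization objectives.

Optimizing a classifier using different evaluation metrics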
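The linked article is not reproduced here, so the following is only a sketch of the general idea rather than its code: run the same class_weight grid under several `scoring` values and compare which weight each metric prefers. The synthetic data, the choice of metrics, and the shallow tree are all assumptions; a recent scikit-learn is assumed as well.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeClassifier

# Synthetic imbalanced binary data (assumption): about 20% positives.
X, y = make_classification(n_samples=300, weights=[0.8, 0.2], random_state=0)

# Candidate weights for the minority class, labelled 1 here.
grid = {'class_weight': [{1: w} for w in [1, 2, 4, 6, 10]]}

best = {}
for metric in ['accuracy', 'recall', 'f1']:
    search = GridSearchCV(DecisionTreeClassifier(max_depth=3, random_state=0),
                          grid, scoring=metric, cv=3)
    search.fit(X, y)
    # Record which class_weight candidate scored best under this metric.
    best[metric] = search.best_params_['class_weight']

print(best)
```

Different metrics may select different weights, which is the reason to choose the scoring function deliberately before tuning class_weight.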
