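I have implemented a version of the AdaBoost boosting algorithm, using decision stumps as the weak learners. However, I often find that after training, the sequence of weak learners contains a pattern that keeps repeating throughout the ensemble. For example, after training, the set of weak learners looks like A,B,C,D,E,D,E,D,E,D,E,F,E,D,E,D,E and so on.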
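I believe I am updating the data weights correctly each time a new weak learner is added. Here I classify each data point and then set that data point's weight: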
    // After we have chosen the weak learner which reduces the weighted sum error by the most, we need to update the weights of each data point.
    double sumWeights = 0.0; // This is our normalisation value so we can normalise the weights after we have finished updating them
    foreach (DataPoint dataP in trainData) {
        int y = dataP.getY(); // Where Y is the desired output
        Object[] x = dataP.getX();
        // Classify the data input using the weak learner. Then check to see if this classification is correct/incorrect and adjust the weights accordingly.
        int classified = newLearner.classify(x);
        dataP.updateWeight(y, finalLearners[algorithmIt].getAlpha(), classified);
        sumWeights += dataP.getWeight();
    }
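The loop above accumulates `sumWeights` but the normalisation step itself is not shown. A minimal sketch of what that step might look like follows, assuming the weights are held in a plain `double[]`; the class `WeightNormalisation` and the method `Normalise` are hypothetical names used only for illustration and are not part of my actual code.

```csharp
using System;

public static class WeightNormalisation
{
    // Hypothetical helper: divides every weight by the total so that the weights sum to 1.
    public static void Normalise(double[] weights)
    {
        double sumWeights = 0.0;
        foreach (double w in weights)
        {
            sumWeights += w;
        }
        if (sumWeights <= 0.0)
        {
            throw new ArgumentException("Weights must have a positive sum.");
        }
        for (int i = 0; i < weights.Length; i++)
        {
            weights[i] /= sumWeights;
        }
    }

    public static void Main()
    {
        // Made-up weights as they might look right after an update, before normalising.
        double[] weights = { 0.1, 0.3, 0.1, 0.1 };
        Normalise(weights);
        Console.WriteLine(string.Join(" ", weights));
    }
}
```

If this step is skipped or applied with a stale sum, the weighted error of later candidates is no longer measured against a proper distribution, so it seems worth checking alongside the update itself.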
This is my classify method in the WeakLearner class:
    // Method in the WeakLearner class
    public int classify(Object[] xs) {
        if (xs[splitFeature].Equals(splitValue))
            return 1;
        else
            return -1;
    }
Then I have a method that updates the weight of a data point:
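    public void updateWeight(int y, double alpha, int classified) {
        // y and classified are both +1 or -1, so the exponent is -alpha when they agree and +alpha when they differ.
        weight = weight * Math.Exp(-y * alpha * classified);
    }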
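For reference, the rule this method appears to implement is the standard discrete AdaBoost update, written out here under the assumption that labels and predictions both lie in {-1, +1}. The symbols \varepsilon_t (weighted error of the learner chosen in round t), \alpha_t, h_t and Z_t are notation introduced for this sketch, and the formula for \alpha_t is the textbook one, which may or may not match what my `getAlpha()` returns.

```latex
\alpha_t = \tfrac{1}{2}\ln\frac{1-\varepsilon_t}{\varepsilon_t},
\qquad
w_i^{(t+1)} = \frac{w_i^{(t)}\,\exp\bigl(-\alpha_t\, y_i\, h_t(x_i)\bigr)}{Z_t},
\qquad
Z_t = \sum_j w_j^{(t)}\,\exp\bigl(-\alpha_t\, y_j\, h_t(x_j)\bigr)
```

With \alpha_t chosen this way, the learner h_t has weighted error exactly 1/2 under the new weights w^{(t+1)}, so it should not be the best candidate in the very next round.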
I am not sure why this is happening. Are there any common causes that would lead to the same weak learners being chosen over and over?