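I'm reading Weka's implementation of resampling a dataset (an `Instances` object) according to a given weight vector. Having read through the code, I'm still not sure which algorithm it implements. I'm also confused by these two lines:

```java
Utils.normalize(probabilities, sumProbs / sumOfWeights);
```

and

```java
// Make sure that rounding errors don't mess things up
probabilities[numInstances() - 1] = sumOfWeights;
```

I don't understand what they are for. Here is the code, copied from Weka: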
```java
public Instances resampleWithWeights(Random random, double[] weights) {

  if (weights.length != numInstances()) {
    throw new IllegalArgumentException("weights.length != numInstances.");
  }
  Instances newData = new Instances(this, numInstances());
  if (numInstances() == 0) {
    return newData;
  }
  double[] probabilities = new double[numInstances()];
  double sumProbs = 0, sumOfWeights = Utils.sum(weights);
  for (int i = 0; i < numInstances(); i++) {
    sumProbs += random.nextDouble();
    probabilities[i] = sumProbs;
  }
  Utils.normalize(probabilities, sumProbs / sumOfWeights);

  // Make sure that rounding errors don't mess things up
  probabilities[numInstances() - 1] = sumOfWeights;
  int k = 0; int l = 0;
  sumProbs = 0;
  while ((k < numInstances() && (l < numInstances()))) {
    if (weights[l] < 0) {
      throw new IllegalArgumentException("Weights have to be positive.");
    }
    sumProbs += weights[l];
    while ((k < numInstances()) &&
           (probabilities[k] <= sumProbs)) {
      newData.add(instance(l));
      newData.instance(k).setWeight(1);
      k++;
    }
    l++;
  }
  return newData;
}
```
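To make the structure easier to follow, here is a minimal standalone sketch that mirrors the method above on a plain `double[]` of weights and returns the indices of the drawn items instead of building an `Instances` object. The class and method names (`WeightedResampleSketch`, `resampleIndices`) are hypothetical, and the local `normalize` helper is only a stand-in that assumes `Utils.normalize(double[], double)` divides every element by its second argument. Read this way, the method appears to be weighted sampling with replacement: it generates n increasing random points, rescales them into (0, sumOfWeights], and then walks those points and the running sum of weights together, assigning each point to the item whose cumulative-weight interval contains it. Dividing by `sumProbs / sumOfWeights` is the same as multiplying by `sumOfWeights / sumProbs`, so after normalization the last point should equal the total weight, but floating-point rounding can leave it slightly above. In that case no cumulative weight would ever reach it, and the result would be one instance short. Overwriting it with exactly `sumOfWeights` prevents this, assuming `Utils.sum` adds the weights in the same order as the later loop so that both totals are bit-identical.

```java
import java.util.Arrays;
import java.util.Random;

class WeightedResampleSketch {

    // Stand-in (assumption) for weka.core.Utils.normalize(double[], double):
    // divides every element by the given value.
    static void normalize(double[] values, double sum) {
        for (int i = 0; i < values.length; i++) {
            values[i] /= sum;
        }
    }

    // Hypothetical helper: returns the indices of the n drawn items,
    // following the same steps as resampleWithWeights.
    static int[] resampleIndices(Random random, double[] weights) {
        int n = weights.length;
        int[] chosen = new int[n];
        if (n == 0) {
            return chosen;
        }
        // Unlike Weka, this sketch rejects negative weights up front.
        double sumOfWeights = 0;
        for (double w : weights) {
            if (w < 0) {
                throw new IllegalArgumentException("Weights must be non-negative.");
            }
            sumOfWeights += w;
        }

        // Step 1: n increasing random points (running sums of uniform draws).
        double[] points = new double[n];
        double running = 0;
        for (int i = 0; i < n; i++) {
            running += random.nextDouble();
            points[i] = running;
        }

        // Step 2: rescale into (0, sumOfWeights]; dividing by
        // running / sumOfWeights multiplies by sumOfWeights / running.
        normalize(points, running / sumOfWeights);

        // Step 3: pin the last point to the exact total so rounding can
        // never push it past the final cumulative weight.
        points[n - 1] = sumOfWeights;

        // Step 4: merge-style walk. Point k goes to item l when it lies in
        // (cumulative weight before l, cumulative weight through l].
        int k = 0;
        double cumulative = 0;
        for (int l = 0; l < n && k < n; l++) {
            cumulative += weights[l];
            while (k < n && points[k] <= cumulative) {
                chosen[k++] = l;
            }
        }
        return chosen;
    }

    public static void main(String[] args) {
        double[] weights = {0.0, 1.0, 3.0, 0.0};
        int[] picks = resampleIndices(new Random(42), weights);
        System.out.println(Arrays.toString(picks));
    }
}
```

Running `main` should print four non-decreasing indices drawn only from {1, 2}, because items 0 and 3 have zero weight. Because the points are generated already sorted, one linear pass assigns all n of them. That is O(n) overall, compared with O(n log n) for n independent draws that each binary-search the cumulative weights. One side effect the sketch also shows: the last point always lands exactly on the total weight, so the final item whose weight changes the running sum (item 2 here) is picked on every call. As far as I can tell, this means the draws are not exactly independent samples. Running sums of exponential draws divided by the sum of n + 1 of them would give exact sorted uniform points instead.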