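I'm using OpenCV 3.1.0 to fit a Gaussian mixture model to two-class data with EM. The samples are labeled, so during training I supply the class means and covariances through EM::trainE. When I inspect the predicted labels, they appear to fit the data well, but they are inverted: samples from class 1 are almost always predicted as class 0, and vice versa. This is how the model is trained: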
```cpp
// Run EM
// trainE() (re)allocates the labels output as CV_32SC1, which matches the at<int>() reads below
Mat predicted_labels;
Mat means0(EM_CLASS_COUNT, SAMPLE_DIMENSIONS, CV_64F);
const int sizes[]{ EM_CLASS_COUNT, SAMPLE_DIMENSIONS, SAMPLE_DIMENSIONS };
Mat covar0(3, sizes, CV_64F);
for (int i = 0; i < EM_CLASS_COUNT; ++i) {
    // Header over the i-th SAMPLE_DIMENSIONS x SAMPLE_DIMENSIONS slice of covar0, so calcCovarMatrix writes into it
    Mat covar_i(SAMPLE_DIMENSIONS, SAMPLE_DIMENSIONS, CV_64F, covar0.ptr<double>(i));
    // COVAR_SCALE divides by the sample count; COVAR_NORMAL alone returns the unscaled scatter matrix
    calcCovarMatrix(class_samples[i], covar_i, means0.row(i), COVAR_NORMAL | COVAR_ROWS | COVAR_SCALE, CV_64F);
}
Ptr<EM> model = EM::create();
model->setClustersNumber(EM_CLASS_COUNT);
// Note: the default covariance type is EM::COV_MAT_DIAGONAL; full matrices need
// model->setCovarianceMatrixType(EM::COV_MAT_GENERIC)
model->trainE(samples, means0, covar0, noArray(), noArray(), predicted_labels);
// Print results
for (int i = 0; i < samples.rows; ++i) {
    printf("(%f, %f, %f): %d -> %d\n", samples.at<double>(i, 0), samples.at<double>(i, 1), samples.at<double>(i, 2), sample_labels.at<int>(i), predicted_labels.at<int>(i));
}
// The comparison yields 255 where the labels differ and 0 elsewhere
printf("Error rate: %f\n", countNonZero(sample_labels != predicted_labels) / (double)samples.rows);
Mat means = model->getMeans();
printf("Sample means: 0:(%f, %f, %f), 1:(%f, %f, %f)\n", means0.at<double>(0, 0), means0.at<double>(0, 1), means0.at<double>(0, 2), means0.at<double>(1, 0), means0.at<double>(1, 1), means0.at<double>(1, 2));
printf("Calculated means: 0:(%f, %f, %f), 1:(%f, %f, %f)\n", means.at<double>(0, 0), means.at<double>(0, 1), means.at<double>(0, 2), means.at<double>(1, 0), means.at<double>(1, 1), means.at<double>(1, 2));
```
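For reference, the difference between the two calcCovarMatrix modes can be sketched in plain C++ without OpenCV. With `COVAR_NORMAL | COVAR_ROWS` and no `COVAR_SCALE`, OpenCV returns the scatter matrix, the sum over samples of (x - mean)(x - mean)^T; `COVAR_SCALE` divides that by the number of samples, which gives the covariance estimate. The names `Matrix`, `rowMean` and `covariance` are hypothetical helpers for illustration, not OpenCV API.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Row-major samples: each inner vector is one D-dimensional sample.
using Matrix = std::vector<std::vector<double>>;

// Mean of the rows (what calcCovarMatrix writes to `mean` with COVAR_ROWS).
std::vector<double> rowMean(const Matrix& samples) {
    assert(!samples.empty());
    const std::size_t d = samples.front().size();
    std::vector<double> mean(d, 0.0);
    for (const auto& s : samples)
        for (std::size_t j = 0; j < d; ++j) mean[j] += s[j];
    for (double& m : mean) m /= static_cast<double>(samples.size());
    return mean;
}

// Scatter matrix sum_k (x_k - mean)(x_k - mean)^T; divided by N when `scale`
// is true, mirroring the effect of the COVAR_SCALE flag in "normal" mode.
Matrix covariance(const Matrix& samples, bool scale) {
    const std::vector<double> mean = rowMean(samples);
    const std::size_t d = mean.size();
    Matrix c(d, std::vector<double>(d, 0.0));
    for (const auto& s : samples)
        for (std::size_t a = 0; a < d; ++a)
            for (std::size_t b = 0; b < d; ++b)
                c[a][b] += (s[a] - mean[a]) * (s[b] - mean[b]);
    if (scale)
        for (auto& row : c)
            for (double& v : row) v /= static_cast<double>(samples.size());
    return c;
}
```

For the three samples (1, 2), (3, 4), (5, 0), the scatter matrix is [[8, -4], [-4, 8]], and the scaled covariance is that matrix divided by 3.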
Looking at the output printed to the console, each class's calculated mean is closest to the sample mean of the opposite class:
```
Sample means: 0:(184.912913, 192.435435, 185.291291), 1:(149.543210, 150.604938, 129.833333)
Calculated means: 0:(147.953284, 153.951035, 139.721160), 1:(209.889542, 214.448519, 206.625586)
```
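The observation that each calculated mean lies nearer the opposite class can be checked directly from the printed numbers. This minimal sketch, in plain C++ with hypothetical helper names (`Vec3`, `squaredDistance`, `nearestClass`), matches each EM component to the class whose sample mean is closest in squared Euclidean distance:

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <limits>
#include <vector>

using Vec3 = std::array<double, 3>;

// Squared Euclidean distance between two 3-D points.
double squaredDistance(const Vec3& a, const Vec3& b) {
    double d = 0.0;
    for (std::size_t k = 0; k < a.size(); ++k) d += (a[k] - b[k]) * (a[k] - b[k]);
    return d;
}

// For each EM component mean, return the index of the closest class (sample) mean.
std::vector<int> nearestClass(const std::vector<Vec3>& emMeans,
                              const std::vector<Vec3>& classMeans) {
    assert(!classMeans.empty());
    std::vector<int> nearest(emMeans.size(), 0);
    for (std::size_t c = 0; c < emMeans.size(); ++c) {
        double best = std::numeric_limits<double>::max();
        for (std::size_t j = 0; j < classMeans.size(); ++j) {
            const double d = squaredDistance(emMeans[c], classMeans[j]);
            if (d < best) {
                best = d;
                nearest[c] = static_cast<int>(j);
            }
        }
    }
    return nearest;
}
```

With the means printed above, `nearestClass` returns {1, 0}: component 0 is nearest to class 1 and component 1 is nearest to class 0.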
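Here is a visualization of the sample data and the trained model, showing the swapped classification. Red is class 0 and blue is class 1. Something is wrong with how I retrieve the covariances, so the contours are circles rather than ellipses: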
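Is there a way to make sure each Gaussian ends up fitting the samples it was initialized from, or is there a standard way to identify which class label belongs to each Gaussian?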
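One common way to attach class labels to mixture components after training is a majority vote: count how many samples of each true class land in each component, then assign each component its most frequent class. The sketch below assumes the labels have already been copied out of `sample_labels` and `predicted_labels` into `std::vector<int>`; `componentToClass` is a hypothetical helper, not part of OpenCV.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// For each EM component k, return the true class label carried by most of the
// samples that were predicted as k (majority vote over a confusion table).
std::vector<int> componentToClass(const std::vector<int>& trueLabels,
                                  const std::vector<int>& predicted,
                                  int numClasses) {
    assert(numClasses > 0 && trueLabels.size() == predicted.size());
    // counts[k][c] = number of samples predicted as component k whose true class is c
    std::vector<std::vector<int>> counts(numClasses, std::vector<int>(numClasses, 0));
    for (std::size_t i = 0; i < trueLabels.size(); ++i)
        ++counts[predicted[i]][trueLabels[i]];
    std::vector<int> mapping(numClasses, 0);
    for (int k = 0; k < numClasses; ++k)
        for (int c = 1; c < numClasses; ++c)
            if (counts[k][c] > counts[k][mapping[k]]) mapping[k] = c;
    return mapping;
}
```

If the predictions are swapped as described, this would map component 0 to class 1 and component 1 to class 0, and the error rate could then be recomputed on the remapped predictions. Majority voting can map two components to the same class; when a strict one-to-one mapping is required, the Hungarian algorithm over the same count table is the usual alternative.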