apache-spark - Spark ML 2.0 - Using evaluation metrics from the spark.ml library similar to those in spark.mllib

Question

We are running a RandomForest model that creates 3 classifiers, and we would like to compute AUC to evaluate our model instead of using accuracy.

Is there a way to do this with spark.ml? Currently we call MulticlassClassificationEvaluator with the accuracy metric. Its list of supported metrics does not include AUC; it only has the following:

   * param for metric name in evaluation (supports `"f1"` (default), `"weightedPrecision"`, `"weightedRecall"`, `"accuracy"`)

Is there an example of how to compute AUC in Spark?

We are running Spark 2.0, and this is our current setup, which evaluates the model with the accuracy metric:

from pyspark.ml.classification import RandomForestClassifier
from pyspark.ml.evaluation import MulticlassClassificationEvaluator

max_depth = model_params['max_depth']
num_trees = model_params['num_trees']

# Configure a RandomForest classifier.
rf = RandomForestClassifier(labelCol="label", featuresCol="features", impurity="gini",
                            featureSubsetStrategy="all", numTrees=num_trees, maxDepth=max_depth)

# Train model. This model fit is used for scoring future packages later.
model_fit = rf.fit(training_data)

# Make predictions.
transformed = model_fit.transform(test_data)

# Calculate and show the overall accuracy on test data if indicated
if model_params['calc_matrix'] is True:
    # Compare predictions with true labels and compute accuracy
    evaluator = MulticlassClassificationEvaluator(labelCol="label",
                                predictionCol="prediction", metricName="accuracy")
    accuracy = evaluator.evaluate(transformed)
    print("RF Overall Accuracy = {}, numTrees = {}, maxDepth = {}".
          format(accuracy, num_trees, max_depth))

score 1 · Accepted Answer

Area under the curve (AUC) is only meaningful for binary classifiers, but you are using MulticlassClassificationEvaluator (which means the number of output classes is > 2).
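To see why the metric is tied to the binary case, here is a minimal pure-Python sketch of what AUC measures: the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one. The function name `binary_auc` and the toy data are illustrative assumptions, not part of Spark.

```python
def binary_auc(labels, scores):
    """Fraction of (positive, negative) pairs where the positive is scored higher.

    Ties count as half a win. Labels must be 0 (negative) or 1 (positive).
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative example")

    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical toy data: one score per example for the positive class.
labels = [1, 1, 0, 0]
scores = [0.9, 0.4, 0.6, 0.2]
auc = binary_auc(labels, scores)
print("AUC = {}".format(auc))
```

The definition needs exactly one "positive" class and one score per example, which is why a 3-class prediction does not map onto it directly.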

Take a look at BinaryClassificationEvaluator.

However, if you want to build a multiclass classifier, you will need to use multiclass accuracy.
