apache-spark - PySpark ML: How to get the subModels values with CrossValidator()

Question

I want to get the (inner) training accuracy of cross-validation using the PySpark ML library:

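from pyspark.ml.classification import LogisticRegression
from pyspark.ml.evaluation import MulticlassClassificationEvaluator
from pyspark.ml.tuning import CrossValidator, ParamGridBuilder

lr = LogisticRegression()
# 2 x 2 x 2 = 8 parameter combinations to try
param_grid = (ParamGridBuilder()
              .addGrid(lr.regParam, [0.01, 0.5])
              .addGrid(lr.maxIter, [5, 10])
              .addGrid(lr.elasticNetParam, [0.01, 0.1])
              .build())
# Note: the default metricName of this evaluator is "f1", not "accuracy"
evaluator = MulticlassClassificationEvaluator(predictionCol='prediction')
cv = CrossValidator(estimator=lr,
                    estimatorParamMaps=param_grid,
                    evaluator=evaluator,
                    numFolds=5)
model_cv = cv.fit(train)
predictions_lr = model_cv.transform(validation)
predictions = evaluator.evaluate(predictions_lr)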

To get the accuracy metric for each c.v. fold, I tried:

print(model_cv.subModels)

But the result of this is empty (None).

How can I get the accuracy of each fold?

score 1 · Accepted Answer

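I know this is old, but in case anyone is still looking: for Spark to keep the non-best models produced during cross-validation, collectSubModels has to be enabled when creating the CrossValidator. Just set it to True (the default is False).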

i.e.

CrossValidator(estimator=lr, 
               estimatorParamMaps=param_grid, 
               evaluator=evaluator, 
               numFolds=5,
               collectSubModels=True)
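
Once the model has been fitted with collectSubModels=True, model_cv.subModels is a list of lists: subModels[i][j] is the model trained on fold i with the j-th entry of param_grid. Spark does not store a per-fold metric alongside these models (model_cv.avgMetrics only holds the average over folds for each parameter combination), so one way to get a number per fold is to score each sub-model yourself. The following is a minimal sketch, not part of the original answer: the helper name score_sub_models is hypothetical, and it scores on whatever dataset you pass in rather than on the internal fold splits, which CrossValidator does not expose.

```python
def score_sub_models(cv_model, evaluator, dataset):
    """Evaluate every sub-model kept by CrossValidator on `dataset`.

    Hypothetical helper (an assumption, not a Spark API).
    Returns scores[i][j]: the evaluator's metric for fold i, param map j.
    """
    sub_models = cv_model.subModels
    if sub_models is None:
        # This is what you get when collectSubModels was left at False
        raise ValueError("subModels is None; fit with collectSubModels=True")
    return [
        [evaluator.evaluate(model.transform(dataset)) for model in fold_models]
        for fold_models in sub_models
    ]
```

To get accuracy rather than the default f1, pass an evaluator created as MulticlassClassificationEvaluator(metricName='accuracy'), for example score_sub_models(model_cv, acc_evaluator, validation). Row i of the result then lists the accuracy of the fold-i models, in the same order as param_grid.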
