I'm searching for the best hyperparameters for an ALS recommender in PySpark, and the job keeps failing with errors such as "SparkContext has been shutdown" / "Lost task" / "Bad mod" / "BlockManagerMasterEndpoint: No more replicas available for...".
I tried adding checkpointing, but the job still fails after running for a few hours.
The code I'm using:
from pyspark.ml.recommendation import ALS
from pyspark.ml.tuning import ParamGridBuilder, CrossValidator
from pyspark.ml.evaluation import RegressionEvaluator

als = ALS(implicitPrefs=True, userCol="user", itemCol="brand_index",
          ratingCol="rating", coldStartStrategy="drop", nonnegative=True,
          checkpointInterval=3)
param_grid = ParamGridBuilder() \
    .addGrid(als.rank, [10, 50, 100, 150]) \
    .addGrid(als.regParam, [0.01, 0.05, 0.1, 0.15]) \
    .build()
evaluator = RegressionEvaluator(
    metricName="rmse",
    labelCol="rating",
    predictionCol="prediction")
print("Num models to be tested:", len(param_grid))
cv = CrossValidator(estimator=als, estimatorParamMaps=param_grid, evaluator=evaluator, numFolds=5)
sc.setCheckpointDir('checkpoint/')
model = cv.fit(training)
best_model = model.bestModel
print("**Best Model**")
print(" Rank:", best_model._java_obj.parent().getRank())
print(" MaxIter:", best_model._java_obj.parent().getMaxIter())
print(" RegParam:", best_model._java_obj.parent().getRegParam())
Has anyone run into a similar problem? Any suggestions would be much appreciated!
Thanks!