
I'm running a grid search for the best ALS recommender parameters in PySpark, and it keeps failing with errors such as "SparkContext has been shutdown" / "Lost task" / "Bad mod" / "BlockManagerMasterEndpoint: No more replicas available for...".

I tried adding checkpointing, but it still fails after running for a few hours.

The code I'm using:

from pyspark.ml.recommendation import ALS  # missing import added
from pyspark.ml.tuning import ParamGridBuilder, CrossValidator

from pyspark.ml.evaluation import RegressionEvaluator

als = ALS(implicitPrefs=True, userCol="user", itemCol="brand_index",
          ratingCol="rating", coldStartStrategy="drop", nonnegative=True,
          checkpointInterval=3)

# Note: the original line "ALS.checkpointInterval = 2" was removed; it only
# overwrote a class attribute and had no effect on the estimator instance.
# checkpointInterval is already set in the constructor above.

param_grid = ParamGridBuilder() \
            .addGrid(als.rank, [10, 50, 100, 150]) \
            .addGrid(als.regParam, [.01, .05, .1, .15]) \
            .build()

evaluator = RegressionEvaluator(
           metricName="rmse", 
           labelCol="rating", 
           predictionCol="prediction") 

print("Num models to be tested:", len(param_grid))

cv = CrossValidator(estimator=als, estimatorParamMaps=param_grid, evaluator=evaluator, numFolds=5)

# The checkpoint directory must be set before fit(); on a cluster, point it
# at reliable shared storage (e.g. HDFS), not a local path.
sc.setCheckpointDir('checkpoint/')

model = cv.fit(training)

best_model = model.bestModel

print("**Best Model**")

print("  Rank:", best_model._java_obj.parent().getRank())

print("  MaxIter:", best_model._java_obj.parent().getMaxIter())

print("  RegParam:", best_model._java_obj.parent().getRegParam()) 
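For context on how heavy this job is, here is a minimal pure-Python sketch (no Spark required) of how many ALS models the cross-validator above actually trains: every (rank, regParam) pair from the grid is fit once per fold, so executor memory pressure and lost tasks become much more likely as the grid grows.

```python
from itertools import product

# Same grid values as in the ParamGridBuilder above
ranks = [10, 50, 100, 150]
reg_params = [0.01, 0.05, 0.1, 0.15]
num_folds = 5  # numFolds passed to CrossValidator

# Each grid point is trained once per fold
grid = list(product(ranks, reg_params))
total_fits = len(grid) * num_folds

print(len(grid))    # 16 parameter combinations
print(total_fits)   # 80 ALS models trained in total
```

Shrinking the grid (or switching to a cheaper search such as coarse-then-fine) cuts this count multiplicatively, which is often the simplest way to keep a long-running tuning job stable.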

Has anyone run into a similar problem? Any suggestions would be greatly appreciated!

Thanks!
