I am new to both Spark and Scala. I am trying to implement collaborative filtering with Scala on Spark. Below is the code:
import org.apache.spark.mllib.recommendation.ALS
import org.apache.spark.mllib.recommendation.Rating
val data = sc.textFile("/user/amohammed/CB/input-cb.txt")
val distinctUsers = data.map(x => x.split(",")(0)).distinct().map(x => x.toInt)
val distinctKeywords = data.map(x => x.split(",")(1)).distinct().map(x => x.toInt)
val ratings = data.map(_.split(',') match {
  case Array(user, item, rate) => Rating(user.toInt, item.toInt, rate.toDouble)
})
val model = ALS.train(ratings, 1, 20, 0.01)
val keywords = distinctKeywords.collect()
distinctUsers.map(x => {(x, keywords.map(y => model.predict(x, y)))}).collect()
It throws a scala.MatchError: null at org.apache.spark.rdd.PairRDDFunctions.lookup(PairRDDFunctions.scala:571) on the last line. If I collect the distinctUsers RDD into an array first and run the same logic, the code works fine:
val users = distinctUsers.collect()
users.map(x => {(x, keywords.map(y => model.predict(x, y)))})
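I suspect the problem is calling model.predict inside distinctUsers.map, since that would use the model's internal RDDs from within another RDD's closure. For reference, here is a sketch of what I would expect to work without collecting first (assuming the predict(RDD[(Int, Int)]) overload of MatrixFactorizationModel is available in 1.0.0):

// Build every (user, keyword) pair and score them in one distributed call,
// rather than invoking model.predict inside another RDD's transformation.
// Assumes model.predict(usersProducts: RDD[(Int, Int)]): RDD[Rating] exists in Spark 1.0.0.
val userKeywordPairs = distinctUsers.cartesian(distinctKeywords)
val predictions = model.predict(userKeywordPairs)
// Group the predicted (keyword, score) pairs by user.
val scoresByUser = predictions.map(r => (r.user, (r.product, r.rating))).groupByKey()
scoresByUser.collect()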
Where am I going wrong in my handling of the RDD?
Spark version: 1.0.0, Scala version: 2.10.4