I followed the Databricks training. It runs on Azure and is built with the following configuration:
build.sbt
import AssemblyKeys._

assemblySettings

name := "movielens-als"

version := "0.1"

// Spark 1.2.0 is built for Scala 2.10, so scalaVersion must match the _2.10 artifact suffix
scalaVersion := "2.10.4"

libraryDependencies += "org.apache.spark" % "spark-mllib_2.10" % "1.2.0" % "provided"
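It works and produces recommendations. However:
1) The console complains that some configuration keys are deprecated (see the left arrows in the log below). I couldn't find any information about this.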
2) It also warns, for both the user and the product factors, that a partitioner is missing: 15/03/21 14:49:51 WARN recommendation.MatrixFactorizationModel: User factor does not have a partitioner. Prediction on individual records could be slow.
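To illustrate what the warning refers to, here is a minimal sketch, assuming Spark 1.2 in local mode. The object name `PartitionerLookupDemo` and the random-free dummy data are hypothetical stand-ins for the model's factor RDDs, not MatrixFactorizationModel internals. `PairRDDFunctions.lookup` can only skip partitions when the RDD has a known partitioner; without one it has to scan every partition, which is why predicting a single (user, product) pair may be slow. Predicting on a whole RDD of pairs goes through a join instead, so it is not affected in the same way.

```scala
import org.apache.spark.{HashPartitioner, Partitioner, SparkConf, SparkContext}
import org.apache.spark.SparkContext._ // pair-RDD implicits (partitionBy, lookup) needed in Spark 1.2

object PartitionerLookupDemo {
  /** Returns the partitioner before and after partitionBy, plus the length of one looked-up vector. */
  def run(sc: SparkContext): (Option[Partitioner], Option[Partitioner], Int) = {
    // Hypothetical stand-in for a factor RDD: (id, rank-8 feature vector), 4 partitions.
    val factors = sc.parallelize((1 to 1000).map(id => (id, Array.fill(8)(0.1))), 4)
    // No partitioner here: lookup(42) would run a task over every partition.
    val before = factors.partitioner
    // Hash-partitioned and cached: lookup(42) only searches the partition key 42 maps to.
    val partitioned = factors.partitionBy(new HashPartitioner(4)).cache()
    val vector = partitioned.lookup(42).head
    (before, partitioned.partitioner, vector.length)
  }

  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("partitioner-demo").setMaster("local[2]"))
    try println(run(sc)) finally sc.stop()
  }
}
```

The first element of the result is `None` (no partitioner), the second is a `HashPartitioner`; the WARN in the log is the model reporting the first situation for its factor RDDs.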
Console output:
C:\apps\dist\spark-1.2.0\bin>spark-submit --class MovieLensALS C:\user/app/movielens-als-assembly-0.1.jar /MySpark/user/data/ C:\user/personal/personalRatings.txt
15/03/21 14:49:19 INFO slf4j.Slf4jLogger: Slf4jLogger started
15/03/21 14:49:19 INFO Remoting: Starting remoting
15/03/21 14:49:19 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://sparkDriver@headnode0.sparkcluster.a8.internal.cloudapp.net:60778]
15/03/21 14:49:23 INFO mapred.FileInputFormat: Total input paths to process : 1
15/03/21 14:49:24 INFO Configuration.deprecation: mapred.tip.id is deprecated. <======================= Instead, use mapreduce.task.id
15/03/21 14:49:24 INFO Configuration.deprecation: mapred.task.id is deprecated. <======================= Instead, use mapreduce.task.attempt.id
15/03/21 14:49:24 INFO Configuration.deprecation: mapred.task.is.map is deprecated. <======================= Instead, use mapreduce.task.ismap
15/03/21 14:49:24 INFO Configuration.deprecation: mapred.task.partition is deprecated. Instead, use mapreduce.task.partition
15/03/21 14:49:24 INFO Configuration.deprecation: mapred.job.id is deprecated. Instead, use mapreduce.job.id
[Stage 0:> (0 + 2) / 2]
[Stage 0:=============================> (1 + 1) / 2]
15/03/21 14:49:24 INFO mapred.FileInputFormat: Total input paths to process : 1
[Stage 1:> (0 + 2) / 2]
[Stage 2:> (0 + 2) / 2]
[Stage 2:=============================> (1 + 1) / 2]
[Stage 3:> (0 + 2) / 2]
[Stage 4:> (0 + 2) / 2]
Got 1000209 ratings from 6040 users on 3706 movies.
[Stage 6:===================> (1 + 2) / 3]
[Stage 7:> (0 + 4) / 4]
[Stage 8:> (0 + 0) / 2]
[Stage 8:> (0 + 2) / 2]
[Stage 10:> (0 + 2) / 2]
Training: 602252, validation: 198919, test: 199049
[Stage 12:> (0 + 4) / 4]
[Stage 12:===========================================> (3 + 1) / 4]
[Stage 34:> (0 + 4) / 4]
[Stage 13:> (0 + 4) / 4]
[Stage 16:> (0 + 4) / 4]
[Stage 17:> (0 + 4) / 4]
15/03/21 14:49:51 WARN recommendation.MatrixFactorizationModel: User factor does not have a partitioner. Prediction on individual records could be slow.
15/03/21 14:49:51 WARN recommendation.MatrixFactorizationModel: Product factor does not have a partitioner. Prediction on individual records could be slow.
[Stage 140:> (0 + 0) / 4]
[Stage 167:> (0 + 4) / 4]
[Stage 167:============================> (2 + 2) / 4]
[Stage 165:> (0 + 4) / 4]
[Stage 166:> (0 + 4) / 4]
[Stage 166:==========================================> (3 + 1) / 4]
[Stage 168:> (0 + 4) / 4]
RMSE (validation) = 0.8694473524689862 for the model trained with rank = 8, lambda = 0.1, and numIter = 10.
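The `Configuration.deprecation` lines in the log above are INFO-level messages; they appear to come from Spark's HadoopRDD setting the old `mapred.*` keys when it reads the input files, not from the application code. A minimal sketch for hiding them, assuming the log4j 1.x API that Spark 1.2 ships with and assuming the full logger name is `org.apache.hadoop.conf.Configuration.deprecation` (inferred from the abbreviated category in the log); `QuietHadoopDeprecation` is a hypothetical helper name:

```scala
import org.apache.log4j.{Level, Logger}

object QuietHadoopDeprecation {
  // Assumed full logger name behind the "Configuration.deprecation" category in the log.
  val DeprecationLogger = "org.apache.hadoop.conf.Configuration.deprecation"

  /** Raises the threshold to WARN so INFO-level deprecation notices are dropped in this JVM. */
  def apply(): Level = {
    val logger = Logger.getLogger(DeprecationLogger)
    logger.setLevel(Level.WARN)
    logger.getLevel
  }
}
```

Calling `QuietHadoopDeprecation()` at the start of `main`, before the SparkContext is created, only affects the driver JVM; executors use their own log4j configuration.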