
I'm quite new to Apache Spark and big data in general. I'm using the ALS method to create rating predictions based on a matrix of users, items, and ratings. The confusing part is that when I run the script to compute the predictions, the results are different every time, even though the input and the requested predictions haven't changed. Is this expected behaviour, or should the results be identical? Below is the Python code for reference.

from pyspark import SparkContext
from pyspark.mllib.recommendation import ALS

sc = SparkContext("local", "CF")

# get ratings from text
def parseRating(line):
  fields = line.split(',')
  return (int(fields[0]), int(fields[1]), float(fields[2]))

# define input and output files
ratingsFile = 's3n://weburito/data/weburito_ratings.dat'
unratedFile = 's3n://weburito/data/weburito_unrated.dat'
predictionsFile = '/root/weburito/data/weburito_predictions.dat'

# read training set
training = sc.textFile(ratingsFile).map(parseRating).cache()

# read the (user, item) pairs we want predictions for
unrated = sc.textFile(unratedFile).map(parseRating)

# train the model
model = ALS.train(training, rank=5, iterations=20)

# generate predictions
predictions = model.predictAll(unrated.map(lambda x: (x[0], x[1]))).collect()

1 Answer


This is expected behaviour. The factor matrices in ALS are initialized randomly (well actually one of them is, and the other is solved based on that initialization in the first step).

So different runs will give slightly different results.
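To make this concrete, here is a minimal dense NumPy sketch of the alternating least squares updates (an illustration only, not Spark's distributed implementation; the `als` function and its parameters are invented for this example). It mirrors the initialization scheme described above: the item factors start random, and the user factors are solved from them in the first half-iteration, so the seed determines the result.

```python
import numpy as np

def als(R, rank=2, iterations=20, lam=0.01, seed=None):
    """Toy dense ALS. R is a user x item matrix with NaN for unknown entries."""
    rng = np.random.default_rng(seed)
    n_users, n_items = R.shape
    mask = ~np.isnan(R)  # which ratings are observed

    # Item factors are initialized randomly; user factors are then
    # solved from them, so the random seed determines the outcome.
    V = rng.standard_normal((n_items, rank))
    U = np.zeros((n_users, rank))

    for _ in range(iterations):
        # Solve each user's factors with items fixed (regularized least squares).
        for u in range(n_users):
            idx = mask[u]
            A = V[idx].T @ V[idx] + lam * np.eye(rank)
            U[u] = np.linalg.solve(A, V[idx].T @ R[u, idx])
        # Solve each item's factors with users fixed.
        for i in range(n_items):
            idx = mask[:, i]
            A = U[idx].T @ U[idx] + lam * np.eye(rank)
            V[i] = np.linalg.solve(A, U[idx].T @ R[idx, i])
    return U, V
```

Running this with the same seed reproduces identical factors, while different seeds give (slightly) different ones. If you need reproducible results in Spark itself, MLlib's `ALS.train` also accepts a `seed` argument, so you can pin the random initialization there the same way.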

answered 2015-02-20T07:25:11.097