
I am trying to use PearsonCorrelationSimilarity. I load my DataModel from a file containing user ID, item ID, preference, and timestamp (in that order). My code looks like this:

DataModel model = new FileDataModel(new File("FILE_NAME"));
RecommenderEvaluator evaluator = new AverageAbsoluteDifferenceRecommenderEvaluator();

RecommenderBuilder recommenderBuilder = new RecommenderBuilder() {
    @Override
    public Recommender buildRecommender(DataModel model) throws TasteException {
        ItemSimilarity similarity = new PearsonCorrelationSimilarity(model);
        Optimizer optimizer = new ConjugateGradientOptimizer();
        return new KnnItemBasedRecommender(model, similarity, optimizer, N);
    }
};

// 0.7 = fraction of each user's preferences used for training; 1.0 = fraction of users evaluated
score = evaluator.evaluate(recommenderBuilder, null, model, 0.7, 1.0);

When I run it, I get a lot of messages like this:

INFO eval.AbstractDifferenceRecommenderEvaluator: Item exists in test data but not training data:

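Is this related to my DataModel or to the evaluator? I tried both RMSRecommenderEvaluator and AverageAbsoluteDifferenceRecommenderEvaluator, but I get the same info message. I also tried calling RandomUtils.useTestSeed();. When I run the same evaluation with a UserSimilarity metric, I do not have this problem.

My question is: does this affect my evaluation results?

Thank you,
Dragan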


1 Answer


Basically, you are seeing the "Item exists in test data but not training data" message because of the way evaluation happens. The data is split in two: a training set and a test set. The recommender is trained on the training data, and its estimates are then validated against the test set. This partition is done randomly, so yes, some items can end up in the test set without appearing in the training set, and vice versa. The message is logged for items in the first situation. Because the split is random, the score varies from run to run; for more meaningful results you should run the evaluation three or more times and average the scores.
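To make the random split concrete, here is a minimal sketch in plain Java. It is not Mahout's evaluator code: the class `SplitSketch`, the method `testOnlyItems`, and the sample IDs are hypothetical names chosen for illustration. It assigns each preference to training or test at random and reports the items that land only in the test portion, which is the situation the log message describes.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Random;
import java.util.Set;

class SplitSketch {

    // Returns the item IDs that occur in the test portion but not in the training portion.
    // Each preference is {userId, itemId}. Simplified illustration, not Mahout's implementation.
    static Set<Long> testOnlyItems(List<long[]> prefs, double trainingFraction, Random rnd) {
        Set<Long> trainItems = new HashSet<>();
        Set<Long> testItems = new HashSet<>();
        for (long[] pref : prefs) {
            // Each preference goes to training with probability trainingFraction.
            if (rnd.nextDouble() < trainingFraction) {
                trainItems.add(pref[1]);
            } else {
                testItems.add(pref[1]);
            }
        }
        testItems.removeAll(trainItems); // keep only items never seen in training
        return testItems;
    }

    public static void main(String[] args) {
        List<long[]> prefs = new ArrayList<>();
        for (long user = 1; user <= 5; user++) {
            for (long item = 100; item < 103; item++) {
                prefs.add(new long[] {user, item});
            }
        }
        prefs.add(new long[] {1, 999}); // an item rated only once is easily orphaned

        Set<Long> orphans = testOnlyItems(prefs, 0.7, new Random());
        System.out.println("Items only in test data: " + orphans);
    }
}
```

Items with few ratings are the most likely to appear in this set, so the output changes from run to run unless the random generator is seeded.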

Ideally you would not use RandomUtils.useTestSeed(); in production evaluation code. It is mostly for testing purposes: it sets the random seed to the same value every time you run your test, so you get repeatability (good for testing the internal evaluator code).
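The effect of a fixed seed can be sketched with plain `java.util.Random`. This is an illustration under stated assumptions, not Mahout code: `noisyScore` is a hypothetical stand-in for one evaluation run whose result depends on the random split, and the constants in it are arbitrary.

```java
import java.util.Random;
import java.util.function.Supplier;

class SeedSketch {

    // Hypothetical stand-in for one evaluation run: the score depends on the random source.
    static double noisyScore(Random rnd) {
        return 0.8 + 0.1 * rnd.nextGaussian();
    }

    // Averages the score over several runs, asking randomSource for a generator each time.
    static double average(int runs, Supplier<Random> randomSource) {
        double sum = 0.0;
        for (int i = 0; i < runs; i++) {
            sum += noisyScore(randomSource.get());
        }
        return sum / runs;
    }

    public static void main(String[] args) {
        // Fixed seed: every run repeats the same value, so averaging adds no information.
        System.out.println("fixed seed:  " + average(5, () -> new Random(42L)));
        // Unseeded: each run differs, and the average smooths out the variation.
        System.out.println("random seed: " + average(5, Random::new));
    }
}
```

With the fixed seed every run is identical, which is useful for reproducible tests but means repeated runs cannot reveal how much the score depends on the particular split.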

Also, the KNN recommender is deprecated in Mahout 0.8 (recently released) and will be removed in 0.9.

answered 2013-08-12T09:15:42.697