mahout - Performance issues with the item-based recommender in Mahout

Question

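I am trying to use an item-based recommender in Mahout. My dataset contains 2.5 million user-item interactions with no preference values. There are about 100 items and 100K users. A recommendation takes about 10 seconds, whereas for the same data the user-based recommender takes less than a second.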

ItemSimilarity sim = new TanimotoCoefficientSimilarity(dm);
CandidateItemsStrategy cis = new SamplingCandidateItemsStrategy(10, 10, 10, dm.getNumUsers(), dm.getNumItems());
MostSimilarItemsCandidateItemsStrategy mis = new SamplingCandidateItemsStrategy(10, 10, 10, dm.getNumUsers(), dm.getNumItems());
Recommender ur = new GenericBooleanPrefItemBasedRecommender(dm, sim, cis, mis);

I read one of @Sean's answers, in which he suggested using the parameters above for SamplingCandidateItemsStrategy. However, I am not sure what they actually do.

Edit: 2.5M is the total number of user-item associations. There are 100K users, and the total number of items is 100.

score 1 · Accepted Answer

Among many reasons, the main one for choosing an item-based recommender is this: if the number of items is relatively low compared to the number of users, the performance advantage can be significant. The reverse also holds: if the number of users is relatively low compared to the number of items, choosing user-based recommendation will give the performance advantage.

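From your question, I could not really tell how many items and how many users your dataset has. At one point you mention 2.5M, and then 100K? In any case, if user-based recommendation is faster for you, you should go with that approach.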

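That said, if your item similarities are fairly stable (not expected to change radically or frequently), they are better candidates for precomputation. You can compute the similarities between items ahead of time and use the precomputed values at recommendation time.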
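To make the precomputation idea concrete, here is a minimal sketch in plain Java. It does not use Mahout: `PrecomputedItemSimilarity` and the `usersByItem` map are hypothetical names made up for illustration, and the Tanimoto coefficient is computed directly as intersection size over union size of the user sets. With about 100 items there are only 100 * 99 / 2 = 4950 item pairs, so the whole table fits comfortably in memory.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical helper (not a Mahout class): caches Tanimoto similarity for every item pair. */
final class PrecomputedItemSimilarity {
    private final Map<Long, Double> cache = new HashMap<>();

    /** usersByItem maps an item ID to the set of user IDs that interacted with that item. */
    PrecomputedItemSimilarity(Map<Integer, Set<Integer>> usersByItem) {
        List<Integer> items = new ArrayList<>(usersByItem.keySet());
        // Visit each unordered pair once; similarity is symmetric.
        for (int i = 0; i < items.size(); i++) {
            for (int j = i + 1; j < items.size(); j++) {
                int a = items.get(i);
                int b = items.get(j);
                cache.put(key(a, b), tanimoto(usersByItem.get(a), usersByItem.get(b)));
            }
        }
    }

    /** Tanimoto coefficient of two sets: |x ∩ y| / |x ∪ y|, or 0.0 if both are empty. */
    static double tanimoto(Set<Integer> x, Set<Integer> y) {
        // Iterate over the smaller set and probe the larger one.
        Set<Integer> small = x.size() <= y.size() ? x : y;
        Set<Integer> large = (small == x) ? y : x;
        int intersection = 0;
        for (Integer user : small) {
            if (large.contains(user)) {
                intersection++;
            }
        }
        int union = x.size() + y.size() - intersection;
        return union == 0 ? 0.0 : (double) intersection / union;
    }

    /** Packs an unordered pair of item IDs into one long so (a, b) and (b, a) share a key. */
    private static long key(int a, int b) {
        int lo = Math.min(a, b);
        int hi = Math.max(a, b);
        return ((long) lo << 32) | (hi & 0xFFFFFFFFL);
    }

    /** Looks up the cached similarity; pairs that were never seen return 0.0. */
    double similarity(int a, int b) {
        if (a == b) {
            return 1.0;
        }
        Double cached = cache.get(key(a, b));
        return cached == null ? 0.0 : cached;
    }

    /** Number of cached item pairs. */
    int size() {
        return cache.size();
    }
}
```

After construction, `similarity(a, b)` is a single hash lookup, so the cost of comparing user sets is paid once rather than on every request. Within Mahout itself, `GenericItemSimilarity` can, as far as I know, be constructed from another `ItemSimilarity` plus the `DataModel` to hold precomputed values in memory, which would be the more natural route than a hand-rolled cache.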

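Also, since you have no preference values, if you still want to use item-based similarity you could consider enriching the similarity function with a pure item-to-item similarity based on some features of the items. (This is just an idea.)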
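One possible reading of that idea, sketched in plain Java under my own assumptions: treat each item's features as a set of tags, compute a Tanimoto coefficient over those tags, and mix it linearly with the interaction-based similarity. The class name `BlendedItemSimilarity`, the tag representation, and the `weight` parameter are all hypothetical. A linear blend is only one of many ways to combine the two signals.

```java
import java.util.Set;

/** Hypothetical helper (not a Mahout class): mixes interaction and feature similarity. */
final class BlendedItemSimilarity {
    private BlendedItemSimilarity() {
    }

    /** Tanimoto coefficient of two sets: |x ∩ y| / |x ∪ y|, or 0.0 if both are empty. */
    static <T> double tanimoto(Set<T> x, Set<T> y) {
        int intersection = 0;
        for (T element : x) {
            if (y.contains(element)) {
                intersection++;
            }
        }
        int union = x.size() + y.size() - intersection;
        return union == 0 ? 0.0 : (double) intersection / union;
    }

    /**
     * Linear blend of two similarities.
     * weight = 1.0 uses interactions only; weight = 0.0 uses item features only.
     */
    static double blend(double interactionSim, double featureSim, double weight) {
        if (weight < 0.0 || weight > 1.0) {
            throw new IllegalArgumentException("weight must be in [0, 1]");
        }
        return weight * interactionSim + (1.0 - weight) * featureSim;
    }

    /** Convenience method: computes both Tanimoto coefficients and blends them. */
    static <U, F> double similarity(Set<U> usersA, Set<U> usersB,
                                    Set<F> featuresA, Set<F> featuresB,
                                    double weight) {
        return blend(tanimoto(usersA, usersB), tanimoto(featuresA, featuresB), weight);
    }
}
```

To use something like this inside Mahout, the blend would need to be wrapped in a custom `ItemSimilarity` implementation. The right value for `weight` depends on the data and would have to be tuned by evaluation.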
