mongodb - User-to-user similarity using map reduce

Question

My collection contains:

{ user_id : 1, product_id : 1 },
{ user_id : 1, product_id : 2 },
{ user_id : 1, product_id : 3 },
{ user_id : 2, product_id : 2 },
{ user_id : 2, product_id : 3 },
{ user_id : 3, product_id : 2 },

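My collection tracks the products each user has viewed, where user_id is the user's ID and product_id is the product's ID.
I want to compute the similarity between two users, defined as the number of products they have both viewed.
For example, for the collection above, the similarities between users would be: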

{ user_id1 : 1, user_id2 : 2, similarity : 2 },
{ user_id1 : 1, user_id2 : 3, similarity : 1 },
{ user_id1 : 2, user_id2 : 3, similarity : 1 },

Edited

I got it working without map-reduce:

def self.build_similarity_weight
  users_id = ProductView.all.distinct(:user_id).to_a
  users_id.each do |user_id|
    this_user_products = ProductView.all.where(user_id: user_id).distinct(:product_id).to_a

    # Every user except the current one
    other_users = users_id.reject { |x| x == user_id }

    other_users.each do |other_uid|
      other_user_products = ProductView.all.where(user_id: other_uid).distinct(:product_id).to_a
      user_sim = (other_user_products & this_user_products).length
      usw = UserSimilarityWeight.new(user_id1: user_id, user_id2: other_uid, weight: user_sim)
      usw.save
    end
  end
end

The problem is that my code is inefficient: it is O(n²), where n is the number of users.
How can I use map-reduce to make my code more efficient?

Regards,

score 2 · Accepted Answer

First, you run 2 map-reduce passes.

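- map: emit product_id as the key and user_id as the value
- reduce: iterate over the list of values (the user IDs for each product) in a nested loop, and emit a key that is a pair of user IDs (smallest user ID first) with a value of 1
(process the results of the first map-reduce)
- map: just pass the user pair through as the key and the 1 as the value
- reduce: sum the values for each pair.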
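The two passes above can be sketched in plain Ruby, with no MongoDB calls, to check the logic against the sample data from the question. This is only an in-memory illustration: the names VIEWS, users_by_product, emit_pairs, sum_pairs and user_similarities are made up for the example, and group_by stands in for the grouping that map-reduce does between the map and reduce steps.

```ruby
# Sample data from the question: one record per (user, product) view.
VIEWS = [
  { user_id: 1, product_id: 1 },
  { user_id: 1, product_id: 2 },
  { user_id: 1, product_id: 3 },
  { user_id: 2, product_id: 2 },
  { user_id: 2, product_id: 3 },
  { user_id: 3, product_id: 2 }
].freeze

# Pass 1, map: key = product_id, value = user_id.
# group_by plays the role of the grouping step (hypothetical helper).
def users_by_product(views)
  views.group_by { |v| v[:product_id] }
       .transform_values { |rows| rows.map { |r| r[:user_id] }.uniq }
end

# Pass 1, reduce: for each product, emit every pair of users who viewed it
# (smaller user ID first) together with the value 1.
def emit_pairs(grouped)
  grouped.values.flat_map do |user_ids|
    user_ids.sort.combination(2).map { |pair| [pair, 1] }
  end
end

# Pass 2: map passes each (pair, 1) through unchanged; reduce sums per pair.
def sum_pairs(emitted)
  emitted.group_by(&:first).map do |(u1, u2), entries|
    { user_id1: u1, user_id2: u2, similarity: entries.sum { |_pair, one| one } }
  end
end

# Runs both passes and sorts the rows for stable output.
def user_similarities(views)
  sum_pairs(emit_pairs(users_by_product(views)))
    .sort_by { |row| [row[:user_id1], row[:user_id2]] }
end

user_similarities(VIEWS).each { |row| puts row.inspect }
```

Note that only pairs sharing at least one product are emitted, so users with nothing in common simply do not appear in the result.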

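Second, you cannot do better than O(n^2), because your result is itself on the order of O(n^2). That is, even if you got the pairs and their similarities by some magic, you would still need to write n^2 pairs.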
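To put a number on that, here is a small sketch, assuming one row is written per unordered pair of users (including pairs with no products in common). The helper name pair_count is made up for the example. The count comes out to n(n-1)/2, which still grows quadratically.

```ruby
# Number of unordered user pairs among n users (hypothetical helper).
def pair_count(n)
  (1..n).to_a.combination(2).count
end

[3, 10, 100].each do |n|
  # Same value as the closed form n * (n - 1) / 2.
  puts "#{n} users: #{pair_count(n)} pairs"
end
```

The code in the question writes each pair in both directions, so it stores n(n-1) rows, twice this count.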
