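I am currently developing an application that matches users based on the questions they have answered. I implemented my algorithm with plain RoR and ActiveRecord queries, but it is far too slow. Matching one user against 100 other users takes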

Completed 200 OK in 17741ms (Views: 106.1ms | ActiveRecord: 1078.6ms)

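on my local machine. But still... I now want to implement this in raw SQL to get more performance. However, I am really struggling to wrap my head around SQL queries nested inside SQL queries, plus the calculations on top of them. My head is about to explode and I don't even know where to start.

Here is my algorithm: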

def match(user)
  @a_score = (self.actual_score(user).to_f / self.possible_score(user).to_f) * 100
  @b_score = (user.actual_score(self).to_f / user.possible_score(self).to_f) * 100

  if self.common_questions(user) == []
    0.to_f
  else
    match = Math.sqrt(@a_score * @b_score) - (100 / self.common_questions(user).count)
    if match <= 0
      0.to_f
    else
      match
    end
  end
end

def possible_score(user)
  i = 0
  self.user_questions.select("question_id, importance").find_each do |n|
    if user.user_questions.select(:id).find_by_question_id(n.question_id)
      i += Importance.find_by_id(n.importance).value
    end
  end
  return i
end

def actual_score(user)
  i = 0
  self.user_questions.select("question_id, importance").includes(:accepted_answers).find_each do |n|
    @user_answer = user.user_questions.select("answer_id").find_by_question_id(n.question_id)
    unless @user_answer == nil
      if n.accepted_answers.select(:answer_id).find_by_answer_id(@user_answer.answer_id)
        i += Importance.find_by_id(n.importance).value
      end
    end
  end
  return i
end

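So basically, users answer questions, choose which answers they accept, and say how important each question is to them. The algorithm then looks at the questions two users have in common and checks whether user 1 gave an answer that user 2 accepts; if so, the importance user 2 assigned to that question is added, and this makes up user 1's score. The same is done the other way around for user 2. Dividing each score by the possible score gives a percentage, and the geometric mean of the two percentages is the overall match percentage for the two users. Fairly complex, I know. Tell me if I didn't explain it well enough. I just hope I can express this in raw SQL. Performance is everything here.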
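
To make the description above concrete, here is a minimal in-memory sketch of the same calculation in plain Ruby, with no database involved. The `UserAnswer` struct, the hash layout (question id => row), and the method name `match_percentage` are hypothetical stand-ins and not the real models. The penalty term uses float division here, which is an assumption about the intent, and importance values are assumed to be positive.

```ruby
# Hypothetical stand-in for one user_questions row (not the real model):
# the answer given, the numeric importance value, and the accepted answer ids.
UserAnswer = Struct.new(:answer_id, :importance_value, :accepted_answer_ids)

# Points `other` earns from `owner`: for each shared question, add owner's
# importance if other's answer is one that owner accepts.
def actual_score(owner, other)
  owner.sum do |question_id, row|
    theirs = other[question_id]
    if theirs && row.accepted_answer_ids.include?(theirs.answer_id)
      row.importance_value
    else
      0
    end
  end
end

# Maximum points `other` could earn: owner's importance over all shared questions.
def possible_score(owner, other)
  owner.sum { |question_id, row| other.key?(question_id) ? row.importance_value : 0 }
end

# Geometric mean of both percentages minus a penalty of 100 / (shared questions),
# clamped at zero. Assumes importance values are positive.
def match_percentage(a, b)
  common = (a.keys & b.keys).size
  return 0.0 if common.zero?

  a_pct = 100.0 * actual_score(a, b) / possible_score(a, b)
  b_pct = 100.0 * actual_score(b, a) / possible_score(b, a)
  [Math.sqrt(a_pct * b_pct) - 100.0 / common, 0.0].max
end

# Each hash maps question_id => UserAnswer.
alice = { 1 => UserAnswer.new(10, 10, [11]), 2 => UserAnswer.new(20, 10, [21]) }
bob   = { 1 => UserAnswer.new(11, 10, [10]), 2 => UserAnswer.new(21, 10, [99]) }
puts match_percentage(alice, bob)
```

In this example Bob satisfies Alice on both questions (100%), Alice satisfies Bob on one of two (50%), and the two shared questions give a penalty of 50.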

Here are my database tables:

CREATE TABLE "users" ("id" INTEGER PRIMARY KEY AUTOINCREMENT NOT NULL, "username" varchar(255) DEFAULT '' NOT NULL); (I left some unimportant columns out; everything is in the database dump I uploaded)

CREATE TABLE "user_questions" ("id" INTEGER PRIMARY KEY AUTOINCREMENT NOT NULL, "user_id" integer, "question_id" integer, "answer_id" integer(255), "importance" integer, "explanation" text, "private" boolean DEFAULT 'f', "created_at" datetime);

CREATE TABLE "accepted_answers" ("id" INTEGER PRIMARY KEY AUTOINCREMENT NOT NULL, "user_question_id" integer, "answer_id" integer);

I guess the top of the SQL query has to look something like this?

SELECT u1.id AS user1, u2.id AS user2, COALESCE(SQRT( (100.0*actual_score/possible_score) * (100.0*actual_score/possible_score) ), 0) AS match
FROM 

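But since I am no SQL master and can only do the usual stuff, my head is about to explode. I hope someone can help me figure this out, or at least improve the performance somehow! Thanks a lot!

EDIT:

So, based on Wizard's answer, I managed to get a nice SQL statement for "possible_score":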

SELECT SUM(value) AS sum_id 
FROM user_questions AS uq1
INNER JOIN importances ON importances.id = uq1.importance
INNER JOIN user_questions uq2 ON uq1.question_id = uq2.question_id AND uq2.user_id = 101
WHERE uq1.user_id = 1

I tried to get "actual_score" with the following, but it didn't work. My database manager crashed when I executed it.

SELECT SUM(imp.value) AS sum_id 
FROM user_questions AS uq1
INNER JOIN importances imp ON imp.id = uq1.importance
INNER JOIN user_questions uq2 ON uq2.question_id = uq1.question_id AND uq2.user_id = 101
INNER JOIN accepted_answers as ON as.user_question_id =  uq1.id AND as.answer_id = uq2.answer_id
WHERE uq1.user_id = 1

EDIT 2

Okay, I'm an idiot! Of course I can't use "as" as an alias. I changed it to aa and it worked! W00T!

2 Answers

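I know you are considering moving to a SQL solution, but there are some major performance improvements you can make to your Ruby code that may eliminate the need for hand-coded SQL. When optimizing code, it is usually worth using a profiler to make sure you actually know which parts are the problem. In your example, I think some big improvements can be made by removing the iteration and the database queries that are executed during each iteration!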
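
As a rough illustration of that last point, here is a sketch in plain Ruby that computes a possible score without doing a lookup inside the loop. The `ROWS` array is a hypothetical stand-in for `user_questions` already joined with `importances` (no ActiveRecord involved), and the method signature is made up for this example. The idea is to collect the other user's question ids once, then make a single in-memory pass.

```ruby
require "set"

# Hypothetical rows: user_questions joined with importances (value = importance value).
ROWS = [
  { user_id: 1, question_id: 1, value: 10 },
  { user_id: 1, question_id: 2, value: 50 },
  { user_id: 1, question_id: 3, value: 1 },
  { user_id: 2, question_id: 2, value: 10 },
  { user_id: 2, question_id: 3, value: 10 },
  { user_id: 2, question_id: 4, value: 250 }
].freeze

# Sum of the owner's importance values over the questions both users answered.
def possible_score(rows, owner_id, other_id)
  by_user = rows.group_by { |row| row[:user_id] }

  # Built once, so the loop below needs no per-question lookup query.
  other_questions = Set.new(by_user.fetch(other_id, []).map { |row| row[:question_id] })

  by_user.fetch(owner_id, []).sum do |row|
    other_questions.include?(row[:question_id]) ? row[:value] : 0
  end
end

puts possible_score(ROWS, 1, 2)
puts possible_score(ROWS, 2, 1)
```

In the real application the same effect comes from letting the database do the join, which is what the rewritten methods in this answer aim for.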

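Also, if you are using a recent version of ActiveRecord, you can generate queries with subselects without writing any SQL. And of course, it is important to create proper indexes for your database.

I am making a lot of assumptions about your models and relations based on what I can infer from your code. Let me know if I'm wrong and I'll try to adjust accordingly.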

def match(user)    
  if self.common_questions(user) == []
    0.to_f
  else
    # Move a_score and b_score calculation inside this conditional branch since it is otherwise not needed.
    @a_score = (self.actual_score(user).to_f / self.possible_score(user).to_f) * 100
    @b_score = (user.actual_score(self).to_f / user.possible_score(self).to_f) * 100
    match = Math.sqrt(@a_score * @b_score) - (100 / self.common_questions(user).count)
    if match <= 0
      0.to_f
    else
      match
    end
  end
end

def possible_score(user)
  # If user_questions.importance contains ID values of importances, then you should set up a relation between UserQuestion and Importance.
  #   I.e. UserQuestion belongs_to :importance, and Importance has_many :user_questions.
  # I'm assuming that user_questions represents join models between users and questions.  
  #   I.e. User has_many :user_questions, and User has_many :questions, :through => :user_questions.  
  #        Question has_many :user_questions, and Question has_many :users, :through => :user_questions
  # From your code this seems like the logical setup.  Let me know if my assumption is wrong.

  self.user_questions.
    joins(:importance).                                             # Requires the relation between UserQuestion and Importance I described above
    where(:question_id => Question.joins(:user_questions).where(:user_questions => { :user_id => user.id })). # user_id lives on user_questions, so the condition is nested under that table. This should create a where clause with a subselect with recent versions of ActiveRecord
    sum(:value)                                                     # I'm also assuming that the importances table has a `value` column.
end

def actual_score(user)
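  user_questions.
    joins(:importance, :accepted_answers).  # It looks like accepted_answers indicates an answers table
    # Compare the *accepted* answers (not this user's own answer) with the answers the other user gave.
    # This assumes answer IDs are unique across questions.
    where(:accepted_answers => { :answer_id => Answer.joins(:user_questions).where(:user_questions => { :user_id => user.id }) }).
    sum(:value)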
end

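UserQuestion appears to be a sort of super join model between users, questions, answers, and importances. Here are the model relations relevant to the code (not including the has_many :through relations you could create). I guess you probably already have these: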

# User
has_many :user_questions

# UserQuestion
belongs_to :user
belongs_to :question
belongs_to :importance, :foreign_key => :importance  # Maybe rename the column `importance` to `importance_id`
belongs_to :answer

# Question
has_many :user_questions

# Importance
has_many :user_questions

# Answer
has_many :user_questions
answered 2012-10-18T07:18:02.390

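So here is my new match function. I can't put everything into a single query yet, because SQLite does not support math functions. But as soon as I switch to MySQL, I will put everything into one query. All of this has already given me a huge performance boost: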

Completed 200 OK in 528ms (Views: 116.5ms | ActiveRecord: 214.0ms)

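for matching one user against 100 other users. Pretty good! I'll have to see how well it performs once I fill my database with 10k fake users. And extra kudos to "Wizard of Ogz" for pointing out my inefficient code!

EDIT:

I tried it with just 1000 users, each with 10 to 100 user questions, and...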

Completed 200 OK in 104871ms (Views: 2146.0ms | ActiveRecord: 93780.5ms)

...boy, did that take a long time! I'll have to think of something to fix this.

def match(user)
  if self.common_questions(user) == []
    0.to_f
  else
    @a_score = score_percentage(self.id, user.id)
    @b_score = score_percentage(user.id, self.id)

    match = Math.sqrt(@a_score * @b_score) - (100 / self.common_questions(user).count)
    if match <= 0
      0.to_f
    else
      match
    end
  end
end

# Percentage (0-100) of the owner's possible score that the other user actually reaches.
# The a_score and b_score queries differ only in the order of the bound ids.
def score_percentage(owner_id, other_id)
  UserQuestion.find_by_sql(["SELECT 100.0*as1.actual_score/ps1.possible_score AS match
      FROM (SELECT SUM(imp.value) AS actual_score 
      FROM user_questions AS uq1
      INNER JOIN importances imp ON imp.id = uq1.importance
      INNER JOIN user_questions uq2 ON uq2.question_id = uq1.question_id AND uq2.user_id = ?
      INNER JOIN accepted_answers aa ON aa.user_question_id =  uq1.id AND aa.answer_id = uq2.answer_id
      WHERE uq1.user_id = ?) AS as1, (SELECT SUM(value) AS possible_score 
      FROM user_questions AS uq1
      INNER JOIN importances ON importances.id = uq1.importance
      INNER JOIN user_questions uq2 ON uq1.question_id = uq2.question_id AND uq2.user_id = ?
      WHERE uq1.user_id = ?) AS ps1", other_id, owner_id, other_id, owner_id]).collect(&:match).first.to_f
end
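
One detail in the penalty term is worth double-checking: `100 / self.common_questions(user).count` divides two Integers, and Ruby truncates Integer division. Whether that truncation is intended is not stated anywhere above, so the following is only a small sketch of the difference. The method names `penalty_integer` and `penalty_float` are made up for this example, and it assumes `count` returns an Integer.

```ruby
# Penalty term as written in match: Integer / Integer truncates toward negative infinity.
def penalty_integer(common_count)
  100 / common_count
end

# Same term with a Float numerator, which keeps the fractional part.
def penalty_float(common_count)
  100.0 / common_count
end

puts penalty_integer(3)    # 33
puts penalty_float(3)      # roughly 33.33
puts penalty_integer(101)  # 0
```

With Integer division the penalty drops to 0 as soon as two users share more than 100 questions, while the Float version keeps shrinking gradually.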
answered 2012-10-18T21:25:26.077