10

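I have a Rails application that uses PostgreSQL as its database. It sorts different types of users by location, and then by the reputation points they have received for various activities on the site. Here is an example query: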

 @lawyersbylocation = User.lawyers_by_province(province).sort_by{ |u| -u.total_votes }

This query calls the lawyers_by_province scope on the User.rb model:

 scope :lawyers_by_province, lambda {|province|
  joins(:contact).
  where( contacts: {province_id: province},
         users: {lawyer: true})

  }

Then, still on the User.rb model, it calculates the reputation points each user has:

 def total_votes
    answerkarma = AnswerVote.joins(:answer).where(answers: {user_id: self.id}).sum('value') 
    contributionkarma = Contribution.where(user_id: self.id).sum('value')
    bestanswer = BestAnswer.joins(:answer).where(answers: {user_id: self.id}).sum('value') 
    answerkarma + contributionkarma + bestanswer
 end

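Someone told me that once the site reaches a certain number of users it will become very slow, because the sorting is done in Ruby rather than at the database level. I know the comment refers to the total_votes method, but I'm not sure whether lawyers_by_province runs at the database level or in Ruby, since it uses Rails helpers to query the database. It looks to me like a mix of both, but I'm not sure what that means for efficiency.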

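Can you show me how to write this so that the query happens at the database level, and therefore more efficiently, so that it doesn't break my site?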

Update: Here are the schemas for the three models used in the total_votes method.

 create_table "answer_votes", force: true do |t|
    t.integer  "answer_id"
    t.integer  "user_id"
    t.integer  "value"
    t.boolean  "lawyervote"
    t.boolean  "studentvote"
    t.datetime "created_at"
    t.datetime "updated_at"
  end

  add_index "answer_votes", ["answer_id"], name: "index_answer_votes_on_answer_id", using: :btree
  add_index "answer_votes", ["lawyervote"], name: "index_answer_votes_on_lawyervote", using: :btree
  add_index "answer_votes", ["studentvote"], name: "index_answer_votes_on_studentvote", using: :btree
  add_index "answer_votes", ["user_id"], name: "index_answer_votes_on_user_id", using: :btree



create_table "best_answers", force: true do |t|
    t.integer  "answer_id"
    t.integer  "user_id"
    t.integer  "value"
    t.datetime "created_at"
    t.datetime "updated_at"
    t.integer  "question_id"
  end

  add_index "best_answers", ["answer_id"], name: "index_best_answers_on_answer_id", using: :btree
  add_index "best_answers", ["user_id"], name: "index_best_answers_on_user_id", using: :btree



create_table "contributions", force: true do |t|
    t.integer  "user_id"
    t.integer  "answer_id"
    t.integer  "value"
    t.datetime "created_at"
    t.datetime "updated_at"
  end

  add_index "contributions", ["answer_id"], name: "index_contributions_on_answer_id", using: :btree
  add_index "contributions", ["user_id"], name: "index_contributions_on_user_id", using: :btree

Also, here is the contacts schema, which contains the province_id used in the lawyers_by_province scope on the user.rb model:

  create_table "contacts", force: true do |t|
    t.string   "firm"
    t.string   "address"
    t.integer  "province_id"
    t.string   "city"
    t.string   "postalcode"
    t.string   "mobile"
    t.string   "office"
    t.integer  "user_id"
    t.string   "website"
    t.datetime "created_at"
    t.datetime "updated_at"
  end

Update: Trying to apply @Shawn's answer, I put this method in the user.rb model:

 def self.total_vote_sql
    "( " +
    [
     AnswerVote.joins(:answer).select("user_id, value"),
     Contribution.select("user_id, value"),
     BestAnswer.joins(:answer).select("user_id, value")
    ].map(&:to_sql) * " UNION ALL " + 
    ") as total_votes "
  end

Then in the controller I did this (putting User in front of total_vote_sql):

@lawyersbyprovince = User.select("users.*, sum(total_votes.value) as total_votes").joins("left outer join #{User.total_vote_sql} on users.id = total_votes.user_id").
                            order("total_votes desc").lawyers_by_province(province)

It gave me this error:

ActiveRecord::StatementInvalid in LawyerProfilesController#index

PG::Error: ERROR: column reference "user_id" is ambiguous LINE 1: ..."user_id" = "users"."id" left outer join ( SELECT user_id, v... ^ : SELECT users.*, sum(total_votes.value) as total_votes FROM "users" INNER JOIN "contacts" ON "contacts"."user_id" = "users"."id" left outer join ( SELECT user_id, value FROM "answer_votes" INNER JOIN "answers" ON "answers"."id" = "answer_votes"."answer_id" UNION ALL SELECT user_id, value FROM "contributions" UNION ALL SELECT user_id, value FROM "best_answers" INNER JOIN "answers" ON "answers"."id" = "best_answers"."answer_id") as total_votes on users.id = total_votes.user_id WHERE "contacts"."province_id" = 6 AND "users"."lawyer" = 't' ORDER BY total_votes desc

Update: After applying the edits from Shawn's post, the error message is now this:

PG::Error: ERROR: column reference "user_id" is ambiguous LINE 1: ..."user_id" = "users"."id" left outer join ( SELECT user_id as... ^ : SELECT users.*, sum(total_votes.value) as total_votes FROM "users" INNER JOIN "contacts" ON "contacts"."user_id" = "users"."id" left outer join ( SELECT user_id as tv_user_id, value FROM "answer_votes" INNER JOIN "answers" ON "answers"."id" = "answer_votes"."answer_id" UNION ALL SELECT user_id as tv_user_id, value FROM "contributions" UNION ALL SELECT user_id as tv_user_id, value FROM "best_answers" INNER JOIN "answers" ON "answers"."id" = "best_answers"."answer_id") as total_votes on users.id = total_votes.tv_user_id WHERE "contacts"."province_id" = 6 AND "users"."lawyer" = 't' ORDER BY total_votes desc
4

5 Answers

8

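Caveat: I'm very new to Rails, but this is my technique for staying sane while constantly going directly to the database for performance reasons. I need to do that all the time, because you can only have two of the following:

  1. Bulk data processing
  2. Pure Rails technique
  3. Good performance

Anyway, once you need to get into these hybrid approaches, which are part Ruby and part SQL, I feel you might as well go all the way and choose a pure SQL solution.

  1. It is easier to debug, because you isolate the two code layers more effectively.
  2. It is easier to optimise the SQL, because if that is not your strong point you have a better chance of getting a dedicated SQL person to look it over for you.

I think the SQL you are looking for here is along these lines: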

with cte_scoring as (
  select
    users.id,
    (select Coalesce(sum(value),0) from answer_votes  where answer_votes.user_id  = users.id) +
    (select Coalesce(sum(value),0) from best_answers  where best_answers.user_id  = users.id) +
    (select Coalesce(sum(value),0) from contributions where contributions.user_id = users.id) total_score
  from
    users join
    contacts on (contacts.user_id = users.id)
  where
    users.lawyer         = 'true'          and
    contacts.province_id = #{province.id})
select   id,
         total_score
from     cte_scoring
order by total_score desc
limit    #{limit_number}

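This should give you the best performance. The subqueries in the SELECT are not ideal, but the technique does restrict the scoring to the user_ids that pass the filter.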

Integration into Rails: if you define sql_string as the SQL code:

scoring = ActiveRecord::Base.connection.execute sql_string

...then you get back a result set whose rows behave like hashes, which you can use like this:

scoring.each do |lawyer_score|
  lawyer = User.find(lawyer_score["id"])
  score  = lawyer_score["total_score"]
  ...
end
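
As a sketch of how sql_string might be assembled before it is passed to execute: the helper name scoring_sql and its two parameters are assumptions made for illustration, not part of the answer above. Because the values are interpolated straight into the SQL, the sketch coerces them with Integer() first, so that a non-numeric string never reaches the query.

```ruby
# Hypothetical helper (name and parameters are assumptions, not from the answer).
# Integer() raises for strings that are not valid integers and for nil,
# so only numeric values are ever interpolated into the SQL text.
def scoring_sql(province_id, limit_number)
  province_id  = Integer(province_id)
  limit_number = Integer(limit_number)

  <<~SQL
    with cte_scoring as (
      select
        users.id,
        (select coalesce(sum(value), 0) from answer_votes  where answer_votes.user_id  = users.id) +
        (select coalesce(sum(value), 0) from best_answers  where best_answers.user_id  = users.id) +
        (select coalesce(sum(value), 0) from contributions where contributions.user_id = users.id) total_score
      from
        users join
        contacts on (contacts.user_id = users.id)
      where
        users.lawyer         = 'true' and
        contacts.province_id = #{province_id})
    select   id,
             total_score
    from     cte_scoring
    order by total_score desc
    limit    #{limit_number}
  SQL
end

# Example values only; in the controller these would come from the request.
sql_string = scoring_sql(6, 10)
```

The resulting string is what would be handed to ActiveRecord::Base.connection.execute in the snippet above.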
answered 2013-05-23T16:46:03.720
2

Do you really want to calculate a user's reputation on the fly every time? The right way is to precompute it. In Rails, you could do it like this:

# app/models/reputation_change_observer.rb
class ReputationChangeObserver < ActiveRecord::Observer
  observe :answer, :contribution # observe things linked to a users reputation

  def after_update(record)
    record.user.update_reputation
  end
end

# app/models/user.rb
class User
  # Add a column called "reputation"

  def update_reputation
    answerkarma = AnswerVote.joins(:answer).where(answers: {user_id: self.id}).sum('value') 
    contributionkarma = Contribution.where(user_id: self.id).sum('value')
    bestanswer = BestAnswer.joins(:answer).where(answers: {user_id: self.id}).sum('value') 
    total_votes = answerkarma + contributionkarma + bestanswer

    # Save the updated reputation in the "reputation" field
    self.update_attribute :reputation, total_votes
  end
end

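This way, the reputation is recalculated only when one of the observed records changes, and the result is stored in the database. Then you can sort in plain SQL: User.order("reputation DESC")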

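If your site keeps growing a lot, then you have two options:

  1. Wait 10-15 minutes before recalculating the reputation of the same user (use a separate column, reputation_timestamp, to track when the user's reputation was last calculated)

  2. Whenever a user posts an answer/contribution, just set a flag on the user, reputation_recalc => true. Then run a background job every 10-15 minutes that queries all users with reputation_recalc: true and calculates their reputation with the same update_reputation method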
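
The first option comes down to a time check before recalculating. A minimal sketch, assuming a hypothetical helper reputation_stale? and a constant RECALC_INTERVAL (neither is from the answer); last_calculated_at stands in for the value of the reputation_timestamp column:

```ruby
# Hypothetical names: RECALC_INTERVAL and reputation_stale? are illustrative only.
RECALC_INTERVAL = 15 * 60 # seconds, the upper end of the 10-15 minute window

# Returns true when the reputation has never been calculated,
# or when at least `interval` seconds have passed since the last calculation.
def reputation_stale?(last_calculated_at, now = Time.now, interval = RECALC_INTERVAL)
  return true if last_calculated_at.nil?
  (now - last_calculated_at) >= interval
end

now = Time.at(1_000_000)
reputation_stale?(nil, now)           # => true  (never calculated)
reputation_stale?(now - 5 * 60, now)  # => false (calculated 5 minutes ago)
reputation_stale?(now - 20 * 60, now) # => true  (calculated 20 minutes ago)
```

In the model, update_reputation would then only run when this check returns true, and would write the current time back to reputation_timestamp.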

Edit: small comments in the code, minor formatting, and a comment on the User class

answered 2013-05-24T03:48:25.833
1

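A different approach that might work for you is to maintain the totals at the user level, via callbacks on the three scoring models, in three columns: answer_value, best_answer_value and contribution_value (non-nullable, with a default of zero).

Although this creates a potential locking problem on the single user record, the voting process is probably fast enough for it to go unnoticed.

By maintaining separate columns for the three scores and creating an expression-based (and possibly partial) index, you get a very high-performance query for the top-n: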

create index ..
on     users (
         id,
         (answer_value + best_answer_value + contribution_value))
where  lawyer = 'true'
answered 2013-05-23T17:08:32.503
1

Union your total-vote queries together, make the result a subquery, and join it to your users query. This also gives you a total_votes attribute.

def self.total_vote_sql
    "(select user_id, sum(value) as total_votes from ( " +
    [
     AnswerVote.joins(:answer).select("answers.user_id, value"),
     Contribution.select("user_id, value"),
     BestAnswer.joins(:answer).select("answers.user_id, value")
    ].map(&:to_sql) * " UNION ALL " + 
    ") as total_votes group by user_id) as tv "
end

User.select("users.*, tv.total_votes").
joins("left outer join #{User.total_vote_sql} on users.id = tv.user_id").
order("total_votes desc").lawyers_by_province(province)

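(Note: I tested this on MySQL, but Postgres should be similar; you may also need a group by.) You may also want to benchmark this against adding the join to users inside the subquery.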

The total_vote_sql method just takes value and user_id from each table, generates the SQL for each relation, and then joins the pieces together with UNION ALL.
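
To make the string assembly concrete, here is a sketch that runs without ActiveRecord. The three SELECT strings are hand-written stand-ins, an assumption about roughly what to_sql returns for these relations; the real strings come from Rails and will be quoted differently.

```ruby
# Stand-ins for the three relation.to_sql results (hand-written assumptions,
# since this sketch does not load ActiveRecord).
selects = [
  "SELECT answers.user_id, value FROM answer_votes INNER JOIN answers ON answers.id = answer_votes.answer_id",
  "SELECT user_id, value FROM contributions",
  "SELECT answers.user_id, value FROM best_answers INNER JOIN answers ON answers.id = best_answers.answer_id"
]

# Array#* with a String argument behaves like Array#join.
union_sql = selects * " UNION ALL "

# Wrap the union in an outer query that sums per user, aliased as tv.
total_vote_sql =
  "(select user_id, sum(value) as total_votes from ( " +
  union_sql +
  ") as total_votes group by user_id) as tv "

puts total_vote_sql
```

Qualifying the column as answers.user_id in the joined selects is what tells the database which table's user_id is meant, since both answer_votes and answers (and both best_answers and answers) have a column of that name.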


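I edited the post to fix the ambiguous column name error. It was conflicting with the join in lawyers_by_province.


I also edited it to resolve the ambiguous user_id between answer_votes and answers, and between best_answers and answers.


I added an outer subquery to the join to perform the group by that the sum requires.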

answered 2013-05-17T17:54:50.293
0

For sorting and filtering you can use gem 'wice_grid'. It is very easy to use and implement.

answered 2013-05-29T11:45:12.507