postgresql - Adding a column to select for order_by in PostgreSQL

Question

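I have a complex database query that selects schools based on real-estate listing attributes, the schools' performance stats, and each listing's distance to public transit. The user creates a Search object, and one of its methods, find_schools in search.rb, runs this query: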

         School.where(id: school_ids).narrow_schools_for_search(self,prop_type,status,year).joins(listings: 
         :cta_listings).joins(:performance_stats).where("cta_listings.distance <= ?", 
         self.cta_distance).where.not(performance_stats: {"#{sort_column.to_sym}" => 
         nil}).distinct.limit(30).order("performance_stats.#{sort_column} DESC")

School.rb
scope :narrow_schools_for_search, ->(search,prop_type,status,year) {joins(:listings).joins(:performance_stats)
     .where("listings.beds >= ?",search.beds).where("listings.price <= ?",search.max_price)
     .where("listings.price >= ?",search.min_price).where(listings: {prop_type: prop_type, status: status})
     .where(performance_stats: {year: year}).distinct}
  
  has_many :performance_stats, dependent: :destroy
  has_many :assignments, dependent: :destroy
  has_many :listings, through: :assignments

Listing.rb
has_many :assignments, dependent: :destroy
    has_many :schools, through: :assignments
    has_many :cta_listings, dependent: :destroy
    has_many :cta_stations, through: :cta_listings
    has_many :metra_listings, dependent: :destroy
    has_many :metra_stations, through: :metra_listings

PerformanceStat.rb
belongs_to :school

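I need the schools sorted by an attribute in the associated table PerformanceStats; the attribute is user-defined and held in sort_column. The query works in the development environment (sqlite3) but fails on the staging app (PG) with the following error: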

ActiveRecord::StatementInvalid (PG::InvalidColumnReference: ERROR:  for SELECT DISTINCT, ORDER BY expressions must appear in select list

I need to add a select statement that includes the name of the column I am sorting the schools by.

Other posts like this one suggest doing the following:

Widget.select('"widgets".*, "widget_steps.name"')

So, in my case, I tried this:

sort_for_select = "performance_stats.#{sort_column}"

    School.select('"schools".*, "#{sort_for_select}"').where(id: school_ids).narrow_schools_for_search(self,prop_type,status,year).joins(listings: 
                 :cta_listings).joins(:performance_stats).where("cta_listings.distance <= ?", 
                 self.cta_distance).where.not(performance_stats: {sort_column.to_sym => 
                 nil}).distinct.limit(30).order("performance_stats.#{sort_column} DESC")

But my editor indicated that I wasn't really escaping into Ruby. I tried it anyway and, sure enough, it failed with:

 ActiveRecord::StatementInvalid (PG::UndefinedColumn: ERROR:  column "#{sort_for_select}" does not exist.
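The error text shows the literal characters `#{sort_for_select}` reaching the database, which is what Ruby does with a single-quoted string: no interpolation takes place. A minimal sketch of the difference, using a hard-coded stand-in value for sort_column (an assumption for illustration):

```ruby
# Hypothetical value standing in for the user-chosen sort column.
sort_column = "grall_adjpicalc"
sort_for_select = "performance_stats.#{sort_column}"

# Single-quoted Ruby string: no interpolation, so the literal text
# #{sort_for_select} ends up inside the SQL.
single_quoted = '"schools".*, "#{sort_for_select}"'

# Double-quoted Ruby string: interpolation happens before the SQL is built.
double_quoted = "schools.*, #{sort_for_select}"

puts single_quoted
puts double_quoted
```

Only the second string contains the real column name.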

Then I tried hard-coding sort_column:

School.select('"schools".*, "performance_stats.grall_adjpicalc"').where(id: school_ids).narrow_schools_for_search(self,prop_type,status,year).joins(listings: 
                     :cta_listings).joins(:performance_stats).where("cta_listings.distance <= ?", 
                     self.cta_distance).where.not(performance_stats: {grall_adjpicalc:
                     nil}).distinct.limit(30).order("performance_stats.grall_adjpicalc DESC")

This works in the development environment but fails on the staging app with this error:

ActiveRecord::StatementInvalid (PG::UndefinedColumn: ERROR:  column "performance_stats.grall_adjpicalc" does not exist
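This error is consistent with how PostgreSQL reads double quotes: `"performance_stats.grall_adjpicalc"` is one identifier whose name happens to contain a dot, not a table plus a column. SQLite is more lenient with double-quoted tokens, which likely explains why development did not complain. A rough sketch of quoting the two parts separately; `quote_ident` and `qualified_column` are hypothetical helpers written for this illustration, not Rails API:

```ruby
# Hypothetical helper: wrap one identifier in double quotes,
# doubling any embedded double quote as SQL requires.
def quote_ident(name)
  '"' + name.to_s.gsub('"', '""') + '"'
end

# Hypothetical helper: quote table and column as two separate identifiers.
def qualified_column(table, column)
  "#{quote_ident(table)}.#{quote_ident(column)}"
end

select_list = "#{quote_ident('schools')}.*, " \
              "#{qualified_column('performance_stats', 'grall_adjpicalc')}"
puts select_list
```

A string built this way could be passed to `School.select(...)`. In a Rails app, the connection's own `quote_table_name` and `quote_column_name` methods would be the more natural choice than hand-rolled helpers.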

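So at this point I have to deploy every time I want to test a new idea. I know running PG in development would be ideal, but I spent a whole week trying to switch and couldn't get it to work. I ended up losing everything and had to reseed from scratch.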

I have 3 questions:

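What am I doing wrong with the select statement?
Is there another quick way to avoid this problem? I was thinking that instead of distinct, maybe I could use uniq, convert to an array, and then sort the array accordingly.
How do I get the variable sort_column into the select statement?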

Any ideas or suggestions are much appreciated!

score 0 · Accepted Answer

I ended up dropping distinct and order in favor of merge and uniq. The working code looks like this, but it returns an array of objects rather than an ActiveRecord relation:

School.where(id: school_ids).narrow_schools_for_search(self,prop_type,status,year).joins(listings: 
     :cta_listings).joins(:performance_stats).where("cta_listings.distance <= ?", 
     self.cta_distance).where.not(performance_stats: {"#{sort_column.to_sym}" => 
     nil}).merge(PerformanceStat.order("#{sort_column.to_sym}" => :desc)).uniq[0..29]
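The reason `uniq[0..29]` keeps the sort order is that Ruby's `uniq` keeps the first occurrence of each element, so rows that arrive ordered stay ordered after de-duplication. A small sketch with plain Structs standing in for the joined school rows (`SchoolRow` and its data are made up for illustration):

```ruby
# Hypothetical stand-in for School records returned by the joined query;
# the same school can appear once per matching listing row.
SchoolRow = Struct.new(:id, :score)

rows = [
  SchoolRow.new(1, 9.5),
  SchoolRow.new(1, 9.5),
  SchoolRow.new(2, 8.0),
  SchoolRow.new(3, 7.2),
  SchoolRow.new(2, 8.0)
]

# uniq keeps the first occurrence of each element, preserving order;
# the range then takes the top N (2 here, 30 in the answer).
top = rows.uniq[0..1]
puts top.map(&:id).inspect
```

The trade-off is that every joined row is loaded into memory before the slice is taken, which may be part of why this version was slow.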

score 0 · Accepted Answer

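Update: the working code in my original answer was slow and often timed out on Heroku. I ended up with a query using arel_tables that runs 3x faster. The working code looks like this (self is a @search object). I am not a professional coder, so no doubt this could run faster, and I welcome any suggestions for improving performance. Depending on the query, it still takes 5-8 seconds. But at least I am no longer timing out.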

  school = School.arel_table
  pstat = PerformanceStat.arel_table
  schools = school.project(Arel.star).join(pstat).on(pstat[:school_id].eq(school[:id]).and(
          school[:area_id].in(self.area_ids).and(
              pstat[:year].eq(year)
            )
          )
        )
   query = schools.to_sql
   school_ids = School.find_by_sql(query).pluck(:id)

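This gives me an array of candidate school IDs. Now I need to find the listings in those schools that match the other search parameters, including the maximum distance to public transit (if selected). Schools and listings are joined through the Assignment model.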

        listing = Listing.arel_table
        assignment = Assignment.arel_table
         if self.cta || self.metra
           cta_listing = CtaListing.arel_table
           metra_listing = MetraListing.arel_table
           
           listing_assign = listing.join(assignment).on(assignment[:listing_id].eq(listing[:id])).join(cta_listing, Arel::Nodes::OuterJoin).on(cta_listing[:listing_id].eq(listing[:id])).join(metra_listing, Arel::Nodes::OuterJoin).on(metra_listing[:listing_id].eq(listing[:id]))
           
           selected_listings = listing_assign.project(assignment[:school_id], listing[:id]).where(
           assignment[:school_id].in(school_ids).and(
           cta_listing[:distance].lteq(self.cta_distance).or(
           metra_listing[:distance].lteq(self.metra_distance))).and(
           listing[:prop_type].in(prop_type).and(
           listing[:status].in(status).and(
           listing[:beds].gteq(self.beds).and(
           listing[:active].eq(true).and(
           listing[:price].lteq(self.max_price).and(
           listing[:price].gteq(self.min_price))))))))
         else
           listing_assign = listing.join(assignment).on(assignment[:listing_id].eq(listing[:id]))

           selected_listings = listing_assign.project(assignment[:school_id], listing[:id]).where(
           assignment[:school_id].in(school_ids).and(
           listing[:prop_type].in(prop_type).and(
           listing[:status].in(status).and(
           listing[:beds].gteq(self.beds).and(
           listing[:active].eq(true).and(
           listing[:price].lteq(self.max_price).and(
           listing[:price].gteq(self.min_price))))))))

         end

       q = selected_listings.to_sql
       listings = Listing.find_by_sql(q)

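Now I have all the listings that match the search, loaded as Listing records. Until this point I couldn't limit the result to 30 schools, because I couldn't be sure that a given school would have any matching listings. Each school must have at least one listing. I need to return the top 30 schools together with their listings, so first I build an array of arrays, each holding a school_id and the corresponding listing id.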

       listings_array = listings.map{|x| [x.school_id,x.id]}

Then I convert this array of arrays into a hash grouped by school_id:

       listings_hash = listings_array.group_by{|school_id| school_id.shift}.transform_values do |values|
   values.flatten.uniq
 end
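The grouping step above works by calling `shift` on each inner pair, which mutates `listings_array` as a side effect. A non-mutating sketch of the same grouping, using made-up ids for illustration:

```ruby
# Hypothetical [school_id, listing_id] pairs, including one duplicate.
listings_array = [[10, 101], [10, 102], [11, 103], [10, 101]]

# Group pairs by school_id (first element), then keep only the unique
# listing ids (last element) for each school.
listings_hash = listings_array
  .group_by(&:first)
  .transform_values { |pairs| pairs.map(&:last).uniq }

puts listings_hash.inspect
```

The resulting hash has the same shape (school_id => array of listing ids), and `listings_array` is left intact in case it is needed later.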

Now I can sort those schools by the selected column and take the top 30.

 if sort_column.nil?
   sort_column = "grall_adjpicalc"
 end

 schools = School.where(id: listings_hash.keys).includes(:performance_stats).where(performance_stats: 
 {year: year}).order("performance_stats.#{sort_column} desc").limit(30)
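Because sort_column is user-defined and interpolated straight into the ORDER BY string, it may be worth checking it against a list of known columns first. A minimal sketch, assuming a hypothetical whitelist (only grall_adjpicalc comes from the code above; the other name is a placeholder):

```ruby
# Hypothetical whitelist of sortable performance_stats columns.
ALLOWED_SORT_COLUMNS = %w[grall_adjpicalc hypothetical_other_stat].freeze
DEFAULT_SORT_COLUMN = "grall_adjpicalc"

# Returns the requested column if it is whitelisted, else the default.
# This also covers the nil case handled by the if-block above.
def safe_sort_column(requested)
  name = requested.to_s
  ALLOWED_SORT_COLUMNS.include?(name) ? name : DEFAULT_SORT_COLUMN
end

order_clause = "performance_stats.#{safe_sort_column(nil)} desc"
puts order_clause
```

The resulting string can be passed to `order(...)` in the same way as the interpolated string above.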

Now I have our top 30 schools and can return an array containing each school along with its corresponding listing_ids.

 schools_array = schools.map{|school| [school,listings_hash[school.id]]}
 return schools_array

This is much longer than the previous answer, but at least 3x faster. Can you find a way to speed it up significantly?
