5

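I have a model that is sorted in a specific order. My goal is to find the record at which the sum of a particular column over all previous records reaches a certain number. The example below gets what I need, but it is slow, especially on a fairly large table. Is there a faster way to find the product.id at which the sum of points of all previous products = 100000?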

 total_points = 0
 find_point_level = 100000
 @products = Product.order("id").all
 @products.each do |product|
    total_points = product.points + total_points
    @find_product = product.id
    break if total_points >= find_point_level
 end

Update

Here are timings for some of the solutions. Each run goes through about 60,000 records. The times are ActiveRecord times.

Original example (above):
2685.0ms
1238.8ms
1428.0ms

Original example using find_each:
799.6ms
799.4ms
797.8ms

Creating a new column with the running sum:
181.3ms
170.7ms
172.2ms


6

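You could try denormalizing the database and storing the partial sums directly in the products table. A simple query with where and limit will then return the right answer immediately.

You would need to create extra filters that update a single record whenever a product is added, and update all products whenever a product is deleted or its points field is changed.

answered 2012-11-17T12:33:11.800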
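The bookkeeping described above can be sketched in plain Ruby, without a database. This is only an illustration of the idea: Row, running_total, refresh_running_totals, append_product and first_id_reaching are hypothetical names, not part of the original schema or of Rails.

```ruby
# Plain-Ruby model of the denormalized table; no database involved.
# `running_total` stands in for the extra column (hypothetical name).
Row = Struct.new(:id, :points, :running_total)

# Recompute every stored partial sum in id order. This is the expensive
# case: a product was deleted or its points changed.
def refresh_running_totals(rows)
  total = 0
  rows.sort_by(&:id).each do |row|
    total += row.points
    row.running_total = total
  end
end

# Adding a product at the end only needs the previous last total.
def append_product(rows, id, points)
  last_total = rows.empty? ? 0 : rows.max_by(&:id).running_total
  rows << Row.new(id, points, last_total + points)
end

# Roughly what "WHERE running_total >= level ORDER BY id LIMIT 1" returns.
def first_id_reaching(rows, level)
  hit = rows.sort_by(&:id).find { |row| row.running_total >= level }
  hit && hit.id
end

rows = []
[1, 2, 3, 4, 5].each_with_index { |pts, i| append_product(rows, i + 1, pts) }
puts first_id_reaching(rows, 6).inspect
```

In a real table the lookup is fast only because the database can use an index on the stored total; the linear find here just mirrors the query's result.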
1

It turns out there actually is a way to do this in SQL. First, let's set up a test environment:

rails new foobar
cd foobar
rails g model Product name:string points:integer
rake db:migrate
rails console

In the Rails console, seed the database with some records:

Product.new(name: 'Foo',  points: 1).save!
Product.new(name: 'Bar',  points: 2).save!
Product.new(name: 'Baz',  points: 3).save!
Product.new(name: 'Baf',  points: 4).save!
Product.new(name: 'Quux', points: 5).save!

I found a way of getting running totals in SQL in this post. It works like this:

query = <<-SQL
  SELECT *, (
    SELECT SUM(points)
    FROM products
    WHERE id <= p.id
  ) AS total_points
  FROM products p
SQL

Running this query against the test DB gives us:

Product.find_by_sql(query).each do |p|
  puts p.name.ljust(5) + p.points.to_s.rjust(2) + p.total_points.to_s.rjust(3)
end

# Foo   1  1
# Bar   2  3
# Baz   3  6
# Baf   4 10
# Quux  5 15
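As a cross-check on what the subquery computes, the same running totals can be produced in plain Ruby in a single pass. This is a small sketch over the five sample rows; running_totals is a hypothetical helper name.

```ruby
# Cumulative sums in one pass: element i is the sum of points[0..i].
def running_totals(points)
  total = 0
  points.map { |value| total += value }
end

names  = %w[Foo Bar Baz Baf Quux]
points = [1, 2, 3, 4, 5]

# Print name, points and running total in the same layout as above.
names.zip(points, running_totals(points)).each do |name, pts, total|
  puts name.ljust(5) + pts.to_s.rjust(2) + total.to_s.rjust(3)
end
```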

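So we can now use a HAVING clause (plus a GROUP BY, because HAVING needs one here) to fetch only the products that match the condition, and LIMIT the result to a single row. Note that referring to the total_points alias inside HAVING works in SQLite and MySQL, but PostgreSQL rejects it: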

query = <<-SQL
  SELECT *, (
    SELECT SUM(points)
    FROM products
    WHERE id <= p.id
  ) AS total_points
  FROM products p
  GROUP BY p.id
  HAVING total_points >= #{find_point_level}
  ORDER BY p.id
  LIMIT 1
SQL

I'm really curious how this performs in your environment with a large number of records. Give it a try and let me know whether it works for you.

answered 2012-11-17T13:03:32.140
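For a rough feel of the cost, here is a plain-Ruby model that counts how many rows each approach touches. It assumes the database evaluates the correlated subquery once per outer row with no further optimization, which a real query planner may or may not do; Rec, by_correlated_sum and by_single_pass are hypothetical names.

```ruby
Rec = Struct.new(:id, :points)

# Mirrors the SQL: for each row, re-sum every row with id <= that row's id.
# Returns [matching id or nil, number of rows visited].
def by_correlated_sum(records, level)
  visited = 0
  sorted = records.sort_by(&:id)
  sorted.each do |outer|
    total = 0
    sorted.each do |inner|
      break if inner.id > outer.id
      visited += 1
      total += inner.points
    end
    return [outer.id, visited] if total >= level
  end
  [nil, visited]
end

# Single pass with a carried total, like the Ruby loop in the question.
def by_single_pass(records, level)
  visited = 0
  total = 0
  records.sort_by(&:id).each do |rec|
    visited += 1
    total += rec.points
    return [rec.id, visited] if total >= level
  end
  [nil, visited]
end

records = (1..5).map { |i| Rec.new(i, i) }
puts by_correlated_sum(records, 6).inspect
puts by_single_pass(records, 6).inspect
```

Both functions find the same id, but under this model the per-row re-summing grows roughly quadratically with the number of rows scanned, while the carried total grows linearly.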
0
  • This does not really solve the problem, but you can use find_each instead of each to load products in batches instead of loading the whole table. See the guides.

EDIT: ignore the following. I forgot that the result of a window function cannot be used in the WHERE or HAVING clause of the same query.

  • If you are willing to use a solution that is not database-agnostic, you can use this (not tested):

    query = <<-SQL
      SELECT id, SUM(points) OVER (ORDER BY id) AS total_points
      FROM products
      HAVING total_points >= 100000
      LIMIT 1
    SQL
    
    @product = Product.find_by_sql(query).first
    

This uses window functions, which are NOT supported by every RDBMS (PostgreSQL does support them). Beware: once you have retrieved @product, it will be a read-only record with only two accessible attributes: id and total_points.

answered 2012-11-17T13:18:16.460
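One common way around the limitation mentioned in the EDIT is to compute the window function in a subquery and filter on its result in the outer query. The following is a minimal, untested-against-your-data sketch, assuming PostgreSQL (or another database with window functions) and the products table from this thread; running_total_query is a hypothetical helper name.

```ruby
# Builds the SQL string only; nothing is executed here.
# Integer() rejects anything that is not a whole number, so the
# interpolated value cannot carry extra SQL.
def running_total_query(level)
  <<~SQL
    SELECT id, total_points
    FROM (
      SELECT id, SUM(points) OVER (ORDER BY id) AS total_points
      FROM products
    ) AS running
    WHERE total_points >= #{Integer(level)}
    ORDER BY id
    LIMIT 1
  SQL
end

puts running_total_query(100_000)

# In a Rails app (assumption: a Product model backed by this table):
#   product = Product.find_by_sql(running_total_query(100_000)).first
```

The outer WHERE sees total_points as an ordinary column of the subquery, which is why the filter is allowed there.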
-2

If the table is large, you can use a plain SQL query:

find_point_level = 100000
Product.find_by_sql("SELECT SUM(points) FROM (SELECT points FROM products ORDER BY id LIMIT #{find_point_level}) AS subquery")

The column should also have an index in the database.

answered 2012-11-17T12:52:51.000