
I have been struggling for a while with the same kind of problem: performing efficient queries in Rails. I am currently trying to run a query on a model with 500,000 records and then pull out some descriptive statistics about the results returned.

As an overview: I want to pull out a number of products which match a set of criteria. I would then like to...

  • Count the number of records (if there aren't any I want to suppress certain actions)
  • Identify the max and min prices of the matching records and calculate the number of items falling between certain ranges

As it stands, this set of commands takes a lot longer than I was hoping for (26,000 ms running locally on my desktop computer) and involves either 8 or 9 Active Record queries, each of which takes around 3,000 ms.

Am I doing something wrong that makes this so slow to process? Any suggestions would be fantastic.

The code in my controller is:

    filteredmatchingproducts = Allproduct.select("id, product_name, price")
    .where('product_name LIKE ? 
    OR (product_name LIKE ? AND product_name NOT LIKE ? AND product_name NOT LIKE ?       AND product_name NOT LIKE ? AND product_name NOT LIKE ? AND product_name NOT LIKE ?) 
    OR product_name LIKE ? OR product_name LIKE ? OR product_name LIKE ? OR product_name LIKE ? OR (product_name LIKE ? AND product_name NOT LIKE ?) OR product_name LIKE ?', 
    '%Bike Box', '%Bike Bag%', '%Pannier%', '%Shopper%', '%Shoulder%', '%Shopping%', '%Backpack%' , '%Wheel Bag%', '%Bike sack%', '%Wheel cover%', '%Wheel case%', '%Bike case%', '%Wahoo%', '%Bicycle Travel Case%')
    .order('price ASC')

    @selected_products = filteredmatchingproducts.paginate(:page => params[:page])  

    @productsfound = filteredmatchingproducts.count
    @min_price = filteredmatchingproducts.first
    @max_price = filteredmatchingproducts.last

    @price_range = @max_price.price - @min_price.price

    @max_pricerange1 = @min_price.price + @price_range/4
    @max_pricerange2 = @min_price.price + @price_range/2
    @max_pricerange3 = @min_price.price + 3*@price_range/4
    @max_pricerange4 = @max_price.price 

    if @min_price == nil
    #don't do anything - just avoid error
    else

    @restricted_products_pricerange1 = filteredmatchingproducts.select("price").where('price BETWEEN ? and ?', 0 , @max_pricerange1).count
    @restricted_products_pricerange2 = filteredmatchingproducts.select("price").where('price BETWEEN ? and ?', @max_pricerange1 + 0.01 , @max_pricerange2).count
    @restricted_products_pricerange3 = filteredmatchingproducts.select("price").where('price BETWEEN ? and ?',  @max_pricerange2 + 0.01 , @max_pricerange3).count
    @restricted_products_pricerange4 = filteredmatchingproducts.select("price").where('price BETWEEN ? and ?',  @max_pricerange3 + 0.01 , @max_pricerange4).count
    end

EDIT: For clarity, the fundamental question I have is: why does each of these queries need to be performed on the large Allproduct table? Is there not a way to perform the later queries on the result of the earlier ones (i.e. use filteredmatchingproducts itself rather than recalculating it for each query)? In other programming languages I am used to storing values in variables and operating on those stored values, rather than having to work them out again before each operation. Is this not the mindset in Rails?

4 Answers


There are quite a few things wrong with the code snippet you have shared. Most importantly, perhaps, this is not a Rails-specific optimisation problem but a database structure and optimisation issue.

You are using LIKE queries with a percent sign (%) wildcard on both sides of the pattern. A leading wildcard forces a linear scan of the table in SQLite, because an ordinary index cannot be used for it. Ideally, you would not search with LIKE at all; instead, define a product_categories table, reference it from the Allproduct model's table through a product_category_id column, and put an index on that column.
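
As a rough in-memory analogy only (this is not how SQLite implements indexes, and the rows, names and category ids below are made up for illustration), the difference between scanning every name and looking rows up by a category key can be sketched like this:

```ruby
# Hypothetical in-memory rows standing in for the products table.
products = [
  { id: 1, product_name: "Large Bike Box",     product_category_id: 1 },
  { id: 2, product_name: "Waterproof Pannier", product_category_id: 2 },
  { id: 3, product_name: "Rear Pannier Set",   product_category_id: 2 },
  { id: 4, product_name: "Padded Wheel Bag",   product_category_id: 3 }
]

# Roughly what LIKE '%Pannier%' forces: every row's name has to be inspected.
scanned = products.select { |row| row[:product_name].include?("Pannier") }

# Roughly what an index on product_category_id allows: build the lookup
# structure once, then jump straight to the matching rows.
category_index = products.group_by { |row| row[:product_category_id] }
indexed = category_index.fetch(2, [])
```

Both approaches return the same two rows here, but the scan does work proportional to the whole table on every search, while the lookup does not.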

To initialise the @productsfound, @min_price, and @max_price variables, you can do the following:

    filteredmatchingproductlist = filteredmatchingproducts.to_a
    @productsfound = filteredmatchingproductlist.count
    @min_price = filteredmatchingproductlist.first
    @max_price = filteredmatchingproductlist.last

This avoids triggering a separate query for each of them, because you are operating on an Array instead of an ActiveRecord::Relation.

Since the results are sorted by price, you can apply good old binary search to the filteredmatchingproductlist array and calculate the counts in memory, achieving the same result as the last four lines of your code:

    @restricted_products_pricerange1 = filteredmatchingproducts.select("price").where('price BETWEEN ? and ?', 0 , @max_pricerange1).count
    @restricted_products_pricerange2 = filteredmatchingproducts.select("price").where('price BETWEEN ? and ?', @max_pricerange1 + 0.01 , @max_pricerange2).count
    @restricted_products_pricerange3 = filteredmatchingproducts.select("price").where('price BETWEEN ? and ?',  @max_pricerange2 + 0.01 , @max_pricerange3).count
    @restricted_products_pricerange4 = filteredmatchingproducts.select("price").where('price BETWEEN ? and ?',  @max_pricerange3 + 0.01 , @max_pricerange4).count
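
A minimal sketch of that binary search, assuming Ruby 2.3 or later (for Array#bsearch_index) and an array already sorted by ascending price. The Product struct, the helper names and the sample prices are hypothetical stand-ins for the loaded records, not part of the original code:

```ruby
# Hypothetical stand-in for the loaded records; only price matters here.
Product = Struct.new(:price)

# Index of the first element whose price is >= bound (array size if none).
def lower_index(sorted, bound)
  sorted.bsearch_index { |product| product.price >= bound } || sorted.size
end

# Index of the first element whose price is > bound (array size if none).
def upper_index(sorted, bound)
  sorted.bsearch_index { |product| product.price > bound } || sorted.size
end

# Number of elements with lower <= price <= upper, like SQL BETWEEN.
# Requires sorted to be ordered by ascending price.
def count_between(sorted, lower, upper)
  upper_index(sorted, upper) - lower_index(sorted, lower)
end

# Sample data, already sorted by price as the ORDER BY would return it.
sorted_products = [5.0, 10.0, 10.0, 20.0, 35.0, 50.0].map { |price| Product.new(price) }

first_band_count = count_between(sorted_products, 0, 16.25)
```

Each count then costs two binary searches over the in-memory array rather than another trip to the database.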

Finally, it would be best to integrate a search engine such as Sphinx or Solr if you really need counts and full-text searching. See http://pat.github.io/thinking-sphinx/searching.html for a reference on how to implement that.

Answered 2013-09-16T08:40:48.730

What does the product_name field contain? It seems like you could use the acts-as-taggable-on gem (https://github.com/mbleigh/acts-as-taggable-on). A LIKE condition with a leading wildcard makes the database check every single record for matches, which is quite heavy. With 500k records, it is bound to take a while.

Answered 2013-09-14T22:33:40.523

If all you're dealing with is prices, work on a plain array of prices rather than an ActiveRecord::Relation. Try something like:

    filteredmatchingproducts = (...).map(&:price)

And then do all operations on that array. Also, try to load large result sets in batches wherever possible and maintain your own counts, etc. as you go. This avoids the application chewing up all the memory at once and slowing things down:

http://guides.rubyonrails.org/active_record_querying.html#retrieving-multiple-objects-in-batches
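
As a rough sketch of working on the array alone, the count, the minimum and maximum, and the four band counts can all be derived in memory. The price_band_counts helper and the sample prices are hypothetical; unlike the original BETWEEN queries, this version assigns each price to exactly one band instead of leaving 0.01 gaps between bands:

```ruby
# Count how many prices fall into each of `bands` equal-width price bands
# between the minimum and maximum price. Upper bounds are inclusive.
def price_band_counts(prices, bands = 4)
  return [] if prices.empty?

  min, max = prices.minmax
  range = max - min
  upper_bounds = (1..bands).map { |i| min + range * i / bands.to_f }
  upper_bounds[-1] = max # guard against floating-point rounding on the top band

  counts = Array.new(bands, 0)
  prices.each do |price|
    band = upper_bounds.index { |bound| price <= bound }
    counts[band] += 1
  end
  counts
end

# Hypothetical prices, as the map(&:price) call would return them.
prices = [5.0, 10.0, 10.0, 20.0, 35.0, 50.0]

products_found = prices.size
min_price, max_price = prices.minmax
band_counts = price_band_counts(prices)
```

An empty array yields an empty list of counts, which covers the case where no products match.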

Answered 2013-09-14T23:05:53.433

The reason it's executing so many queries is that you're asking it to execute a lot of queries. (All of the LIKEs also tend to make things slow.) Here's your code with a comment added before each line that triggers a query (8 in total).

    filteredmatchingproducts = Allproduct.select("id, product_name, price")
    .where('product_name LIKE ? 
    OR (product_name LIKE ? AND product_name NOT LIKE ? AND product_name NOT LIKE ?       AND product_name NOT LIKE ? AND product_name NOT LIKE ? AND product_name NOT LIKE ?) 
    OR product_name LIKE ? OR product_name LIKE ? OR product_name LIKE ? OR product_name LIKE ? OR (product_name LIKE ? AND product_name NOT LIKE ?) OR product_name LIKE ?', 
    '%Bike Box', '%Bike Bag%', '%Pannier%', '%Shopper%', '%Shoulder%', '%Shopping%', '%Backpack%' , '%Wheel Bag%', '%Bike sack%', '%Wheel cover%', '%Wheel case%', '%Bike case%', '%Wahoo%', '%Bicycle Travel Case%')
    .order('price ASC')

    #!!!! this is a query "select ... limit y offset x"
    @selected_products = filteredmatchingproducts.paginate(:page => params[:page])  

    #!!!! this is a query "select count(...) ..."
    @productsfound = filteredmatchingproducts.count
    #!!!! this is a query "select ... order by price asc limit 1"
    @min_price = filteredmatchingproducts.first
    #!!!! this is a query "select ... order by price desc limit 1"
    @max_price = filteredmatchingproducts.last

    @price_range = @max_price.price - @min_price.price

    @max_pricerange1 = @min_price.price + @price_range/4
    @max_pricerange2 = @min_price.price + @price_range/2
    @max_pricerange3 = @min_price.price + 3*@price_range/4
    @max_pricerange4 = @max_price.price 

    if @min_price == nil
    #don't do anything - just avoid error
    else

    #!!!! this is a query "select count(...) ... where ... and price BETWEEN X and Y"
    @restricted_products_pricerange1 = filteredmatchingproducts.select("price").where('price BETWEEN ? and ?', 0 , @max_pricerange1).count
    #!!!! this is a query "select count(...) ... where ... and price BETWEEN X and Y"
    @restricted_products_pricerange2 = filteredmatchingproducts.select("price").where('price BETWEEN ? and ?', @max_pricerange1 + 0.01 , @max_pricerange2).count
    #!!!! this is a query "select count(...) ... where ... and price BETWEEN X and Y"
    @restricted_products_pricerange3 = filteredmatchingproducts.select("price").where('price BETWEEN ? and ?',  @max_pricerange2 + 0.01 , @max_pricerange3).count
    #!!!! this is a query "select count(...) ... where ... and price BETWEEN X and Y"
    @restricted_products_pricerange4 = filteredmatchingproducts.select("price").where('price BETWEEN ? and ?',  @max_pricerange3 + 0.01 , @max_pricerange4).count
    end

Answered 2013-09-15T04:24:05.163