sql - How to optimize/cache a Rails query

Question

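I have a Rails application in which one query takes a long time. It uses a PostgreSQL database, and the query runs against a single table containing several thousand records.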

statistics_controller.rb

all_data = Usagedata.select([:start_time, :end_time, :node_count, :processors, :id, :wall_duration, :local_user_id]) 
    .where(Usagedata.arel_table[(:wall_duration)].not_eq("0"))
    .in_range( @from_date, @to_date)
if @user
  all_data = all_data.by_user(@user)
end

all_data = all_data.to_a # Force the query to run
@data = all_data

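What I want to do is keep the main query result (without the in_range and user clauses) in the Rails server-side application cache and refresh the data every hour.

The part of the code that should be cached: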

Usagedata.select([:start_time, :end_time, :node_count, :processors, :id, :wall_duration, :local_user_id]) 
    .where(Usagedata.arel_table[(:wall_duration)].not_eq("0"))

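Usage of the cached records

In addition, the client can select a date range from a calendar (@from_date and @to_date). The period between the dates can be anywhere from 1 day to about 3 years. (That is why the cache should store all records from the database table.) The data is used to draw charts and to display/calculate top-user statistics.

I tried @MrTheWalrus's solution: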

@statistics = Rails.cache.fetch('usagedata', :expires_in => 24.hours) do
  Usagedata.select([:start_time, :end_time, :node_count, :processors, :id, :wall_duration, :local_user_id])
    .where(Usagedata.arel_table[(:wall_duration)].not_eq("0")).all
end

But this way I cannot get my sub-queries to work:

all_data = @statistics.in_range( @from_date, @to_date)
if @user
  all_data = all_data.by_user(@user)
end

This gives me an error:

undefined method `in_range' for #<Array:0x007fa5ecc77588>

even though I have defined in_range in the Usagedata model as follows:

def self.in_range(from_date, to_date)
  where("start_time <= :to AND end_time >= :from", :from => from_date, :to => to_date)
end

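What am I doing wrong?

Edit: Thanks to @Craig Ringer's solution, I managed to solve the indexing problem described here:

The whole application seems really slow. What am I doing wrong? I probably need to add indexes as well, but how?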

  Usagedata Load (243.4ms)  SELECT start_time, end_time, node_count, processors, id, wall_duration, local_user_id FROM "usagedata" WHERE ("usagedata"."wall_duration" != 0) AND (start_time <= '2013-09-02 20:59:59.999999' AND end_time >= '2013-05-05 21:00:00.000000')
  EXPLAIN (1.9ms)  EXPLAIN SELECT start_time, end_time, node_count, processors, id, wall_duration, local_user_id FROM "usagedata" WHERE ("usagedata"."wall_duration" != 0) AND (start_time <= '2013-09-02 20:59:59.999999' AND end_time >= '2013-05-05 21:00:00.000000')

EXPLAIN for: SELECT start_time, end_time, node_count, processors, id, wall_duration, local_user_id FROM "usagedata"  WHERE ("usagedata"."wall_duration" != 0) AND (start_time <= '2013-09-02 20:59:59.999999' AND end_time >= '2013-05-05 21:00:00.000000')
QUERY PLAN
 ---------------------------------------------------------------------------------------
 Seq Scan on usagedata  (cost=0.00..4558.02 rows=7989 width=34)
 Filter: ((wall_duration <> 0) AND (start_time <= '2013-09-02 20:59:59.999999'::timestamp without time zone) AND (end_time >= '2013-05-05 21:00:00'::timestamp without time zone))
 (2 rows)

score 2 · Accepted Answer

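Craig Ringer's comment has already covered indexing, so I will only talk about caching.

The problem with the caching code you included is that what you are caching is an ActiveRecord::Relation, which is basically just a SQL query waiting to be run, not the result of that query. Caching the relation means that every time you load it from the cache you still have to execute the query, which is the slow part. Add .all at the end to force the query to actually run; this ensures that the results are cached rather than the query. (On Rails 4 and later, .all itself returns a relation, so use .to_a there instead.)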

@statistics = Rails.cache.fetch('usagedata', :expires_in => 24.hours) do
  Usagedata.select([:start_time, :end_time, :node_count, :processors, :id, :wall_duration, :local_user_id]).
    where(Usagedata.arel_table[(:wall_duration)].not_eq("0")).all
end
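
To make the fetch-with-expiry behaviour concrete, here is a minimal plain-Ruby sketch. It is not Rails.cache itself: TtlCache, its injected clock, and load_rows are hypothetical stand-ins, and the lambda only pretends to be the expensive SELECT. The point it illustrates is that the block runs on a miss or after expiry, and that every other call reuses the stored rows.

```ruby
# Toy in-memory cache that mimics the fetch-with-expiry pattern.
# TtlCache is a hypothetical stand-in, not the Rails cache store.
class TtlCache
  Entry = Struct.new(:value, :expires_at)

  # The clock is injectable so expiry can be simulated without sleeping.
  def initialize(clock: -> { Time.now })
    @clock = clock
    @entries = {}
  end

  # Returns the cached value for key while it is fresh; otherwise runs
  # the block, stores its result for expires_in seconds, and returns it.
  def fetch(key, expires_in:)
    now = @clock.call
    entry = @entries[key]
    return entry.value if entry && entry.expires_at > now

    value = yield
    @entries[key] = Entry.new(value, now + expires_in)
    value
  end
end

query_runs = 0
now = Time.at(0)
cache = TtlCache.new(clock: -> { now })

# Stands in for the expensive SELECT; counts how often it really runs.
load_rows = lambda do
  query_runs += 1
  [:row_a, :row_b]
end

cache.fetch("usagedata", expires_in: 3600) { load_rows.call } # miss: runs the block
cache.fetch("usagedata", expires_in: 3600) { load_rows.call } # hit: block skipped
now += 3601                                                   # one hour later
cache.fetch("usagedata", expires_in: 3600) { load_rows.call } # expired: runs again

puts query_runs
```

With a one-hour expiry, as the question asks for, the slow query would run at most about once per hour per cache key, however many requests arrive in between.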

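Edit: The reason you can't call .in_range on it is that .in_range modifies the query (by adding a WHERE clause). Once you have run the query and cached the result, you can no longer modify it that way. The whole point of caching query results is that you run the query once and use the results many times; if the query changes, that is not an option.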
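
The error message itself follows from where in_range lives. A small plain-Ruby sketch (no ActiveRecord; the Usagedata class below is a bare stand-in, and its in_range only returns a marker instead of calling where) shows that a class-level method is available on the model class but not on the Array a forced query hands back:

```ruby
# Bare stand-in for the model; the real one inherits from ActiveRecord::Base.
class Usagedata
  # Class-level method, like the one in the question. In the real model
  # it calls `where`; here it only returns a marker symbol.
  def self.in_range(from_date, to_date)
    :relation_with_extra_where
  end
end

# A forced query yields a plain Array of model instances.
cached_rows = [Usagedata.new, Usagedata.new]

puts Usagedata.respond_to?(:in_range)   # the class knows in_range
puts cached_rows.respond_to?(:in_range) # the Array does not
```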

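Assuming that adding the index has not already solved your problem, my suggestion is to filter the results in Ruby rather than in the database. Assuming you have already populated the cache (via the whenever gem or some other way):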

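from_time = 1.week.ago
to_time = 1.day.ago
# fetch without a block only reads; it returns nil if the key is missing
@statistics = Rails.cache.fetch('usagedata')
# Same overlap test as the in_range method, applied to the cached array
@filtered_statistics = @statistics.select do |item|
  item.start_time <= to_time && item.end_time >= from_time
end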
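
For completeness, here is a self-contained sketch of the same idea that also covers the per-user filter from the question. It runs without Rails: Row is a hypothetical stand-in for a cached Usagedata record, and by_user is assumed to filter on local_user_id, which the question does not actually show.

```ruby
# Stand-in for a cached Usagedata row; only the fields used below.
Row = Struct.new(:id, :local_user_id, :start_time, :end_time)

# Mirrors the model's in_range: keep rows that overlap [from, to].
def in_range(rows, from, to)
  rows.select { |r| r.start_time <= to && r.end_time >= from }
end

# Hypothetical equivalent of by_user, assuming it matches local_user_id.
def by_user(rows, user_id)
  rows.select { |r| r.local_user_id == user_id }
end

day = 24 * 60 * 60
t0  = Time.utc(2013, 9, 1)

cached = [
  Row.new(1, 10, t0 - 10 * day, t0 - 9 * day), # ends before the window
  Row.new(2, 10, t0 - 3 * day,  t0 - 2 * day), # inside the window
  Row.new(3, 20, t0 - 8 * day,  t0 - 6 * day)  # overlaps the window start
]

from = t0 - 7 * day
to   = t0 - day

in_window = in_range(cached, from, to)
for_user  = by_user(in_window, 10)

p in_window.map(&:id)
p for_user.map(&:id)
```

Both filters are linear scans over the cached array, which should be cheap for a few thousand rows but is worth measuring before relying on it for a much larger table.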

score 1 · Accepted Answer

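A partial index on (start_time, end_time) with the index filter clause WHERE ("usagedata"."wall_duration" != 0) would make this query faster. Even just a plain index on (start_time, end_time) would help.

That might make client-side caching unnecessary. If not, see whether Rails supports creating and managing server-side materialized views.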
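
Since the question asks how to add the index, here is a rough sketch of the two index definitions described above, written as SQL strings in Ruby. The index names are made up for illustration; the table and column names come from the query log in the question.

```ruby
# Index names below are hypothetical; pick ones that fit your conventions.

# Partial index: only rows with a non-zero wall_duration are indexed.
PARTIAL_INDEX_SQL = <<~SQL
  CREATE INDEX index_usagedata_on_times_nonzero
    ON usagedata (start_time, end_time)
    WHERE wall_duration != 0;
SQL

# Simpler alternative: a plain composite index on both time columns.
PLAIN_INDEX_SQL = <<~SQL
  CREATE INDEX index_usagedata_on_times
    ON usagedata (start_time, end_time);
SQL

# Inside a migration, either string could be passed to `execute`.
# On Rails 4+ with PostgreSQL, the partial index can also be declared as:
#   add_index :usagedata, [:start_time, :end_time], where: "wall_duration != 0"
puts PARTIAL_INDEX_SQL
puts PLAIN_INDEX_SQL
```

After creating either index, rerunning EXPLAIN on the slow query should show whether the planner picks it up instead of the sequential scan.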
