mysql - Optimizing code to populate a new column in a large table

Question

I'm about to add a new column to a table with 37M rows. The column will hold an association ID.

The simple model:

class SeenEpisode < ActiveRecord::Base
  #show_id is the new column
  attr_accessible :user_id, :season_id, :episode_id, :show_id
  belongs_to :episode
  belongs_to :season
end

This is the fastest approach I could come up with:

seen_episodes = SeenEpisode.where("show_id IS NULL")
seen_episodes.find_in_batches do |batch| #batch size is 1000
  batch.group_by(&:season_id).each do |season_id, seen_episodes|
    #all seen_episodes with the same season_id, ensures the same show_id
    show_id = seen_episodes.first.episode.show_id
    seen_episodes.each do |seen_episode|
      seen_episode.update_column(:show_id, show_id) #skip validations and callbacks
    end
  end
end

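Current tests in development show that populating 10,000 records takes about 2 minutes.
Assuming production needs only 1 minute per 10,000 records thanks to better hardware and MySQL configuration, that is still 100 minutes per million records, or roughly 60 hours for the whole table.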

Is there a faster way to tackle this?

score 3 · Accepted Answer

If you batch your writes, it will be orders of magnitude faster. That is, instead of sending individual writes

update episodes set show_id = 1 where episode_id = 1;
update episodes set show_id = 1 where episode_id = 2;
update episodes set show_id = 1 where episode_id = 3;

you should group them into a single write

update episodes set show_id = 1 where episode_id in (1, 2, 3);
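The grouping described above can be sketched in Ruby roughly as follows. This is only an illustration under a few assumptions: `grouped_updates` and its `max_ids` parameter are hypothetical names, the input is a plain array of [episode_id, show_id] integer pairs, and the statements target `seen_episodes`, assumed here to be the table behind the question's SeenEpisode model (the answer itself writes `episodes`).

```ruby
# Build one UPDATE per show_id instead of one UPDATE per row.
# `pairs` is a hypothetical array of [episode_id, show_id] integer pairs.
# The table name `seen_episodes` is an assumption based on the question's model.
def grouped_updates(pairs, max_ids: 1000)
  pairs.group_by { |_episode_id, show_id| show_id }.flat_map do |show_id, group|
    # Integer() rejects anything that is not a whole number before it reaches the SQL.
    ids = group.map { |episode_id, _show_id| Integer(episode_id) }.uniq
    # Cap the IN list so a single statement never grows without bound.
    ids.each_slice(max_ids).map do |slice|
      "UPDATE seen_episodes SET show_id = #{Integer(show_id)} " \
        "WHERE episode_id IN (#{slice.join(', ')})"
    end
  end
end

statements = grouped_updates([[1, 1], [2, 1], [3, 1], [7, 4]])
puts statements
```

Inside a Rails app the same idea is usually written with ActiveRecord's `update_all` on a relation, which also issues a single UPDATE without validations or callbacks.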

Alternatively, something like this could work:

select season_id, show_id 
from episodes 
where show_id is not null 
group by season_id;

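That should give you the show_id for each season_id. Then just loop through those rows and fire off mass updates (SQL syntax for simplicity; you would probably do this in Ruby)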

update episodes set show_id = @show_id where season_id = @season_id;
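A minimal Ruby sketch of that loop might look like the following. It assumes a connection object that responds to `select_rows` and `execute`, as `ActiveRecord::Base.connection` does. It also assumes that `episodes` already carries both season_id and show_id and that the table being filled is `seen_episodes`. `backfill_show_ids` and `FakeConnection` are hypothetical names; the fake connection exists only so the sketch runs without a database.

```ruby
# Backfill show_id with one UPDATE per season rather than one per row.
# `conn` is assumed to respond to select_rows(sql) and execute(sql).
# Table names are assumptions based on the question's models.
def backfill_show_ids(conn)
  mapping = conn.select_rows(
    "SELECT DISTINCT season_id, show_id FROM episodes WHERE show_id IS NOT NULL"
  )
  mapping.each do |season_id, show_id|
    # Only touch rows that are still empty, so the job can be re-run safely.
    conn.execute(
      "UPDATE seen_episodes SET show_id = #{Integer(show_id)} " \
      "WHERE season_id = #{Integer(season_id)} AND show_id IS NULL"
    )
  end
  mapping.length
end

# Stand-in connection (hypothetical) that records statements instead of running them.
class FakeConnection
  attr_reader :executed

  def initialize(rows)
    @rows = rows
    @executed = []
  end

  def select_rows(_sql)
    @rows
  end

  def execute(sql)
    @executed << sql
  end
end

conn = FakeConnection.new([[10, 1], [11, 1], [20, 2]])
backfill_show_ids(conn)
puts conn.executed
```

With this shape the number of statements depends on the number of seasons rather than the 37M rows, which is where most of the speedup would be expected to come from.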
