Here is the source of find_in_batches, which find_each uses:
http://apidock.com/rails/ActiveRecord/Batches/find_in_batches
Click the "Show source" link. The essential lines are:
relation = relation.reorder(batch_order).limit(batch_size)
records = relation.where(table[primary_key].gteq(start)).all
and
records = relation.where(table[primary_key].gt(primary_key_offset)).to_a
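You must order the records by the primary key or another unique index so that each batch can be processed and the next batch selected. You cannot batch by created_at alone, because it is not unique. You can, however, combine ordering by created_at with selecting on the unique id: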
relation = relation.reorder('created_at ASC, id ASC').limit(batch_size)
records = relation.where(table[primary_key].gteq(start)).all
# ...
while records.any?
  records_size = records.size
  primary_key_offset = records.last.id
  created_at_key = records.last.created_at
  yield records
  break if records_size < batch_size
  if primary_key_offset
    records = relation.where('created_at > :ca OR (created_at = :ca AND id > :id)', :ca => created_at_key, :id => primary_key_offset).to_a
  else
    raise "Primary key not included in the custom select clause"
  end
end
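If you are absolutely sure that no created_at value will be repeated more than batch_size times, you can use created_at alone as the unique key for batching.
In any case, you need an index on created_at for this to be efficient.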
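To see why the combined (created_at, id) condition neither skips nor repeats rows that share a created_at value, here is a minimal sketch that simulates the same loop in plain Ruby, without a database. The Row struct, the each_batch_by_created_at helper, and the integer timestamps are all made up for illustration; they are not part of Rails.

```ruby
# Hypothetical in-memory stand-in for a table row; created_at is an
# integer here only to keep the example short.
Row = Struct.new(:id, :created_at)

# Yields rows in batches ordered by (created_at, id). The last row of each
# batch serves as the cursor for selecting the next batch.
def each_batch_by_created_at(rows, batch_size)
  return enum_for(:each_batch_by_created_at, rows, batch_size) unless block_given?

  # Equivalent of ORDER BY created_at ASC, id ASC
  sorted = rows.sort_by { |r| [r.created_at, r.id] }
  cursor = nil

  loop do
    remaining =
      if cursor
        ca, id = cursor
        # Equivalent of: created_at > :ca OR (created_at = :ca AND id > :id)
        sorted.select { |r| r.created_at > ca || (r.created_at == ca && r.id > id) }
      else
        sorted
      end

    batch = remaining.first(batch_size) # equivalent of LIMIT batch_size
    break if batch.empty?

    yield batch
    break if batch.size < batch_size

    cursor = [batch.last.created_at, batch.last.id]
  end
end

# Three rows share created_at = 100, so a batch boundary falls inside the tie.
rows = [
  Row.new(1, 100), Row.new(2, 100), Row.new(3, 100),
  Row.new(4, 200), Row.new(5, 50)
]

batches = each_batch_by_created_at(rows, 2).map { |batch| batch.map(&:id) }
p batches
```

With a batch size of 2, the first batch ends on the row with id 1 and created_at 100. The id tiebreaker is what lets the next batch pick up ids 2 and 3 at the same timestamp; a plain created_at > :ca condition would skip them.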