I have to add 25,000 records to the database at once in Rails, and I have to validate them too.

Here is what I have now:

  # controller create action
  def create
    emails = params[:emails][:list].split("\r\n")
    @created_count = 0
    @rejected_count = 0

    inserts = []
    emails.each do |email|
      @email = Email.new(:email => email)
      if @email.valid?
        @created_count += 1
        inserts.push "(#{Email.connection.quote(email)}, '#{Date.today}', '#{Date.today}')"
      else
        @rejected_count += 1
      end
    end
    return if emails.empty?
    sql = "INSERT INTO `emails` (`email`, `updated_at`, `created_at`) VALUES #{inserts.join(", ")}"
    Email.connection.execute(sql) unless inserts.empty?
    redirect_to new_email_path, :notice => "Successfully created #{@created_count} emails, rejected #{@rejected_count}"
  end

Right now it is very slow, and the request times out before that many records can be added.

Any ideas? I'm using MySQL.

3 Answers

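Three things come to mind:

  1. You can help yourself by using proper tools for this, such as zdennis/activerecord-import or jsuchal/activerecord-fast-import. The problem is that your example also creates 25,000 objects. If you tell activerecord-import not to use validations, it will not create new objects (see activerecord-import/wiki/Benchmarks).
  2. Importing tens of thousands of rows into a relational database will never be super fast; it should be done asynchronously by a background process. There are tools for that too, such as DelayedJob and others: https://www.ruby-toolbox.com/
  3. Move the code that belongs in the model out of the controller(TM).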
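
Points 1 and 3 can be sketched in plain Ruby, without Rails, so that the idea is visible on its own. `EmailImporter`, its format regex, and the batch size of 1,000 are all assumptions made for illustration; the table and column names come from the question. The sketch validates with a cheap pattern instead of instantiating 25,000 model objects, and splits the work into several multi-row INSERT statements.

```ruby
# Hypothetical plain-Ruby stand-in for logic that would live in the Email model.
class EmailImporter
  # Deliberately simple format check (an assumption, not the asker's validation).
  # Quotes and backslashes are rejected so the values are safe to interpolate here;
  # in a real Rails app, prefer the connection's own quoting.
  EMAIL_FORMAT = /\A[^@\s'\\]+@[^@\s'\\]+\.[^@\s'\\]+\z/
  BATCH_SIZE = 1_000 # assumed value; tune it for your server's packet limits

  attr_reader :accepted, :rejected

  def initialize(raw_list)
    candidates = raw_list.split(/\r?\n/).map(&:strip).reject(&:empty?).uniq
    @accepted, @rejected = candidates.partition { |email| email.match?(EMAIL_FORMAT) }
  end

  # Returns one multi-row INSERT statement per batch of accepted emails.
  def insert_statements(now = Time.now.strftime("%Y-%m-%d %H:%M:%S"))
    accepted.each_slice(BATCH_SIZE).map do |batch|
      rows = batch.map { |email| "('#{email}', '#{now}', '#{now}')" }
      "INSERT INTO `emails` (`email`, `updated_at`, `created_at`) VALUES #{rows.join(', ')}"
    end
  end
end

importer = EmailImporter.new("a@example.com\r\nnot-an-email\r\nb@example.com\r\n")
statements = importer.insert_statements
puts "accepted=#{importer.accepted.size} rejected=#{importer.rejected.size} statements=#{statements.size}"
```

With this in place the controller action would only pass `params[:emails][:list]` to the importer, execute each statement, and report the two counts.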

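After that, you need to rethink the flow of this part of the application. If you use background processing inside a controller action such as create, you cannot simply return HTTP 201 or HTTP 200. What you need to do is return a "quick" HTTP 202 Accepted and provide a link to another representation where users can check the status of their request (do we have a success response yet? how many emails failed?), since it is being processed in the background. That sounds a bit complicated, which is a sign that you probably shouldn't do it that way. Why do you have to add 25,000 records in one request? What's the context?

answered 2012-05-02T19:58:12.310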
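
The 202 Accepted flow described here could look roughly like the following plain-Ruby sketch. Everything in it is hypothetical: the `ImportJobs` registry, the `/email_imports/:id` status path, and the toy validity check are illustration only, and a thread stands in for a real background worker such as DelayedJob.

```ruby
require "securerandom"

# Hypothetical in-memory job registry; a real app would persist job state
# and hand the work to a background job library instead of a thread.
class ImportJobs
  def initialize
    @jobs = {}
    @lock = Mutex.new
  end

  # Starts the import in the background and returns a 202-style response at once.
  def accept(emails)
    id = SecureRandom.hex(8)
    @lock.synchronize { @jobs[id] = { state: "processing", created: 0, rejected: 0 } }
    worker = Thread.new { process(id, emails) }
    [202, { "Location" => "/email_imports/#{id}" }, worker]
  end

  # What the status resource would report for the given job id.
  def status(id)
    @lock.synchronize { @jobs.fetch(id).dup }
  end

  private

  def process(id, emails)
    valid, invalid = emails.partition { |email| email.include?("@") } # toy check
    @lock.synchronize do
      @jobs[id] = { state: "done", created: valid.size, rejected: invalid.size }
    end
  end
end

jobs = ImportJobs.new
code, headers, worker = jobs.accept(["a@example.com", "broken"])
worker.join # only for this demo; a controller would respond immediately
job_id = headers["Location"].split("/").last
final_status = jobs.status(job_id)
puts code, headers["Location"], final_status
```

The client receives the status code and the Location header right away, then polls that location until the state changes from "processing" to "done".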

If speed is your concern, I'd attack the problem from a different angle.

  • Create a table that copies the structure of your emails table; let it be emails_copy. Don't copy indexes and constraints.
  • Import the 25k records into it using your database's fast import tools. Consult your DB docs or see e.g. this answer for MySQL. You will have to prepare the input file, but that is much faster overall; I suppose you already have the data in some text or tabular form.
  • Create indexes and constraints for emails_copy to mimic emails table. Constraint violations, if any, will surface; fix them.
  • Validate the data inside the table. It may take a few raw SQL statements to check for severe errors. You don't have to validate emails for anything but very simple format anyway. Maybe all your validation could be done against the text you'll use for import.
  • Run insert into emails select * from emails_copy to put the emails into the production table. You may have to adjust the statement a bit to get the auto-increment IDs right.
  • Once you're positive that the process succeeded, drop table emails_copy.
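
The steps above can be sketched as a small Ruby script that writes the input file and lists the MySQL statements to run. Several things here are assumptions rather than part of the answer: the staging table has only an `email` column (a simplification of copying the full structure), the column is `VARCHAR(255)`, the production table has a unique index on `email`, and `LOAD DATA LOCAL INFILE` is enabled on both client and server. The column names `email`, `created_at`, and `updated_at` are taken from the question.

```ruby
require "tmpdir"

# Hypothetical input; in practice this is the submitted list of addresses.
emails = ["a@example.com", "b@example.com", "c@example.com"]

# Prepare the input file, one address per line.
data_file = File.join(Dir.mktmpdir, "emails.txt")
File.write(data_file, emails.join("\n") + "\n")

# The remaining steps as MySQL statements, to be run with the client of your choice.
staging_sql = [
  # Staging table without indexes or constraints (simplified to one column).
  "CREATE TABLE emails_copy (email VARCHAR(255))",
  # Bulk load; requires local_infile to be enabled.
  "LOAD DATA LOCAL INFILE '#{data_file}' INTO TABLE emails_copy " \
    "LINES TERMINATED BY '\\n' (email)",
  # Add the constraint afterwards; duplicates surface here as an error.
  "ALTER TABLE emails_copy ADD UNIQUE INDEX idx_emails_copy_email (email)",
  # Very simple format check; rows returned need fixing or deleting.
  "SELECT email FROM emails_copy WHERE email NOT LIKE '%_@_%._%'",
  # Naming the columns lets the production table assign its own IDs.
  "INSERT INTO emails (email, created_at, updated_at) " \
    "SELECT email, NOW(), NOW() FROM emails_copy",
  "DROP TABLE emails_copy"
]

puts staging_sql
```

Listing the columns in the final INSERT, instead of using `select *`, is one way to sidestep the auto-increment issue mentioned in the steps.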
answered 2012-05-02T20:28:24.153

Why not create a rake task for the job? The following link explains it well.

http://www.ultrasaurus.com/sarahblog/2009/12/creating-a-custom-rake-task/

In short, once you have written your rake task, you can start the job with:

rake member:load_emails
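
A minimal sketch of what such a task might look like follows. The task name `member:load_emails` is the one used above; the `path` argument and the placeholder body are assumptions. In a Rails app the definition would normally live in a file such as lib/tasks/member.rake, without the `require`, and would depend on `:environment` so that the models are loaded.

```ruby
require "rake"

namespace :member do
  desc "Load emails from a text file, one address per line"
  task :load_emails, [:path] do |_task, args|
    # Read the file, dropping line endings and blank lines.
    emails = File.readlines(args[:path], chomp: true).reject(&:empty?)
    # Placeholder for the real import (validation plus bulk insert).
    puts "Would import #{emails.size} emails"
  end
end
```

With the argument added, the invocation becomes `rake "member:load_emails[emails.txt]"`, where `emails.txt` is a hypothetical file name.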

answered 2012-05-02T19:57:44.133