
I need to read log files (.CSV) using fastercsv and save their contents to a database (each cell value becomes one record). The problem is that I have to read about 20-25 log files every day, and they are very large (each CSV is over 7 MB). I have already forked the reading process so the user doesn't have to wait, but reading 20-25 files of that size still takes a long time (over 2 hours). Now I want to fork the reading of each file individually, which would create roughly 20-25 child processes. My questions: can I do this? If so, will it hurt performance, and can fastercsv handle it? Example:

@reports.each do |report|
  pid = fork {
   .
   .
   .
  }
  Process.detach(pid)  # Process.dispatch does not exist; detach reaps the child in the background
end
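
For reference, a minimal self-contained sketch of the fork-per-file idea; import_report is a hypothetical stand-in for the elided parsing code, and each child re-establishes its own database connection, since a forked child cannot safely share the parent's connection:

pids = @reports.map do |report|
  fork do
    # a forked child must not reuse the parent's DB socket
    ActiveRecord::Base.establish_connection
    import_report(report)  # hypothetical: parse one CSV and save its contents
  end
end
pids.each { |pid| Process.waitpid(pid) }  # block until every child finishes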

PS: I'm using Rails 3.0.7, and this will run on a server on an Amazon Large instance (7.5 GB of memory, 4 EC2 Compute Units (2 virtual cores with 2 EC2 Compute Units each), 850 GB of local instance storage, 64-bit platform).


1 Answer


If the storage is all local (and I'm not sure you can really say that if you're in the cloud), then forking isn't likely to provide a speedup, because the slowest part of the operation is going to be disc I/O (unless you're doing serious computation on your data). Hitting the disc from several processes at once isn't going to make it any faster, though I suppose if the disc had a big cache it might help a bit.

Also, 7 MB of CSV data isn't really that much, so you might get a bigger speedup by finding a quicker way to insert the data. Some databases provide a bulk load facility where you can load formatted data in directly, or you could turn each batch of rows into a single INSERT and feed that straight into the database. I don't know how you're doing it at the moment, so these are just guesses.
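
To illustrate the batched-INSERT idea, here is a minimal sketch, assuming a MySQL or PostgreSQL database and a hypothetical log_entries table with a single cell_value column (one row per CSV cell, matching the question):

require 'fastercsv'  # Ruby 1.8; on 1.9+ the stdlib 'csv' is the same library

BATCH_SIZE = 500  # rows per INSERT statement; tune for your database

def import_report(csv_path)
  conn = ActiveRecord::Base.connection
  values = []
  flush = lambda do
    conn.execute("INSERT INTO log_entries (cell_value) VALUES #{values.join(', ')}")
    values.clear
  end
  FasterCSV.foreach(csv_path) do |row|
    row.each do |cell|
      values << "(#{conn.quote(cell)})"  # quote guards against SQL injection
      flush.call if values.size >= BATCH_SIZE
    end
  end
  flush.call unless values.empty?  # write out the final partial batch
end

Compared with one model save per cell, this costs one round-trip per 500 rows; PostgreSQL's COPY or MySQL's LOAD DATA INFILE would likely be faster still.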

Of course, having said all that, the only way to be sure is to try it!

answered 2012-05-17T20:21:47.537