
I need to read log files (.CSV) using fastercsv and save their contents to a database (each cell value becomes one record). The problem is that I have to read about 20-25 log files every day, and they are very large (each CSV is over 7 MB). I have already forked the reading process so the user doesn't have to wait, but reading 20-25 files of that size still takes a long time (over 2 hours). Now I want to fork the reading of each file individually, which would create roughly 20-25 child processes. My questions: can I do this? If so, will it hurt performance, and can fastercsv handle it? Example:

@reports.each do |report|
  pid = fork {
   .
   .
   .
  }
  Process.detach(pid)  # Process.dispatch does not exist; detach reaps the child in the background
end
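
For reference, a minimal self-contained sketch of the fork-per-file idea; import_report is a hypothetical stand-in for the elided parsing code, and each child re-establishes its own database connection, since a forked child cannot safely share the parent's connection:

pids = @reports.map do |report|
  fork do
    # a forked child must not reuse the parent's DB socket
    ActiveRecord::Base.establish_connection
    import_report(report)  # hypothetical: parse one CSV and save its contents
  end
end
pids.each { |pid| Process.waitpid(pid) }  # block until every child finishes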

PS: I'm using Rails 3.0.7, and this will run on a server on an Amazon Large instance (7.5 GB of memory, 4 EC2 Compute Units (2 virtual cores with 2 EC2 Compute Units each), 850 GB of local instance storage, 64-bit platform).


1 Answer


If the storage is all local (and I'm not sure you can really say that if you're in the cloud), then forking isn't likely to provide a speedup, because the slowest part of the operation is going to be disc I/O (unless you're doing serious computation on your data). Hitting the disc from several processes at once isn't going to make it any faster, though I suppose if the disc had a big cache it might help a bit.

Also, 7 MB of CSV data isn't really that much, so you might get a bigger speedup by finding a quicker way to insert the data. Some databases provide a bulk load facility where you can load formatted data in directly, or you could turn each batch of rows into a single INSERT and feed that straight into the database. I don't know how you're doing it at the moment, so these are just guesses.
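
To illustrate the batched-INSERT idea, here is a minimal sketch, assuming a MySQL or PostgreSQL database and a hypothetical log_entries table with a single cell_value column (one row per CSV cell, matching the question):

require 'fastercsv'  # Ruby 1.8; on 1.9+ the stdlib 'csv' is the same library

BATCH_SIZE = 500  # rows per INSERT statement; tune for your database

def import_report(csv_path)
  conn = ActiveRecord::Base.connection
  values = []
  flush = lambda do
    conn.execute("INSERT INTO log_entries (cell_value) VALUES #{values.join(', ')}")
    values.clear
  end
  FasterCSV.foreach(csv_path) do |row|
    row.each do |cell|
      values << "(#{conn.quote(cell)})"  # quote guards against SQL injection
      flush.call if values.size >= BATCH_SIZE
    end
  end
  flush.call unless values.empty?  # write out the final partial batch
end

Compared with one model save per cell, this costs one round-trip per 500 rows; PostgreSQL's COPY or MySQL's LOAD DATA INFILE would likely be faster still.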

Of course, having said all that, the only way to be sure is to try it!

answered 2012-05-17T20:21:47.537