
I'm new to Node.js and would like to know which is the better way to insert a large number of rows into a database. On the face of it, inserting one row at a time looks like the way to go, because I can yield back to the event loop quickly and serve other requests. But code written that way is hard to follow. For a bulk insert, I have to prepare all the data up front, which definitely means a loop, and while the event loop is busy running that loop, fewer requests get served.

So which approach is preferred, and is my analysis correct?


2 Answers


There's no single right answer here. It depends on the details: why are you inserting a large number of rows? How often? Is this a one-time bootstrap, or does your app do it every 10 seconds? The compute/IO resources available also matter. Is your application the only user of the database, or are these requests denying service to its other users?

Without those details, my rule of thumb is a bulk insert with a small concurrency limit: fire off at most 10 inserts, then wait until one of them finishes before sending the next insert command to the database. This follows the model of async.eachLimit. It's how browsers handle concurrent requests to a given website, and it has proven to be a reasonable default strategy.
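
A minimal sketch of that rule of thumb, assuming the async library is available and that insertRow is a hypothetical wrapper around your DB driver's insert call:

```javascript
var async = require('async');

// rows is a hypothetical array of records to insert;
// insertRow(row, callback) is a hypothetical wrapper around your DB driver.
function insertAll(rows, insertRow, done) {
  // At most 10 inserts are in flight at once; as each completes,
  // eachLimit starts the next, so the database never sees the
  // whole batch at the same time.
  async.eachLimit(rows, 10, function (row, callback) {
    insertRow(row, callback); // callback(err) marks this row as done
  }, function (err) {
    done(err); // called after every row finishes, or on the first error
  });
}
```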

Answered 2013-08-30T18:52:39.397

In general, loops on in-memory objects should be fast, very fast.

I know you're worried about blocking the CPU, but you should be considering the total amount of work to be done. Sending items one at a time carries a lot of overhead. Each query to the DB has its own sequence of inner for loops that probably make your "batching" for loop look pretty small.

If you need to dump 1000 things in the DB, the minimum amount of work you can do is to run this all at once. If you make it 10 batches of 100 "things", you have to do all of the same work, plus you have to generate and track all of these extra requests.

So how often are you doing these bulk inserts? If this is a regular occurrence, you probably want to minimize the total amount of work and bulk insert everything at once.
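
To illustrate doing it all at once, here is a sketch using the node mysql driver's multi-row VALUES expansion; the table and column names are made up for the example:

```javascript
var mysql = require('mysql');

var connection = mysql.createConnection({
  host: 'localhost',
  user: 'app',
  database: 'mydb'
});

// things is a hypothetical array like [{ name: 'a', qty: 1 }, ...]
function bulkInsert(things, done) {
  // Build a nested array of row values; the driver expands the single `?`
  // into (v1, v2), (v1, v2), ... so this runs as one INSERT statement.
  var values = things.map(function (t) { return [t.name, t.qty]; });
  connection.query('INSERT INTO things (name, qty) VALUES ?', [values], done);
}
```

One round trip and one statement instead of a thousand, which is where the savings in total work comes from.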

The trade-off here is logging and retries. It's usually not enough to just perform some type of bulk insert and forget about it. The bulk insert is eventually going to fail (fully or partially) and you will need some type of logic for retries or consolidation.

If that's a concern, you probably want to manage the size of the bulk insert so that you can retry blocks intelligently.
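
One way to keep that manageable, sketched here under the same assumptions as above (the async library plus a bulkInsert helper), is to split the rows into fixed-size chunks and insert them sequentially, so a failed chunk can be logged and retried on its own:

```javascript
var async = require('async');

// rows: the full data set; chunkSize: how many rows per INSERT;
// bulkInsert(chunk, callback): hypothetical multi-row insert helper.
function insertInChunks(rows, chunkSize, bulkInsert, done) {
  var chunks = [];
  for (var i = 0; i < rows.length; i += chunkSize) {
    chunks.push(rows.slice(i, i + chunkSize));
  }
  // Insert one chunk at a time so a failure only affects that chunk.
  async.eachSeries(chunks, function (chunk, callback) {
    bulkInsert(chunk, function (err) {
      if (!err) return callback();
      // Hypothetical single retry; real code would log and back off.
      bulkInsert(chunk, callback);
    });
  }, done);
}
```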

Answered 2013-09-03T16:54:19.243