
I'm returning a lot of documents (500k+) from a MongoDB collection in Node.js. It's not for display on a website - it's for some number crunching on the data. If I grab all of those documents at once, the system freezes. Is there a better way to grab them all?

I was thinking pagination might help?

Edit: this already happens outside the main node.js server event loop, so "the system freezes" does not mean "incoming requests are not being handled".


3 Answers


After learning more about your situation, I have some ideas:

  1. Do as much as you can in a Map/Reduce function in Mongo - perhaps throwing less data at Node is the whole solution.

  2. Perhaps this much data is eating all the memory on your system. Your "freeze" could be V8 stopping everything to do a garbage collection (see this SO question). You could use the V8 flag --trace-gc to log GCs and test this hypothesis (thanks to another SO answer about V8 and garbage collection).

  3. Pagination, like you suggested, may help (a paging sketch follows this list). Perhaps even splitting up your data further into worker queues (create one worker task with references to records 1-10, another with references to records 11-20, etc.), depending on your calculation.

  4. Perhaps pre-process your data - i.e. somehow return much smaller data for each record. Or drop the ORM for this particular calculation, if you're using one now. Making sure each record contains only the data you need means less data to transfer and less memory your app needs.
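
A minimal sketch of points 3 and 4 combined - paging through the collection with a projection so each batch stays small. The collection name, field name, and `crunchBatch` are placeholders, and the callback-style `fields` option assumes an older node-mongodb-native driver (newer versions spell it `projection`):

```js
// Sketch only: pull the records one fixed-size page at a time,
// projecting just the fields the calculation actually needs.
var PAGE_SIZE = 1000; // tune to your memory budget

function processPage(collection, page, done) {
  collection.find({}, { fields: { value: 1 } }) // hypothetical field
    .skip(page * PAGE_SIZE)
    .limit(PAGE_SIZE)
    .toArray(function (err, docs) {
      if (err) return done(err);
      if (docs.length === 0) return done(null); // no pages left
      crunchBatch(docs);                        // placeholder for your math
      processPage(collection, page + 1, done);  // async recursion: no stack growth
    });
}

processPage(db.collection('records'), 0, function (err) {
  if (err) throw err;
  console.log('all pages processed');
});
```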

Answered 2011-11-28T16:18:22.877

I would put your big fetch+process task on a worker queue, a background process, or a forking mechanism (there are lots of different options here).

That way you do your calculations outside of the main event loop and keep it free to handle other requests. While the Mongo lookup itself should happen in a callback, the calculation can eat up a lot of time and thus "freeze" node - you never give it a break to handle other requests.
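
A minimal sketch of the forking option, using Node's built-in `child_process` module; `crunch.js` and `heavyCalculation` are made-up names for your worker script and computation:

```js
// parent.js - keeps the main event loop free
var fork = require('child_process').fork;
var worker = fork(__dirname + '/crunch.js');

worker.on('message', function (result) {
  console.log('worker finished:', result);
});

// send the worker the parameters of the job, not the 500k documents
worker.send({ collection: 'records', query: {} });
```

```js
// crunch.js - runs in its own process, so a long-running
// calculation cannot block the server's event loop
process.on('message', function (job) {
  var result = heavyCalculation(job); // placeholder for the real work
  process.send(result);
  process.exit(0);
});
```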

Answered 2011-11-28T15:52:02.787

Since you don't need all of them at the same time (which is what I infer from your question about pagination), perhaps it's better to split those 500k records into smaller chunks and process them on nextTick?
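
A minimal sketch of that chunking pattern. One caveat: on current Node versions a recursive `process.nextTick` would starve pending I/O, so the sketch yields with `setImmediate` instead (which didn't exist when this answer was written). `docs` and `crunch` are placeholders:

```js
// Sketch: walk a big array one slice at a time, yielding to the
// event loop between slices so other callbacks get a turn.
var CHUNK = 500;

function processInChunks(docs, crunch, done) {
  var i = 0;
  (function step() {
    docs.slice(i, i + CHUNK).forEach(crunch); // process one slice
    i += CHUNK;
    if (i < docs.length) {
      setImmediate(step); // let pending I/O run, then continue
    } else {
      done();
    }
  })();
}
```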

You could also use something like Kue to queue the chunks and process them later (that way it's not all being processed at the same time).
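
A minimal sketch with Kue (it needs a running Redis instance; the job type and the `crunchRecords` helper are invented for illustration):

```js
var kue = require('kue');
var queue = kue.createQueue();

// enqueue one job per 10k-record slice instead of one giant task
for (var start = 0; start < 500000; start += 10000) {
  queue.create('crunch-chunk', { start: start, end: start + 10000 }).save();
}

// the worker pulls jobs off the queue one at a time
queue.process('crunch-chunk', function (job, done) {
  crunchRecords(job.data.start, job.data.end, done); // placeholder
});
```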

Answered 2011-11-28T16:01:51.350