node.js - MongoDB Node.js driver does not return more than 100000 rows

Question

Here is an example that reproduces my problem:

I populate my collection with 1 million documents like this:

for (i = 1; i <= 1000000; i++) {
    if (i % 3 === 0)
        db.numbers.insert({_id: i, stuff: "Some data", signUpDate: new Date()});
    else
        db.numbers.insert({_id: i, stuff: "Some data"});
}

So every third document has a signUpDate.

I create the following index:

db.numbers.ensureIndex({"signUpDate" : 1});

Then I have the following very small Node.js application:

var Db = require('mongodb').Db
, Connection = require('mongodb').Connection
, Server = require('mongodb').Server
, format = require('util').format;

var host = 'localhost';
var port = Connection.DEFAULT_PORT;

console.log("Connecting to " + host + ":" + port);

Db.connect(format("mongodb://%s:%s/test?w=1", host, port), function(err, db) {
        var collection = db.collection('numbers');

        collection.find({'signedUp': true}, {'_id':1}).limit(100000).toArray(function(err, docs){
                console.log(docs.length)
        });
});

This works fine.

However, if I remove .limit(100000), the server just sits there and never responds.

In short, all I want to do is return a list of _ids where signUpDate is not null (there should be about 333,000).
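Two side notes on the stated goal. The snippet above filters on `signedUp: true`, whereas the generated documents only carry a `signUpDate` field; in MongoDB query syntax, "signUpDate is present and not null" can be written as `{ signUpDate: { $ne: null } }`. The expected count can also be checked directly. The sketch below does both in memory, with a plain array standing in for the collection; `matchesSignUpFilter` and `buildDocs` are hypothetical helpers that only imitate the query and the shell loop, not driver functions:

```javascript
// MongoDB filter for "signUpDate exists and is not null":
// { $ne: null } excludes documents where the field is missing or null.
const signUpFilter = { signUpDate: { $ne: null } };

// Hypothetical helper: imitates the filter above on plain objects,
// for an in-memory check only (the server evaluates the real query).
function matchesSignUpFilter(doc) {
  return doc.signUpDate !== undefined && doc.signUpDate !== null;
}

// Hypothetical helper mirroring the shell loop: every third _id gets a signUpDate.
function buildDocs(n) {
  const docs = [];
  for (let i = 1; i <= n; i++) {
    docs.push(i % 3 === 0
      ? { _id: i, stuff: 'Some data', signUpDate: new Date() }
      : { _id: i, stuff: 'Some data' });
  }
  return docs;
}

// Multiples of 3 in 1..1,000,000 (the last one is 999,999 = 3 * 333,333).
const expected = Math.floor(1000000 / 3);
```

For example, `buildDocs(30).filter(matchesSignUpFilter).length` is 10, and for the full collection `expected` is 333,333, consistent with the "about 333,000" above.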

I'm fairly sure the problem has to do with the way MongoDB caches results, but I'm not sure how to get around it.

score 7 · Accepted Answer

You shouldn't call toArray on a result set this large. Instead, either:

iterate over the results with each:

collection.find({'signedUp': true}, {'_id':1}).each(function(err, doc){
    if (err) throw err;
    if (doc) {
        console.log(doc);
    } else {
        // each() passes null once the cursor is exhausted
        console.log('All done!');
    }
});
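`each` comes from the older callback-based driver API and is not available in current releases. On recent drivers (roughly 4.x and later), where cursors are async-iterable, the equivalent is a `for await...of` loop. A minimal sketch under that assumption; `printAll` is a hypothetical helper name, and any async iterable of documents can stand in for the cursor:

```javascript
// Print every document from an async-iterable cursor, one at a time.
// The loop keeps no reference to earlier documents, so memory use is
// bounded by what the cursor buffers, not by the size of the result.
async function printAll(cursor) {
  let count = 0;
  for await (const doc of cursor) {
    console.log(doc);
    count++;
  }
  console.log('All done!');
  return count;
}

// With the real driver (assumes driver 4.x+ and an async-iterable cursor):
// const n = await printAll(collection.find({ signedUp: true }, { projection: { _id: 1 } }));
```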

or stream the results:

var stream = collection.find({'signedUp': true}, {'_id':1}).stream();
stream.on('data', function(doc) {
    console.log(doc);
});
stream.on('error', function(err) {
    console.error(err);
});
stream.on('close', function() {
    console.log('All done!');
});
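The streaming variant can be wrapped in a promise that settles when the stream finishes. Standard Node.js readable streams signal exhaustion with `'end'` and failures with `'error'`; listening for both means a failed query rejects instead of leaving the caller waiting. `countStream` is a hypothetical helper, and passing it `cursor.stream()` assumes your driver version returns a standard object-mode readable stream there:

```javascript
// Count documents flowing through an object-mode readable stream without
// collecting them. Resolves with the count on 'end', rejects on 'error'.
function countStream(stream) {
  return new Promise((resolve, reject) => {
    let count = 0;
    stream.on('data', (doc) => {
      // Process `doc` here instead of storing it.
      count++;
    });
    stream.on('end', () => resolve(count));
    stream.on('error', reject);
  });
}
```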

score 6 · Accepted Answer

You need to set a batch size and then stream or iterate over the results; otherwise the Mongo driver will load everything into memory.
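To illustrate what a batch size buys, here is a small simulation, not the driver's actual implementation: documents are fetched in chunks of `batchSize` from a hypothetical `fetchBatch(skip, limit)` source and handed out one at a time, so at most one batch is buffered at any moment. The real driver fetches further batches from a server-side cursor with `getMore` requests; `fetchBatch` here is only an in-memory stand-in:

```javascript
// Simulated batched cursor: yields documents one by one while buffering
// at most `batchSize` of them. `fetchBatch(skip, limit)` is hypothetical
// and must return a Promise of an array (empty once the source is exhausted).
async function* batchedCursor(fetchBatch, batchSize) {
  let skip = 0;
  while (true) {
    const batch = await fetchBatch(skip, batchSize);
    if (batch.length === 0) return; // source exhausted
    for (const doc of batch) {
      yield doc;
    }
    skip += batch.length;
  }
}
```

With a 2,500-document source and `batchSize` 1000, the consumer still sees all 2,500 documents, but `fetchBatch` never returns more than 1,000 at a time.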

Also, {'_id':1} looks fishy; it should be {fields: {'_id' : 1}}.

So your code becomes:

collection.find({'signedUp': true}, {batchSize: 1000, fields: {'_id' : 1}}).each(function(err, item) {
    if (err) throw err;
    if (item) {
        // do something with item
    }
});
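In newer driver versions (from around 3.x) the `fields` option was superseded by `projection`, and callbacks gave way to promises and async-iterable cursors. A sketch of the same idea under those assumptions, collecting just the `_id` values the question asks for; `collectIds` is a hypothetical helper, and the filter uses `signUpDate` rather than `signedUp` because that is the field the generated documents actually contain:

```javascript
// Collect the _id of every document that has a signUpDate, requesting
// 1000 documents per batch and only the _id field of each.
// Assumes a driver whose find() accepts { projection, batchSize } and
// returns an async-iterable cursor.
async function collectIds(collection) {
  const cursor = collection.find(
    { signUpDate: { $ne: null } },
    { projection: { _id: 1 }, batchSize: 1000 }
  );
  const ids = [];
  for await (const doc of cursor) {
    ids.push(doc._id); // keep only the id, not the whole document
  }
  return ids;
}
```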
