php - Insert performance of node-mongodb-native

Question

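I'm testing the performance of Node.js with MongoDB. I know each of them is fine on its own, but I'm running a few tests to get a feel for them. I ran into this problem and can't pin down the source.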

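Problem

I'm trying to insert 1,000,000 records from a single Node.js program. It absolutely crawls. We're talking about 20 minutes of execution time. This happens on both my Mac and CentOS, although the behavior differs slightly between the two. It does eventually complete.

The effect looks like swapping, but it isn't (memory never exceeds 2 GB). Only 3 connections are open to MongoDB, and most of the time no data is being inserted. The process seems to do a lot of context switching, and the CPU core running Node.js is maxed out.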

The effect is similar to the one described in this thread.

I tried the same thing with PHP and it finished in 2-3 minutes. No drama.

Why?

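Possible causes

I currently believe this is either a Node.js socket problem, something going on with libev behind the scenes, or some other node-mongodb-native problem. I may be totally wrong, so I'm looking for some guidance here.

As for other Node.js MongoDB adapters, I tried Mongolian, which appears to queue documents in order to insert them in batches, but it ends up running out of memory. So that's out. (Side note: I have no idea why that happens either, since it doesn't even come close to the 16 GB limit of my box, but I didn't bother investigating further.)

I should probably mention that I did test a master/worker cluster with 4 workers (on a quad-core machine), and it finished in 2-3 minutes.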

Code

Here is my Node.js CoffeeScript program:

mongodb = require "mongodb"
microtime = require "microtime"
crypto = require "crypto"

times = 1000000
server = new mongodb.Server "127.0.0.1", 27017
db = mongodb.Db "test", server
db.open (error, client) ->
  throw error if error?

  collection = mongodb.Collection client, "foo"

  for i in [0...times]
    console.log "Inserting #{i}..." if i % 100000 == 0

    hash = crypto.createHash "sha1"
    hash.update "" + microtime.now() + (Math.random() * 255 | 0)
    key = hash.digest "hex"

    doc =
      key: key,
      foo1: 1000,
      foo2: 1000,
      foo3: 1000,
      bar1: 2000,
      bar2: 2000,
      bar3: 2000,
      baz1: 3000,
      baz2: 3000,
      baz3: 3000

    collection.insert doc, safe: true, (error, response) ->
      console.log error.message if error

Here is the roughly equivalent PHP program:

<?php
$mongo = new Mongo();
$collection = $mongo->test->foo;

$times = 1000000;
for ($i = 0; $i < $times; $i++) {
    if ($i % 100000 == 0) {
        print "Inserting $i...\n";
    }

    $doc = array(
        "key" => sha1(microtime(true) + rand(0, 255)),
        "foo1" => 1000,
        "foo2" => 1000,
        "foo3" => 1000,
        "bar1" => 2000,
        "bar2" => 2000,
        "bar3" => 2000,
        "baz1" => 3000,
        "baz2" => 3000,
        "baz3" => 3000
    );
    try {
        $collection->insert($doc, array("safe" => true));
    } catch (MongoCursorException $e) {
        print $e->getMessage() . "\n";
    }
}

score 2 · Accepted Answer

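It sounds like you're hitting the default heap limit in V8. I wrote a blog post about removing this limit.

The garbage collector is probably going crazy and chewing up CPU, because it will run constantly until you get back under the 1.4GB limit.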
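As a rough way to check whether a process is anywhere near the heap limit, the sketch below reads the figures V8 reports about itself. It assumes a reasonably recent Node.js release, where the built-in `v8` module is available; `heapReport` is a hypothetical helper name, and the default limit varies by Node.js version and platform, so the 1.4GB figure above should not be expected on every setup.

```javascript
// Sketch: inspect the V8 heap limit from inside a Node.js process.
// Assumption: a recent Node.js release that ships the built-in `v8` module.
const v8 = require('v8');

// Hypothetical helper: returns the configured heap limit and current usage in MB.
function heapReport() {
  const stats = v8.getHeapStatistics();
  const toMB = (bytes) => Math.round(bytes / (1024 * 1024));
  return {
    limitMB: toMB(stats.heap_size_limit), // upper bound V8 will grow the heap to
    usedMB: toMB(stats.used_heap_size),   // memory currently in use by live objects
  };
}

console.log(heapReport());
```

If the used figure sits close to the limit while the program crawls, constant garbage collection is a plausible explanation. The limit for the old generation can be raised when starting Node.js, for example with `node --max-old-space-size=4096 script.js` (the value is in megabytes).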

score 1 · Accepted Answer

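What happens if you explicitly return a value at the end of the db.open callback function? Your generated JavaScript code is pushing the return value of every collection.insert call onto one big "_results" array, which I imagine gets slower and slower.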

db.open(function(error, client) {
  var collection, doc, hash, i, key, _i, _results;
  if (error != null) {
    throw error;
  }
  collection = mongodb.Collection(client, "foo");
  _results = [];
  for (i = _i = 0; 0 <= times ? _i < times : _i > times; i = 0 <= times ? ++_i : --_i) {
    ...
    _results.push(collection.insert(doc, {
      safe: true
    }, function(error, response) {
      if (error) {
        return console.log(error.message);
      }
    }));
  }
  return _results;
});

Try adding this at the end of your CoffeeScript:

    collection.insert doc, safe: true, (error, response) ->
      console.log error.message if error

  return
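The difference between the two compiled forms can be illustrated in plain JavaScript. This is only a sketch: `fakeInsert` is a hypothetical stand-in for collection.insert, and both loop functions are made-up names that mimic what the CoffeeScript compiler emits with and without the trailing `return`.

```javascript
// Hypothetical stand-in for collection.insert: returns a small object per call.
function fakeInsert(doc) {
  return { pending: doc };
}

// Mimics a CoffeeScript `for` loop that is the last expression of a function:
// the return value of every call is collected into an array.
function loopWithImplicitReturn(times) {
  const results = [];
  for (let i = 0; i < times; i++) {
    results.push(fakeInsert({ i }));
  }
  return results;
}

// Mimics the same loop followed by a bare `return`: nothing is collected.
function loopWithBareReturn(times) {
  for (let i = 0; i < times; i++) {
    fakeInsert({ i });
  }
}

console.log(loopWithImplicitReturn(1000).length); // 1000
console.log(loopWithBareReturn(1000));            // undefined
```

With a million iterations the first form keeps a million-element array alive until the callback returns, while the second keeps nothing.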

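*Update:* So, I actually tried running your program and noticed a few more problems:

The biggest problem is that you are trying to kick off a million inserts from a single synchronous loop, without waiting for any of them to complete. That really kills your RAM and eventually stops the inserts altogether (at least, it did for me). I killed it at around 800MB of RAM.

You need to change the way you call collection.insert() so that it works asynchronously, starting the next insert only from the previous insert's callback.

I rewrote it like this, breaking out a couple of functions for clarity: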

mongodb = require "mongodb"
microtime = require "microtime"
crypto = require "crypto"

gen  = () ->
  hash = crypto.createHash "sha1"
  hash.update "" + microtime.now() + (Math.random() * 255 | 0)
  key = hash.digest "hex"

  key: key,
  foo1: 1000,
  foo2: 1000,
  foo3: 1000,
  bar1: 2000,
  bar2: 2000,
  bar3: 2000,
  baz1: 3000,
  baz2: 3000,
  baz3: 3000

times = 1000000
i = times

insertDocs = (collection) ->
  collection.insert gen(), {safe:true}, () ->
    console.log "Inserting #{times-i}..." if i % 100000 == 0
    if --i > 0
      insertDocs(collection)
    else
      process.exit 0
  return

server = new mongodb.Server "127.0.0.1", 27017
db = mongodb.Db "test", server
db.open (error, db) ->
  throw error if error?
  db.collection "foo", (err, collection) ->
    insertDocs(collection)
    return
  return

It finishes in about 3 minutes:

wfreeman$ time coffee mongotest.coffee
Inserting 0...
Inserting 100000...
Inserting 200000...
Inserting 300000...
Inserting 400000...
Inserting 500000...
Inserting 600000...
Inserting 700000...
Inserting 800000...
Inserting 900000...

real    3m31.991s
user    1m55.211s
sys     0m23.420s

It also has the side benefit of using <100MB of RAM, 70% CPU for node, and 40% CPU for mongod (on a 2-core machine, so it doesn't look like it was maxing out the CPU).
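To see why chaining the inserts keeps memory flat, the sketch below counts how many operations are in flight at once. No MongoDB is involved: `makeFakeCollection` is a hypothetical stand-in whose `insert` simply completes on a later tick. `insertAllAtOnce` mirrors the loop from the question and `insertChained` mirrors the rewrite above; `insertBounded` is not part of the original answer and is included only as a possible middle ground that keeps a fixed number of inserts in flight.

```javascript
// Hypothetical stand-in for a collection: insert() completes on a later tick
// and the object tracks how many inserts are outstanding at any moment.
function makeFakeCollection() {
  const stats = { inFlight: 0, peak: 0 };
  return {
    stats,
    insert(doc, callback) {
      stats.inFlight++;
      stats.peak = Math.max(stats.peak, stats.inFlight);
      setImmediate(() => {
        stats.inFlight--;
        callback(null);
      });
    },
  };
}

// Question's pattern: start every insert from one synchronous loop.
// Assumes times >= 1.
function insertAllAtOnce(collection, times, done) {
  let remaining = times;
  for (let i = 0; i < times; i++) {
    collection.insert({ i }, () => {
      if (--remaining === 0) done();
    });
  }
}

// Answer's pattern: start the next insert from the previous callback.
// Assumes times >= 1.
function insertChained(collection, times, done) {
  let remaining = times;
  function next() {
    collection.insert({ n: remaining }, () => {
      if (--remaining > 0) next();
      else done();
    });
  }
  next();
}

// Middle ground (an assumption, not from the answer): at most `limit` in flight.
// Assumes times >= 1 and limit >= 1.
function insertBounded(collection, times, limit, done) {
  let started = 0;
  let finished = 0;
  function startMore() {
    while (started < times && started - finished < limit) {
      started++;
      collection.insert({ n: started }, () => {
        finished++;
        if (finished === times) done();
        else startMore();
      });
    }
  }
  startMore();
}

// Usage: report the peak number of outstanding inserts for each strategy.
function demo() {
  const all = makeFakeCollection();
  insertAllAtOnce(all, 1000, () => console.log('all at once:', all.stats.peak)); // 1000
  const chained = makeFakeCollection();
  insertChained(chained, 1000, () => console.log('chained:', chained.stats.peak)); // 1
  const bounded = makeFakeCollection();
  insertBounded(bounded, 1000, 10, () => console.log('bounded:', bounded.stats.peak)); // 10
}

demo();
```

In the first strategy every pending document and callback has to be held in memory at the same time, which matches the memory growth seen with the original program. The chained version never has more than one insert outstanding.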
