
I have an events collection with 2,502,011 elements and would like to perform an update on all of them. Unfortunately, I am facing a lot of MongoDB faults due to the write lock.

Question: How can I avoid those faults in order to be sure that all my events are correctly updated?

Here is some information about my events collection:

> db.events.stats()
{
    "count" : 2502011,
    "size" : 2097762368,
    "avgObjSize" : 838.4305136947839,
    "storageSize" : 3219062784,
    "numExtents" : 21,
    "nindexes" : 6,
    "lastExtentSize" : 840650752,
    "paddingFactor" : 1.0000000000874294,
    "systemFlags" : 0,
    "userFlags" : 0,
    "totalIndexSize" : 1265898256,
    "indexSizes" : {
        "_id_" : 120350720,
        "destructured_created_at_1" : 387804032,
        "destructured_updated_at_1" : 419657728,
        "data.assigned_author_id_1" : 76053152,
        "emiting_class_1_data.assigned_author_id_1_data.user_id_1_data.id_1_event_type_1" : 185071936,
        "created_at_1" : 76960688
    }
}

Here is what an event looks like:

> db.events.findOne()
{
  "_id" : ObjectId("4fd5d4586107d93b47000065"),
  "created_at" : ISODate("2012-06-11T11:19:52Z"),
  "data" : {
    "project_id" : ObjectId("4fc3d2abc7cd1e0003000061"),
    "document_ids" : [
      "4fc3d2b45903ef000300007d",
      "4fc3d2b45903ef000300007e"
    ],
    "file_type" : "excel",
    "id" : ObjectId("4fd5d4586107d93b47000064")
  },
  "emiting_class" : "DocumentExport",
  "event_type" : "created",
  "updated_at" : ISODate("2013-07-31T08:52:48Z")
}

I want to update each event to add destructured versions of the existing created_at and updated_at fields. Please correct me if I am wrong, but it seems you can't use the mongo update command when you need to access the current element's data along the way.

Here is my update loop:

db.events.find().forEach(
  function (e) {
    var created_at = new Date(e.created_at);
    var updated_at = new Date(e.updated_at);

    e.destructured_created_at = [e.created_at]; // omitted the actual values
    e.destructured_updated_at = [e.updated_at]; // omitted the actual values
    db.events.save(e); // re-saves the whole document
  }
)

When running the above command, I get a huge amount of page faults due to the write lock on the database.

(mongostat output omitted)

1 Answer


I think you are confused here: it is not the write lock causing that, it is MongoDB querying for your update documents. The lock does not exist during a page fault (in fact it only exists when actually updating, or rather saving, a document on disk); it yields to other operations.

The lock is more of a mutex in MongoDB.

Page faults on this amount of data are perfectly normal. Since you obviously do not query this data often, I am unsure what you are expecting to see, and I am definitely unsure what you mean by your question:

Question: How can I avoid those faults in order to be sure that all my events are correctly updated?

OK, the problem you may be seeing is page thrashing on that machine, which in turn destroys your IO bandwidth and floods your working set with data that is not needed. Do you really need to add this field to ALL documents eagerly? Can it not be added on demand by the application when that data is used again?
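If on-demand works for your use case, here is a hypothetical sketch of such a lazy migration; the loadEvent helper name and its shape are assumptions about your application, not code from the question:

// Hypothetical application-side helper: destructure the dates the first
// time an event is read, and persist the result so later reads are free.
function loadEvent(id) {
  var e = db.events.findOne({ _id: id });
  if (e && !e.destructured_created_at) {
    e.destructured_created_at = [e.created_at]; // omitted the actual values
    e.destructured_updated_at = [e.updated_at]; // omitted the actual values
    db.events.update(
      { _id: e._id },
      { $set: {
          destructured_created_at: e.destructured_created_at,
          destructured_updated_at: e.destructured_updated_at
      } }
    );
  }
  return e;
}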

Another option is to do this in batches.
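For example, here is a minimal sketch of a batched version of your loop, paginating on _id and pausing between batches so other operations can make progress; the batch size of 1000 and the one-second pause are illustrative values, not tuned numbers:

// Process the collection in _id order, one bounded batch at a time.
var batchSize = 1000;
var lastId = ObjectId("000000000000000000000000"); // smallest possible _id

while (true) {
  var batch = db.events.find({ _id: { $gt: lastId } })
                       .sort({ _id: 1 })
                       .limit(batchSize)
                       .toArray();
  if (batch.length === 0) break;

  batch.forEach(function (e) {
    e.destructured_created_at = [e.created_at]; // omitted the actual values
    e.destructured_updated_at = [e.updated_at]; // omitted the actual values
    db.events.save(e);
  });

  lastId = batch[batch.length - 1]._id; // resume after the last _id seen
  sleep(1000); // shell helper: pause before the next batch
}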

One feature you could make use of here is priority queues, which would make such an update a background task that shouldn't affect the current workings of your mongod too much. I hear such a feature is due (can't find the JIRA :/).

Please correct me if I am wrong, but it seems you can't use the mongo update command when you need to access the current element's data along the way.

You are correct.
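You can, however, reduce the pressure of the cursor-based workaround. Here is a minimal sketch under the same assumptions as your loop: project only the two fields the loop actually reads, and write only the new fields with $set instead of re-saving the whole document:

// Read only the fields the loop needs, and write only the new fields.
db.events.find({}, { created_at: 1, updated_at: 1 }).forEach(function (e) {
  db.events.update(
    { _id: e._id },
    { $set: {
        destructured_created_at: [e.created_at], // omitted the actual values
        destructured_updated_at: [e.updated_at]  // omitted the actual values
    } }
  );
});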

Answered 2013-08-01T08:23:39