5

Situation

I'm having trouble coming up with a good way to do a certain MongoDb query. First, here's what kind of query I want to do. Assume a simple database which logs entry and exit events (and possibly other actions, doesn't matter) by electronic card swipe. So there's a collection called swipelog with simple documents which look like this:

{
    _id: ObjectId("524ab4790a4c0e402200052c")
    name: "John Doe", 
    action: "entry",
    timestamp: ISODate("2013-10-01T1:32:12.112Z") 
}

Now I want to list names and their last entry times (and any other fields I may want, but example below uses just these two fields).

Current solution

Here is what I have now, as a "one-liner" for MongoDb JavaScript console:

db.swipelog.distinct('name')
           .forEach( function(name) {

    db.swipelog.find( { name: name, action:"entry" } )
               .sort( { $natural:-1 } )
               .limit(1)
               .forEach( function(entry) {

        printjson( [ entry.name, entry.timestamp ] )
    }) 
})

Which prints something like:

[ "John Doe", ISODate("2013-10-01T1:32:12.112Z")]
[ "Jane Deo", ISODate("2013-10-01T1:36:12.112Z")]
...

Question

I think above has the obvious scaling problem. If there are a hundred names, then 1+100 queries will be made to the database. So what is a good/correct way to get "last timestamp of every distinct name" ? Changing database structure or adding some collections is ok, if it makes this easier.

4

2 回答 2

16

您可以使用聚合框架来实现这一点:

 db.collection.aggregate(
          [
             {$match:
                  {action:'entry'}
             },
             {$group:
                  {_id:'$name',
                   first:
                         {$max:'$timestamp'}
                  }
             }
          ])

如果您可能在结果中包含其他字段,则可以使用$first运算符

 db.collection.aggregate(
          [
             {$match:
                  {action:'entry'}
             },
             {$sort:
                  {name:1, timestamp:-1}
             },
             {$group:
                  {_id:'$name',
                   timestamp: {$first:'$timestamp'},
                   otherField: {$first:'$otherField'},
                  }
             }
          ])
于 2013-10-01T13:22:36.537 回答
3

这个答案应该是对上述 attish 答案的评论,但我没有足够的代表在这里发表评论

请记住,聚合框架不能返回超过 16MB 的数据。如果您有大量用户,您的生产系统可能会遇到此限制。

MongoDB 2.6为聚合框架添加了新特性来处理这个问题:

  • db.collection.aggregateCursor()(临时名称) 与 相同,db.collection.aggregate()只是它返回的是游标而不是文档。这避免了 16MB 的限制
  • $out是一个新的管道阶段,它将管道的输出定向到集合。这允许您运行聚合作业
  • $sort已改进以消除其RAM 限制并提高速度

如果查询性能比数据年龄更重要,您可以安排一个常规aggregate命令,将其结果存储在集合中,例如db.last_swipe,然后让您的应用程序简单地查询db.last_swipe相关用户。

结论:我同意 attish 有正确的方法。但是,在当前的 MongoDB 版本上扩展它可能会遇到麻烦,应该考虑使用 Mongo 2.6。

于 2013-10-04T18:25:21.023 回答