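If you were doing this in a relational database, you wouldn't compare visits row by row; you would use an aggregation query (SELECT ... GROUP BY) to find the repeat visits. You should approach it the same way in MongoDB.
First, count the visits per client, per store, per day: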
group1 = { "$group" : {
    "_id" : {
        "c" : "$clientId",
        "l" : "$location",
        "day" : {
            "y" : { "$year" : "$tov" },
            "m" : { "$month" : "$tov" },
            "d" : { "$dayOfMonth" : "$tov" }
        }
    },
    "visits" : { "$sum" : 1 }
} };
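To make the grouping key concrete, here is a plain JavaScript (Node.js) sketch that mimics in memory what group1 does, without a database. The function names `dayKey` and `group1Stage` are hypothetical, and it assumes each visit has `clientId`, `location` and a `Date` in `tov`. It uses the UTC getters because `$year`, `$month` and `$dayOfMonth` evaluate dates in UTC by default.

```javascript
// In-memory emulation of the group1 stage (illustration only, no MongoDB).
// Assumption: visits look like { clientId, location, tov: Date }.

// Build the same day key as $year / $month / $dayOfMonth (UTC by default).
function dayKey(date) {
  return {
    y: date.getUTCFullYear(),
    m: date.getUTCMonth() + 1, // getUTCMonth() is 0-based; $month is 1-based
    d: date.getUTCDate(),
  };
}

// Group visits by (client, location, day) and count them, like group1.
function group1Stage(visits) {
  const groups = new Map();
  for (const v of visits) {
    const id = { c: v.clientId, l: v.location, day: dayKey(v.tov) };
    const key = JSON.stringify(id); // string form of the compound _id
    const entry = groups.get(key) || { _id: id, visits: 0 };
    entry.visits += 1; // equivalent of { "$sum" : 1 }
    groups.set(key, entry);
  }
  return [...groups.values()];
}

// Usage: two visits on the same UTC day collapse into one group.
const out = group1Stage([
  { clientId: 1, location: "l1", tov: new Date("2013-01-01T20:00:00Z") },
  { clientId: 1, location: "l1", tov: new Date("2013-01-01T21:00:00Z") },
  { clientId: 1, location: "l1", tov: new Date("2013-01-03T20:00:00Z") },
]);
console.log(JSON.stringify(out));
```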
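EDIT: Since you only want repeat DAYS, next group by client and store and count on how many distinct DAYS that client visited that store. Each document coming out of group1 represents one distinct day, so summing 1 per document counts days rather than visits: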
group2 = { "$group" : {
    "_id" : {
        "c" : "$_id.c",
        "s" : "$_id.l"
    },
    "totalDays" : { "$sum" : 1 }
} };
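A similar in-memory sketch of group2 shows why `$sum: 1` at this stage counts days and not visits. The function `group2Stage` is hypothetical, and its input is hand-written documents shaped like group1's output.

```javascript
// In-memory emulation of the group2 stage (illustration only, no MongoDB).
// Input: documents shaped like group1's output, one per (client, location, day).
function group2Stage(dayGroups) {
  const groups = new Map();
  for (const g of dayGroups) {
    // Re-key on client and store only, dropping the day ("$_id.c", "$_id.l").
    const id = { c: g._id.c, s: g._id.l };
    const key = JSON.stringify(id);
    const entry = groups.get(key) || { _id: id, totalDays: 0 };
    // One input document = one distinct day; the per-day "visits" is ignored.
    entry.totalDays += 1;
    groups.set(key, entry);
  }
  return [...groups.values()];
}

// Usage: client 1 has two distinct days at l1; client 3 has only one day
// at l2 even though it visited twice on that day.
const dayGroups = [
  { _id: { c: 1, l: "l1", day: { y: 2013, m: 1, d: 1 } }, visits: 2 },
  { _id: { c: 1, l: "l1", day: { y: 2013, m: 1, d: 3 } }, visits: 1 },
  { _id: { c: 3, l: "l2", day: { y: 2013, m: 1, d: 2 } }, visits: 2 },
];
const perStore = group2Stage(dayGroups);
console.log(JSON.stringify(perStore));
// [{"_id":{"c":1,"s":"l1"},"totalDays":2},{"_id":{"c":3,"s":"l2"},"totalDays":1}]
```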
Then keep only the records where the same client visited the same store on more than one day:
match = { "$match" : { "totalDays" : { "$gt" : 1 } } };
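Here is a sample data set and the result of aggregating it with the pipeline stages above (the output is in the `{ "result" : ..., "ok" : 1 }` form printed by MongoDB 2.4-era shells; newer shells return a cursor and print the matching documents directly):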
> db.visits.find({},{_id:0,purchases:0}).sort({location:1, clientId:1, tov:1})
{ "clientId" : 1, "location" : "l1", "tov" : ISODate("2013-01-01T20:00:00Z") }
{ "clientId" : 1, "location" : "l1", "tov" : ISODate("2013-01-01T21:00:00Z") }
{ "clientId" : 1, "location" : "l1", "tov" : ISODate("2013-01-03T20:00:00Z") }
{ "clientId" : 2, "location" : "l1", "tov" : ISODate("2013-01-01T21:00:00Z") }
{ "clientId" : 3, "location" : "l1", "tov" : ISODate("2013-01-01T21:00:00Z") }
{ "clientId" : 3, "location" : "l1", "tov" : ISODate("2013-01-02T21:00:00Z") }
{ "clientId" : 1, "location" : "l2", "tov" : ISODate("2013-01-01T23:00:00Z") }
{ "clientId" : 3, "location" : "l2", "tov" : ISODate("2013-01-02T21:00:00Z") }
{ "clientId" : 3, "location" : "l2", "tov" : ISODate("2013-01-02T21:00:00Z") }
{ "clientId" : 1, "location" : "l3", "tov" : ISODate("2013-01-03T20:00:00Z") }
{ "clientId" : 2, "location" : "l3", "tov" : ISODate("2013-01-04T20:00:00Z") }
{ "clientId" : 4, "location" : "l3", "tov" : ISODate("2013-01-04T20:00:00Z") }
{ "clientId" : 4, "location" : "l3", "tov" : ISODate("2013-01-04T21:00:00Z") }
{ "clientId" : 4, "location" : "l3", "tov" : ISODate("2013-01-04T22:00:00Z") }
> db.visits.aggregate([group1, group2, match])
{
    "result" : [
        {
            "_id" : { "c" : 3, "s" : "l1" },
            "totalDays" : 2
        },
        {
            "_id" : { "c" : 1, "s" : "l1" },
            "totalDays" : 2
        }
    ],
    "ok" : 1
}
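As a cross-check, the following sketch replays the three stages in plain JavaScript over the same fourteen sample visits. The helper `countBy` is hypothetical and stands in for a `$group` with `$sum: 1`; it only illustrates the logic and says nothing about how MongoDB actually executes the pipeline.

```javascript
// In-memory replay of group1 -> group2 -> match over the sample data above.
const visits = [
  [1, "l1", "2013-01-01T20:00:00Z"], [1, "l1", "2013-01-01T21:00:00Z"],
  [1, "l1", "2013-01-03T20:00:00Z"], [2, "l1", "2013-01-01T21:00:00Z"],
  [3, "l1", "2013-01-01T21:00:00Z"], [3, "l1", "2013-01-02T21:00:00Z"],
  [1, "l2", "2013-01-01T23:00:00Z"], [3, "l2", "2013-01-02T21:00:00Z"],
  [3, "l2", "2013-01-02T21:00:00Z"], [1, "l3", "2013-01-03T20:00:00Z"],
  [2, "l3", "2013-01-04T20:00:00Z"], [4, "l3", "2013-01-04T20:00:00Z"],
  [4, "l3", "2013-01-04T21:00:00Z"], [4, "l3", "2013-01-04T22:00:00Z"],
].map(([clientId, location, iso]) => ({ clientId, location, tov: new Date(iso) }));

// Stand-in for "$group" with { "$sum" : 1 }: key each doc, count per key.
function countBy(docs, keyFn, field) {
  const groups = new Map();
  for (const doc of docs) {
    const id = keyFn(doc);
    const k = JSON.stringify(id);
    const entry = groups.get(k) || { _id: id, [field]: 0 };
    entry[field] += 1;
    groups.set(k, entry);
  }
  return [...groups.values()];
}

// group1: one document per (client, location, UTC day).
const stage1 = countBy(visits, (v) => ({
  c: v.clientId,
  l: v.location,
  day: { y: v.tov.getUTCFullYear(), m: v.tov.getUTCMonth() + 1, d: v.tov.getUTCDate() },
}), "visits");

// group2: number of distinct days per (client, store).
const stage2 = countBy(stage1, (g) => ({ c: g._id.c, s: g._id.l }), "totalDays");

// match: keep only client/store pairs seen on more than one day.
const result = stage2.filter((g) => g.totalDays > 1);
console.log(JSON.stringify(result));
// [{"_id":{"c":1,"s":"l1"},"totalDays":2},{"_id":{"c":3,"s":"l1"},"totalDays":2}]
```

It yields the same two client/store pairs as the shell output; the order differs because `$group` does not guarantee the order of its output documents.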