First, a note: this JOIN cannot be run in realtime from your app, and doing so works against the way MongoDB is meant to be used. That said, yes, there is a way to map-reduce a JOIN.
In your first MR, which gets documents like:
{_id: abc, city: "SF", state: CA, customfield1: value1...}
You just emit this row and write it to a new collection. Then in your second MR where you get:
{userId: abc, event: login, count:23, city: SF, state: CA}
You make userId actually _id:
var map = function(){
    // key on userId so it becomes _id in the output collection
    emit(this.userId, {event: this.event /* etc */});
}
Or a compound key:
var map = function(){
    // compound key: o = owner (userId), e = event
    emit({o: this.userId, e: this.event}, {event: this.event /* etc */});
}
Then you reduce as normal, but change the call to the server so that the out option of the MR points to the result of your first MR, adding a reduce or merge action on the out option so the two collections join on duplicate _ids. With reduce, your reduce function is applied to the existing and the new value for a matching _id; with merge, the new document overwrites the existing one:
db.col.mapReduce(map, reduce, { out: { reduce: "collection_from_first_mr" } })
That is basically how it works.
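To make the mechanics concrete, here is a rough sketch that simulates the two passes in plain JavaScript, with no MongoDB involved. Everything in it is an assumption made for illustration: the helper mapReduceInto, the sample users and events, and the shape of the reduce function are all hypothetical, and emit is passed to map as an argument rather than being a global as it is on the server. It only mimics what out: {reduce: ...} does when a key already exists in the target collection.

```javascript
// Plain-JavaScript simulation of the two-pass join; not MongoDB code.
// mapReduceInto, users and events are hypothetical names for this sketch.

// Runs map over docs, groups emitted values by key, reduces each group, then
// folds the result into `out`: if the key already exists there, reduce is
// applied to the existing and the new value (the out: {reduce: ...} behaviour).
function mapReduceInto(docs, map, reduce, out) {
  const groups = new Map();
  for (const doc of docs) {
    map.call(doc, (key, value) => {
      const k = JSON.stringify(key);
      if (!groups.has(k)) groups.set(k, { key, values: [] });
      groups.get(k).values.push(value);
    });
  }
  for (const [k, { key, values }] of groups) {
    let value = values.length > 1 ? reduce(key, values) : values[0];
    if (out.has(k)) value = reduce(key, [out.get(k).value, value]);
    out.set(k, { _id: key, value });
  }
  return out;
}

// Merges partial values for one key: keeps city/state, sums counts.
function reduce(key, values) {
  const result = { count: 0 };
  for (const v of values) {
    if (v.city !== undefined) result.city = v.city;
    if (v.state !== undefined) result.state = v.state;
    result.count += v.count || 0;
  }
  return result;
}

const users = [
  { _id: "abc", city: "SF", state: "CA" },
  { _id: "def", city: "NYC", state: "NY" },
];
const events = [
  { userId: "abc", event: "login", count: 20 },
  { userId: "abc", event: "login", count: 3 },
  { userId: "def", event: "login", count: 5 },
];

const joined = new Map(); // stands in for the collection from the first MR

// First MR: emit each user row keyed on _id.
mapReduceInto(users, function (emit) {
  emit(this._id, { city: this.city, state: this.state, count: 0 });
}, reduce, joined);

// Second MR: emit events keyed on userId so they land on the same _id.
mapReduceInto(events, function (emit) {
  emit(this.userId, { count: this.count });
}, reduce, joined);

const rows = [...joined.values()];
console.log(rows);
```

Note that the reduce function has to cope with both kinds of value (user fields and event counts), since in the second pass it is handed a mix of the two.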
Going back to the note at the start of this answer: these are not SQL JOINs and should not be treated as such. The JS engine is:
- Slow
- Single threaded
- Not actually native MongoDB; it is a JS engine built into MongoDB
If the collection gets out of control, or this command is run in realtime from your app, you could easily cause performance problems for other JavaScript that needs to run on your server to do productive work (remember, the engine is single threaded).
Edit
"so that I can query based on state or city which has max login events and similar kind of queries."
Wouldn't the login occur in that city, though? So maybe the login row should itself contain a city and state field. It would not need updating, and it sounds odd that it would, since that login happened there and not anywhere else. So:
"I need to update literally all of the events collection which will be huge."
becomes obsolete: the login event never needs updating, because it happened in the state/city it was recorded in, and that stays correct.
So I would actually go for a schema of:
{_id: uniqueEventId, event: login, userId: abc, state: '', city: ''}
And aggregate on that.
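As one possible reading of "aggregate on that", here is a sketch of an aggregation framework pipeline for the "which state/city has the most login events" query. It assumes the events are stored in the schema above; the collection name events in the comment is hypothetical.

```javascript
// Pipeline sketch: find the state/city pair with the most login events.
// Assumes documents shaped like {event, userId, state, city}.
const pipeline = [
  // keep only login events
  { $match: { event: "login" } },
  // count logins per state/city pair
  { $group: { _id: { state: "$state", city: "$city" }, logins: { $sum: 1 } } },
  // largest count first
  { $sort: { logins: -1 } },
  // keep the top pair only
  { $limit: 1 },
];

// In the mongo shell, against a hypothetical `events` collection:
// db.events.aggregate(pipeline)
```

Dropping the $limit stage would return the full ranking instead of only the top pair.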