I am trying to find an elegant way to do the following in MongoDB / Python. I have two collections: one containing a list of people and their attributes, and one containing a subset of those people (a 'population sample'). I want to run a map-reduce job that calculates some aggregate stats on the large list, but using only the people whose names appear in the population sample. Here is an example set of records:
master_list: [{ Name: 'Jim', Age: 24 },
{ Name: 'Bill', Age: 38 },
{ Name: 'Mary', Age: 55 }]
subset: [{ Name: 'Jim' },
{ Name: 'Mary' }]
The idea is to calculate the average age using only the two of the three records in master_list that are listed in subset. I am aware that map_reduce in Mongo supports a query parameter, but it is not clear to me what the best way to handle the above is, given that Mongo has no joins. One option is to preprocess master_list and add an 'include' attribute flagging which records to use, then filter on that flag in the map_reduce query. That seems kludgy, though, and it creates a permanent flag in my database, which is annoying for various reasons.
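To make the goal concrete, here is a minimal plain-Python sketch of the result I am after. It uses in-memory lists as stand-ins for the two collections, and the function name average_age_for_subset is just made up for illustration; it is not Mongo code.

```python
# In-memory stand-ins for the two collections (not actual Mongo collections).
master_list = [
    {"Name": "Jim", "Age": 24},
    {"Name": "Bill", "Age": 38},
    {"Name": "Mary", "Age": 55},
]
subset = [{"Name": "Jim"}, {"Name": "Mary"}]


def average_age_for_subset(master, sample):
    """Average Age over the master records whose Name appears in sample."""
    names = {doc["Name"] for doc in sample}
    ages = [doc["Age"] for doc in master if doc["Name"] in names]
    # Return None rather than dividing by zero when nothing matches.
    return sum(ages) / len(ages) if ages else None


print(average_age_for_subset(master_list, subset))  # 39.5 (Jim and Mary only)
```

Bill is excluded because his name is not in subset, so the average is (24 + 55) / 2.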
UPDATE
After reading the suggestions to embed the list in the query, I was able to get what I needed with the following:
map_reduce(mapper, reducer, out={'merge': 'Stats'},
           finalize=finalizer, scope={'atts': f},
           query={'Name': {'$in': pop}})
Here pop is a Python list of names. Thanks!
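For anyone following along, here is a rough sketch of how the pop list and the query filter could be built from the subset documents. The helper names names_from_subset and build_query are made up for illustration, and the documents are in-memory stand-ins rather than a live collection.

```python
def names_from_subset(subset_docs):
    """Collect distinct Name values from the subset documents, in first-seen order."""
    seen = []
    for doc in subset_docs:
        name = doc.get("Name")
        if name is not None and name not in seen:
            seen.append(name)
    return seen


def build_query(pop):
    """Build the filter passed as the query argument: match any Name in pop."""
    return {"Name": {"$in": pop}}


# Stand-in for the documents in the subset collection.
subset_docs = [{"Name": "Jim"}, {"Name": "Mary"}]
pop = names_from_subset(subset_docs)
query = build_query(pop)
print(query)  # {'Name': {'$in': ['Jim', 'Mary']}}

# With a live PyMongo database handle (assumed here to be called `db`),
# the same list could instead be fetched with:
#   pop = db.subset.distinct("Name")
```

Since the whole list is sent as part of the query, this is probably only practical while the subset stays reasonably small.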