
This question is about how I can use indexes in MongoDB to look something up in nested documents, without having to index each individual sublevel. I have a collection "test" in MongoDB which basically goes something like this:

{
"_id" : ObjectId("50fdd7d71d41c82875a5b6c1"),
"othercol" : "bladiebla",
"scenario" : {
        "1" : [1,2,3],
        "2" : [4,5,6]
}}

Scenario has multiple keys, and each document can have any subset of the scenarios (from none, to a subset, to all). Also: scenario can't be an array because I need it as a dictionary in Python. I created an index on the "scenario" field.
My issue is that I want to select from the collection, filtering for documents that have a certain scenario. This works fine functionally:

db.test.find({"scenario.1": {$exists: true}})

However, it won't use the index I've put on scenario. An index is only used if I put one on "scenario.1" itself. But I can have thousands (or more) of scenarios, and the collection itself has hundreds of thousands of records, so I would prefer not to!
So I tried an alternative:

db.test.find({"scenario": "1"}) 

This will use the index on scenario, but won't return results. Making scenario an array still gives the same index issue.

Is my question clear? Can anyone give a pointer on how I could achieve the best performance here?

P.S. I have seen "How to Create a nested index in MongoDB?", but that solution is not possible in my case (due to the number of scenarios).


2 Answers


Putting an index on a subobject like scenario is useless in this case as it would only be used when you're filtering on complete scenario objects rather than individual fields (think of it as a binary blob comparison).

You either need to add an index on each of your possible fields ("scenario.1", "scenario.2", etc.) or rework your schema to get rid of the dynamic keys by doing something like this:

{
"_id" : ObjectId("50fdd7d71d41c82875a5b6c1"),
"othercol" : "bladiebla",
"scenario" : [
    { id: "1", value: [1,2,3] },
    { id: "2", value: [4,5,6] }
]}

Then you can add a single index on "scenario.id" to support the queries you need to perform, such as db.test.find({"scenario.id": "1"}).

I know you said you need scenario to be a dict and not an array, but I don't see how you have much choice.
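One way to soften that trade-off is to convert between the two shapes at the application boundary, so the Python code still works with a dictionary while the stored document uses the array. The following is only a minimal sketch: the helper names to_array_form and to_dict_form are hypothetical, and it assumes scenario ids are unique within a document.

```python
def to_array_form(scenario_dict):
    """Turn {"1": [1, 2, 3]} into [{"id": "1", "value": [1, 2, 3]}] for storage."""
    return [{"id": key, "value": value} for key, value in scenario_dict.items()]


def to_dict_form(scenario_array):
    """Turn the stored array back into a plain dict keyed by scenario id.

    Assumes ids are unique; a repeated id would silently overwrite the earlier one.
    """
    return {entry["id"]: entry["value"] for entry in scenario_array}


# Illustrative data taken from the example document in the question.
scenarios = {"1": [1, 2, 3], "2": [4, 5, 6]}
stored = to_array_form(scenarios)   # shape to write to MongoDB
restored = to_dict_form(stored)     # shape to use in Python after reading
```

With helpers like these, only the code that reads and writes documents needs to know about the array form.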

Answered 2013-01-23T02:04:59.760

Johnny HK's answer is well explained and should be used in the general case. I will just suggest a workaround in case you must support many scenarios and don't need complex querying. Instead of keeping the values under the scenario field, hold only the ids of the scenarios in that field, and store each scenario's values in a separate top-level field whose name contains the scenario id.

Example:

{
"_id" : ObjectId("50fdd7d71d41c82875a5b6c1"),
"othercol" : "bladiebla",
"scenario" : [ "1", "2"],
"scenario_1": [1,2,3],
"scenario_2": [4,5,6]
}

With this schema you can use a single index on scenario to find documents that contain specific scenarios. But if you need to query on the values of a specific scenario, you again need an index on each scenario value field, i.e. scenario_1, scenario_2, etc. If you need an index for each field anyway, then keep your original schema and use sparse indexes on each nested field, which might help reduce the size of your indexes.
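If the application still wants the original nested dictionary in Python, the flattened layout can be converted back and forth when reading and writing. This is a rough sketch under stated assumptions: the function names are hypothetical, the "scenario_" prefix is taken from the example above, and it assumes no unrelated top-level field already uses that prefix.

```python
PREFIX = "scenario_"  # assumed naming convention, copied from the example document


def flatten_scenarios(doc):
    """Keep only the scenario ids under 'scenario'; move values to 'scenario_<id>' fields."""
    flat = {key: value for key, value in doc.items() if key != "scenario"}
    scenarios = doc.get("scenario", {})
    flat["scenario"] = list(scenarios)  # list of ids, e.g. ["1", "2"]
    for scenario_id, values in scenarios.items():
        flat[PREFIX + scenario_id] = values
    return flat


def unflatten_scenarios(flat):
    """Rebuild the nested {'scenario': {id: values}} dictionary from the flattened form."""
    ids = flat.get("scenario", [])
    value_fields = {PREFIX + scenario_id for scenario_id in ids}
    doc = {
        key: value
        for key, value in flat.items()
        if key != "scenario" and key not in value_fields
    }
    doc["scenario"] = {scenario_id: flat[PREFIX + scenario_id] for scenario_id in ids}
    return doc


# Illustrative data based on the example document (without the _id field).
original = {"othercol": "bladiebla", "scenario": {"1": [1, 2, 3], "2": [4, 5, 6]}}
flat = flatten_scenarios(original)
roundtrip = unflatten_scenarios(flat)
```

The list of ids under scenario is the part that a single index can serve; the per-scenario value fields are only carried along.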

Answered 2013-01-23T07:50:53.290