We have a MongoDB cluster using GridFS. The fs.chunks collection of GridFS is sharded across two replica sets. Disk usage is very high: for 90GB of data we need more than 130GB of disk space.
The fs.chunks collection seems to be what takes up the space. Summing the "length" field of fs.files gives 90GB. The "size" fields of the two shards add up to 130GB (140515231376 bytes in the stats below). This is the real size of the payload data contained in the collection, right?
Does this mean there is 40GB of overhead? Is that correct? Where does it come from? Is it the BSON encoding? Is there a way to reduce the overhead?
mongos> db.fs.chunks.stats()
{
    "sharded" : true,
    "ns" : "ub_datastore_preview.fs.chunks",
    "count" : 1012180,
    "numExtents" : 106,
    "size" : 140515231376,
    "storageSize" : 144448592944,
    "totalIndexSize" : 99869840,
    "indexSizes" : {
        "_id_" : 43103872,
        "files_id_1_n_1" : 56765968
    },
    "avgObjSize" : 138824.35078345748,
    "nindexes" : 2,
    "nchunks" : 2400,
    "shards" : {
        "ub_datastore_qa_group1" : {
            "ns" : "ub_datastore_preview.fs.chunks",
            "count" : 554087,
            "size" : 69448405120,
            "avgObjSize" : 125338.44887174758,
            "storageSize" : 71364832800,
            "numExtents" : 52,
            "nindexes" : 2,
            "lastExtentSize" : 2146426864,
            "paddingFactor" : 1,
            "systemFlags" : 1,
            "userFlags" : 0,
            "totalIndexSize" : 55269760,
            "indexSizes" : {
                "_id_" : 23808512,
                "files_id_1_n_1" : 31461248
            },
            "ok" : 1
        },
        "ub_datastore_qa_group2" : {
            "ns" : "ub_datastore_preview.fs.chunks",
            "count" : 458093,
            "size" : 71066826256,
            "avgObjSize" : 155136.2414531547,
            "storageSize" : 73083760144,
            "numExtents" : 54,
            "nindexes" : 2,
            "lastExtentSize" : 2146426864,
            "paddingFactor" : 1,
            "systemFlags" : 1,
            "userFlags" : 0,
            "totalIndexSize" : 44600080,
            "indexSizes" : {
                "_id_" : 19295360,
                "files_id_1_n_1" : 25304720
            },
            "ok" : 1
        }
    },
    "ok" : 1
}
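
For reference, here is the arithmetic behind the figures in the question, as a small sketch in plain JavaScript. The byte counts are copied from the stats output above. It assumes that the "90GB" sum of fs.files "length" means 90 GiB (1024^3 bytes per GiB); the exact byte total is not shown here, so the overhead figures are approximate.

```javascript
// Byte counts copied from the db.fs.chunks.stats() output above.
const shardSizes = {
  ub_datastore_qa_group1: 69448405120,
  ub_datastore_qa_group2: 71066826256,
};
const chunkCount = 1012180; // "count" of fs.chunks documents across both shards

// Assumption: the "90GB" sum of fs.files "length" is taken as exactly 90 GiB.
const GIB = 1024 ** 3;
const payloadBytes = 90 * GIB;

// Sum of the per-shard "size" fields; matches the top-level "size" value.
const totalSize = Object.values(shardSizes).reduce((a, b) => a + b, 0);
const overheadBytes = totalSize - payloadBytes;

const summary = {
  totalSize,                                   // bytes
  totalGiB: totalSize / GIB,                   // roughly 130 GiB
  overheadGiB: overheadBytes / GIB,            // roughly 40 GiB
  overheadPerChunk: overheadBytes / chunkCount // average extra bytes per chunk document
};

console.log(summary);
```

Under that assumption, the difference comes to several tens of kilobytes per chunk document on average, which is the gap I am trying to explain.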