I'm having trouble with slow insert performance on a sharded cluster. My setup consists of 5 shards, and each shard has at least 3 replica set members. As far as network topology goes, one group of replica set members lives in Rackspace Cloud and the rest are on AWS. All nodes are running MongoDB 2.4.6.
I'm processing a file in Java and writing it to MongoDB. Each file is ~60MB and the resulting data for a file ends up as ~160MB in the DB. I'm connecting to a mongos from my Java application. I'm sharding on the hash of the _id (auto-generated ObjectID) and I have write concern set to UNACKNOWLEDGED.
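For reference, the insert path in the Java application looks roughly like this (a simplified sketch: the mongos host and the document fields are placeholders, but the namespace and write concern match my setup):

import com.mongodb.BasicDBObject;
import com.mongodb.DBCollection;
import com.mongodb.MongoClient;
import com.mongodb.ServerAddress;
import com.mongodb.WriteConcern;

public class FileLoader {
    public static void main(String[] args) throws Exception {
        // Connect to a single mongos router (host is a placeholder)
        MongoClient client = new MongoClient(new ServerAddress("mongos.example.com", 27017));
        DBCollection coll = client.getDB("prod").getCollection("collection");

        // Fire-and-forget inserts: the driver doesn't wait for any acknowledgement
        coll.setWriteConcern(WriteConcern.UNACKNOWLEDGED);

        // One insert per parsed record; _id is left unset so the driver generates
        // an ObjectId, and the collection is sharded on { "_id" : "hashed" }
        for (int i = 0; i < 100000; i++) {
            BasicDBObject doc = new BasicDBObject("recordNum", i)
                    .append("payload", "parsed file data goes here");
            coll.insert(doc);
        }

        client.close();
    }
}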
If I write to an unsharded collection I can write the whole file in ~90 seconds. If I write to a sharded collection it's taking me ~20 minutes!
I've done some initial debugging:
- I've tried creating a new collection and writing to it
- I've tried disabling the balancer to ensure that there were no migrations slowing things down (I've confirmed that the balancer is disabled)
- I don't see anything strange in the mongos or mongod logs
Things I've noticed:
- The primary node on the primary shard is sitting at an almost constant 80% write lock. The other primaries are hovering around 5% with occasional spikes to 30%, and the secondaries are all sitting around 5% with occasional spikes to 15%
- sh.status() shows even chunk distribution, but db.collection.stats() shows that the primary shard has a count and size roughly twice those of each of the other four shards
- No other noticeable errors in the logs or MMS
Any ideas on how I can further debug this issue?
Update: here's the output from sh.status():
prod.collection
    shard key: { "_id" : "hashed" }
    chunks:
        rs1    8
        rs2    8
        rs3    8
        rs4    8
        rs0    8
    too many chunks to print, use verbose if you want to force print
And the output from db.collection.stats():
mongos> db.collection.stats()
{
"sharded" : true,
"ns" : "prod.collection",
"count" : 879837,
"numExtents" : 76,
"size" : 2210698416,
"storageSize" : 2653114368,
"totalIndexSize" : 73526768,
"indexSizes" : {
"_id_" : 31526656,
"_id_hashed" : 42000112
},
"avgObjSize" : 2512.6226971586784,
"nindexes" : 2,
"nchunks" : 20,
"shards" : {
"rs0" : {
"ns" : "prod.collection",
"count" : 300130,
"size" : 754047552,
"avgObjSize" : 2512.403131976144,
"storageSize" : 873058304,
"numExtents" : 17,
"nindexes" : 2,
"lastExtentSize" : 232005632,
"paddingFactor" : 1.0000000000001465,
"systemFlags" : 1,
"userFlags" : 0,
"totalIndexSize" : 24037440,
"indexSizes" : {
"_id_" : 9753968,
"_id_hashed" : 14283472
},
"ok" : 1
},
"rs1" : {
"ns" : "prod.collection",
"count" : 139598,
"size" : 350820064,
"avgObjSize" : 2513.07371165776,
"storageSize" : 470589440,
"numExtents" : 15,
"nindexes" : 2,
"lastExtentSize" : 127299584,
"paddingFactor" : 1.000000000000052,
"systemFlags" : 1,
"userFlags" : 0,
"totalIndexSize" : 11626272,
"indexSizes" : {
"_id_" : 5060944,
"_id_hashed" : 6565328
},
"ok" : 1
},
"rs2" : {
"ns" : "prod.collection",
"count" : 149987,
"size" : 376944272,
"avgObjSize" : 2513.179622233927,
"storageSize" : 470593536,
"numExtents" : 15,
"nindexes" : 2,
"lastExtentSize" : 127299584,
"paddingFactor" : 1.0000000000000484,
"systemFlags" : 1,
"userFlags" : 0,
"totalIndexSize" : 12713680,
"indexSizes" : {
"_id_" : 5674144,
"_id_hashed" : 7039536
},
"ok" : 1
},
"rs3" : {
"ns" : "prod.collection",
"count" : 140235,
"size" : 352293776,
"avgObjSize" : 2512.167262095768,
"storageSize" : 377905152,
"numExtents" : 14,
"nindexes" : 2,
"lastExtentSize" : 104161280,
"paddingFactor" : 1.0000000000000422,
"systemFlags" : 1,
"userFlags" : 0,
"totalIndexSize" : 11863376,
"indexSizes" : {
"_id_" : 5110000,
"_id_hashed" : 6753376
},
"ok" : 1
},
"rs4" : {
"ns" : "prod.collection",
"count" : 149887,
"size" : 376592752,
"avgObjSize" : 2512.5111050324576,
"storageSize" : 460967936,
"numExtents" : 15,
"nindexes" : 2,
"lastExtentSize" : 124985344,
"paddingFactor" : 1.000000000000043,
"systemFlags" : 1,
"userFlags" : 0,
"totalIndexSize" : 13286000,
"indexSizes" : {
"_id_" : 5927600,
"_id_hashed" : 7358400
},
"ok" : 1
}
},
"ok" : 1
}
Balancer status:
mongos> !sh.getBalancerState() && !sh.isBalancerRunning()
true