I am using two PCs as database servers, both running MongoDB 2.4.1. The system architecture is as follows:
Processes on server1.test.com (192.168.156.39):
./mongodb/mongod --shardsvr --replSet shard1 --rest --port 27017 --dbpath ./mongodata/data --oplogSize 5000 --logpath ./mongodata/logs/shard1.log --logappend --fork
./mongodb/mongod --configsvr --port 30000 --dbpath ./mongodata/config --logpath ./mongodata/logs/config.log --logappend --fork
./mongodb/mongos --configdb server1.test.com:30000 --port 40000 --chunkSize 5 --logpath ./mongodata/logs/mongos.log --logappend --fork
Processes on server2.test.com (192.168.156.40):
./mongodb/mongod --shardsvr --replSet shard1 --rest --port 27017 --dbpath ./mongodata/data --oplogSize 5000 --logpath ./mongodata/logs/shard1.log --logappend --fork
./mongodb/mongos --configdb server1.test.com:30000 --port 40000 --chunkSize 5 --logpath ./mongodata/logs/mongos.log --logappend --fork
Shard list:
mongos> sh.status()
--- Sharding Status ---
sharding version: {
"_id" : 1,
"version" : 3,
"minCompatibleVersion" : 3,
"currentVersion" : 4,
"clusterId" : ObjectId("5170a7faa5daa853bde9fe11")
}
shards:
{ "_id" : "s1", "host" : "shard1/server1.test.com:27017,server2.test.com:27017" }
databases:
{ "_id" : "admin", "partitioned" : false, "primary" : "config" }
{ "_id" : "crawler", "partitioned" : true, "primary" : "s1" }
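Everything ran normally for a few days, but then suddenly all of the processes except mongod exited on their own. I checked the logs, which show the following: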
server1 - shard1.log
Thu May 2 10:34:24.653 [initandlisten] connection accepted from 192.168.156.40:45113 #37770 (22 connections now open)
Thu May 2 10:34:36.748 [conn37623] command admin.$cmd command: { writebacklisten: ObjectId('5181c2ab7c46e60e12ace47b') } ntoreturn:1 keyUpdates:0 reslen:44 300000ms
Thu May 2 10:34:54.666 [conn37770] end connection 192.168.156.40:45113 (20 connections now open)
Thu May 2 10:34:54.667 [initandlisten] connection accepted from 192.168.156.40:45114 #37773 (22 connections now open)
Thu May 2 10:35:24.680 [conn37773] end connection 192.168.156.40:45114 (20 connections now open)
Thu May 2 10:35:24.681 [initandlisten] connection accepted from 192.168.156.40:45115 #37774 (22 connections now open)
Thu May 2 10:35:54.694 [conn37774] end connection 192.168.156.40:45115 (20 connections now open)
Thu May 2 10:35:54.694 [initandlisten] connection accepted from 192.168.156.40:45116 #37775 (22 connections now open)
server1 - config.log
Thu May 2 09:35:23.642 [initandlisten] connection accepted from 192.168.156.40:43971 #5 (5 connections now open)
Thu May 2 09:35:23.658 [initandlisten] connection accepted from 192.168.156.40:43978 #6 (6 connections now open)
Thu May 2 09:35:47.842 [conn1] update config.mongos query: { _id: "server1.test.com:40000" } update: { $set: { ping: new Date(1367458547741), up: 72, waiting: false, mongoVersion: "2.4.1" } } idhack:1 nupdated:1 fastmod:1 keyUpdates:0 locks(micros) w:46 101ms
Thu May 2 09:41:54.226 [initandlisten] connection accepted from 192.168.156.40:43992 #7 (7 connections now open)
Thu May 2 10:28:08.830 [initandlisten] connection accepted from 192.168.156.39:55161 #8 (8 connections now open)
Thu May 2 10:28:08.831 [conn8] first cluster operation detected, adding sharding hook to enable versioning and authentication to remote servers
Thu May 2 10:29:35.218 [conn5] update config.locks query: { _id: "balancer", state: 0, ts: ObjectId('5181cf8c7c46e60e12ace6a1') } update: { $set: { state: 1, who: "server2.test.com:40000:1367458523:1804289383:Balancer:846930886", process: "server2.test.com:40000:1367458523:1804289383", when: new Date(1367461775087), why: "doing balance round", ts: ObjectId('5181cf8f5cc22604fbfdb9dd') } } nscanned:1 nupdated:1 fastmod:1 keyUpdates:0 locks(micros) w:238 120ms
server1 - mongos.log
Thu May 2 10:41:03.659 [Balancer] distributed lock 'balancer/server1.test.com:40000:1367458475:1804289383' acquired, ts : 5181d23f7c46e60e12ace714
Thu May 2 10:41:03.661 [Balancer] distributed lock 'balancer/server1.test.com:40000:1367458475:1804289383' unlocked.
Thu May 2 10:41:09.665 [Balancer] distributed lock 'balancer/server1.test.com:40000:1367458475:1804289383' acquired, ts : 5181d2457c46e60e12ace715
Thu May 2 10:41:09.667 [Balancer] distributed lock 'balancer/server1.test.com:40000:1367458475:1804289383' unlocked.
Thu May 2 10:41:15.671 [Balancer] distributed lock 'balancer/server1.test.com:40000:1367458475:1804289383' acquired, ts : 5181d24b7c46e60e12ace716
Thu May 2 10:41:15.674 [Balancer] distributed lock 'balancer/server1.test.com:40000:1367458475:1804289383' unlocked.
server2 - shard1.log
Thu May 2 10:49:19.122 [initandlisten] connection accepted from 192.168.156.39:56166 #37872 (17 connections now open)
Thu May 2 10:49:36.763 [conn37634] command admin.$cmd command: { writebacklisten: ObjectId('5181c2ab7c46e60e12ace47b') } ntoreturn:1 keyUpdates:0 reslen:44 300000ms
Thu May 2 10:49:38.725 [conn37635] info DFM::findAll(): extent 0:3db000 was empty, skipping ahead. ns:crawler.crawl_js_url_queue
Thu May 2 10:49:49.138 [conn37872] end connection 192.168.156.39:56166 (15 connections now open)
Thu May 2 10:49:49.139 [initandlisten] connection accepted from 192.168.156.39:56167 #37873 (17 connections now open)
Thu May 2 10:50:17.666 [conn37635] info DFM::findAll(): extent 0:3db000 was empty, skipping ahead. ns:crawler.crawl_js_url_queue
Thu May 2 10:50:19.153 [conn37873] end connection 192.168.156.39:56167 (15 connections now open)
Thu May 2 10:50:19.154 [initandlisten] connection accepted from 192.168.156.39:56170 #37874 (17 connections now open)
server2 - mongos.log
Thu May 2 10:56:42.921 [Balancer] distributed lock 'balancer/server2.test.com:40000:1367463294:1804289383' unlocked.
Thu May 2 10:56:48.926 [Balancer] distributed lock 'balancer/server2.test.com:40000:1367463294:1804289383' acquired, ts : 5181d5f0df728e935d741181
Thu May 2 10:56:48.929 [Balancer] distributed lock 'balancer/server2.test.com:40000:1367463294:1804289383' unlocked.
Thu May 2 10:56:54.934 [Balancer] distributed lock 'balancer/server2.test.com:40000:1367463294:1804289383' acquired, ts : 5181d5f6df728e935d741182
Thu May 2 10:56:54.937 [Balancer] distributed lock 'balancer/server2.test.com:40000:1367463294:1804289383' unlocked.
Thu May 2 10:57:00.944 [Balancer] distributed lock 'balancer/server2.test.com:40000:1367463294:1804289383' acquired, ts : 5181d5fcdf728e935d741183
Thu May 2 10:57:00.947 [Balancer] distributed lock 'balancer/server2.test.com:40000:1367463294:1804289383' unlocked.
Thu May 2 10:57:06.952 [Balancer] distributed lock 'balancer/server2.test.com:40000:1367463294:1804289383' acquired, ts : 5181d602df728e935d741184
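If I delete the data on server2 and then resync it from server1, that might work, but the data on the two servers differs, so I cannot do that. Does anyone know how to solve this?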
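To pin down when each process stopped writing, it may help to extract the last timestamped entry from each log file. The following is only a minimal sketch in Python, assuming every line follows the "weekday month day time [thread] message" layout seen in the excerpts above. The function names `parse_line` and `last_entry` are made up for illustration, and the year is passed in by hand because the log lines do not carry one (2013 is assumed here, based on the `new Date(1367458547741)` value in config.log).

```python
import re
from datetime import datetime

# Matches lines shaped like the excerpts above, e.g.
# "Thu May 2 10:34:24.653 [initandlisten] connection accepted ..."
LINE_RE = re.compile(
    r"^(\w{3} \w{3} +\d{1,2} \d{2}:\d{2}:\d{2}\.\d{3}) \[([^\]]+)\] (.*)$"
)


def parse_line(line, year=2013):
    """Return (timestamp, thread, message), or None if the line does not match."""
    m = LINE_RE.match(line.strip())
    if m is None:
        return None
    stamp, thread, message = m.groups()
    # The log lines have no year, so one is supplied explicitly (assumption).
    ts = datetime.strptime(f"{year} {stamp}", "%Y %a %b %d %H:%M:%S.%f")
    return ts, thread, message


def last_entry(lines, year=2013):
    """Return the last parseable entry from an iterable of log lines, or None."""
    last = None
    for line in lines:
        entry = parse_line(line, year)
        if entry is not None:
            last = entry
    return last


# Two lines copied from the server1 mongos.log excerpt above.
sample = [
    "Thu May 2 10:41:09.667 [Balancer] distributed lock "
    "'balancer/server1.test.com:40000:1367458475:1804289383' unlocked.",
    "Thu May 2 10:41:15.674 [Balancer] distributed lock "
    "'balancer/server1.test.com:40000:1367458475:1804289383' unlocked.",
]
ts, thread, message = last_entry(sample)
print(ts.isoformat(), thread)
```

Against a real file, the same function could be called as `last_entry(open("./mongodata/logs/mongos.log"))`. Comparing the result for each log on both servers would show which process went quiet first, although it does not by itself explain why the process exited.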