mongodb - How can two Amazon EC2 instances share the same data?

Question

I am working on a solution deployed on an Amazon EC2 server instance that has its region set to US WEST. The solution uses MongoDB for data storage and contains a web service that is used by a mobile application. The user base of the mobile application is split 40:60 between the US and Asia, so I need to set up another EC2 instance in the Asia Pacific region to lower latency and connection time for the Asian users.

Since the data storage is located on an instance in US WEST, how would I set up a new instance in Asia Pacific that can share the same data with the US WEST instance? I am open to moving the MongoDB database elsewhere, but I do not want to change to a different NoSQL solution.

score 4 · Accepted Answer

There are various solutions here; I will try to outline a few.

Replica Sets

Perhaps the easiest solution would be to use a Replica Set, with two servers in US-EAST and one in ASIA. A MongoDB Replica Set should have at least three members so that a majority can still elect a primary when one node is unavailable, and as you have a larger number of users near US-EAST it makes sense to put two of them there.
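Why three nodes? A primary can only be elected by a strict majority of the voting members. The plain-Python sketch below (no MongoDB driver involved; node names are hypothetical, and every member is assumed to have one vote) shows that with two US-EAST nodes and one ASIA node, the US-EAST side keeps a majority if the link to ASIA fails, whereas a two-node set loses its primary as soon as either node is unreachable:

```python
def majority(voting_members: int) -> int:
    """Smallest number of votes that forms a strict majority."""
    return voting_members // 2 + 1


def can_elect_primary(reachable: set, members: list) -> bool:
    """True if the members that can still talk to each other form a majority."""
    return len(reachable & set(members)) >= majority(len(members))


# Hypothetical topology from the answer: two nodes in US-EAST, one in ASIA.
members = ["us-east-1", "us-east-2", "asia-1"]

print(majority(len(members)))  # 2
# Network split: the US-EAST side still holds 2 of 3 votes.
print(can_elect_primary({"us-east-1", "us-east-2"}, members))  # True
# The isolated ASIA node cannot elect itself primary.
print(can_elect_primary({"asia-1"}, members))  # False

# A two-node set needs both nodes for a majority, so losing either one
# leaves no primary at all.
two_node = ["us-east-1", "asia-1"]
print(can_elect_primary({"us-east-1"}, two_node))  # False
```

This is also why the third node goes where most of the writes come from: if that region is cut off from ASIA, it can still keep (or elect) the primary on its own.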

With just these three nodes, only the ASIA node brings the data closer to your Asian users. You then need to use Read Preferences to instruct your application to read from either one of the US-EAST nodes or the ASIA node. I have written an article about how PHP deals with those Read Preferences at http://derickrethans.nl/readpreferences.html; other language drivers have a similar mechanism.

All drivers maintain connections to each of the Replica Set nodes, so connection overhead should not be much of a problem, and reading from the closest node addresses the latency issue. Writes, however, always have to go to the primary (which will likely be in US-EAST).
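To make the read/write split concrete, here is a simplified Python model of routing under the "nearest" Read Preference; it is not a driver API. Reads may go to any member whose measured round-trip time is within a latency window of the fastest member, while writes always go to the primary. The node names and latencies are made up; the 15 ms default window mirrors the `localThresholdMS` default that MongoDB drivers use for server selection, but treat the whole thing as a sketch.

```python
import random
from dataclasses import dataclass


@dataclass
class Member:
    name: str
    is_primary: bool
    rtt_ms: float  # round-trip time measured from this application server


def read_candidates(members, window_ms=15.0):
    """Members eligible for a 'nearest' read: within window_ms of the fastest."""
    fastest = min(m.rtt_ms for m in members)
    return [m for m in members if m.rtt_ms <= fastest + window_ms]


def pick_read_member(members, window_ms=15.0, rng=random):
    """Pick one eligible member at random, spreading reads among near nodes."""
    return rng.choice(read_candidates(members, window_ms))


def pick_write_member(members):
    """Writes ignore latency and always go to the primary."""
    return next(m for m in members if m.is_primary)


# Hypothetical latencies as seen from an application server in Asia.
members_seen_from_asia = [
    Member("us-east-1", is_primary=True, rtt_ms=190.0),
    Member("us-east-2", is_primary=False, rtt_ms=195.0),
    Member("asia-1", is_primary=False, rtt_ms=4.0),
]

print(pick_read_member(members_seen_from_asia).name)   # asia-1
print(pick_write_member(members_seen_from_asia).name)  # us-east-1
```

Run from a US-EAST application server, the same function would choose between the two US-EAST nodes instead, which is exactly why each regional web server gets low-latency reads while still paying the cross-Pacific round trip on writes.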

Pros: Fairly easy to set up; only three nodes required
Cons: Only good for directing reads, not writes

Sharding

Sharding is a MongoDB feature that splits your whole data set into smaller pieces, so that it is possible to store a huge dataset in MongoDB without being constrained by the resources of a single server. Typically, a sharded set-up consists of at least two shards, each containing a (three-node) replica set, but it is also possible for a replica set to consist of only one node, which means you would end up with two shards, each containing one data node.
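As a rough illustration of how the data set is split: range-based sharding divides a collection into chunks, each covering a contiguous range of shard key values and owned by one shard, and the query router (`mongos`) looks up which chunk a key falls into. The plain-Python sketch below models only that lookup; the chunk boundaries, shard names and the numeric `user_id` shard key are hypothetical, and in a real cluster the chunk metadata lives on the config servers.

```python
import bisect

# Hypothetical chunk table for a numeric shard key such as user_id.
# Chunk i covers [CHUNK_MIN[i], CHUNK_MIN[i + 1]); the last chunk is open-ended.
CHUNK_MIN = [0, 1_000_000, 2_000_000]        # sorted lower bounds
CHUNK_SHARD = ["shard0", "shard1", "shard0"]  # shard that owns each chunk


def shard_for(user_id: int) -> str:
    """Return the shard owning the chunk that contains user_id."""
    # bisect_right finds the first lower bound greater than user_id,
    # so the chunk we want is the one just before it.
    i = bisect.bisect_right(CHUNK_MIN, user_id) - 1
    if i < 0:
        # Simplification: real clusters start the first chunk at MinKey.
        raise ValueError(f"user_id {user_id} is below the first chunk")
    return CHUNK_SHARD[i]


print(shard_for(42))         # shard0
print(shard_for(1_500_000))  # shard1
```

Because each shard only holds the chunks assigned to it, no single server has to store or serve the whole collection.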

Sharding in MongoDB supports "Tag Aware Sharding" (http://mongodb.onconfluence.com/display/DOCS/Tag+Aware+Sharding), which makes it possible to direct specific documents to specific shards depending on a field in your document (the shard key). If your documents have, for example, a range of user IDs or country codes, you can use that to route documents to the correct shard.
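The idea behind Tag Aware Sharding can be modelled in a few lines of Python: each shard is assigned one or more tags, each tag owns [min, max) ranges of the shard key, and the balancer keeps chunks in a tagged range on shards carrying that tag. This is only a sketch of the placement rule; the compound `(country_code, user_id)` shard key, the shard names and the tag names are hypothetical. In an actual cluster the equivalent configuration is done from the mongo shell with helpers such as `sh.addShardTag()` and `sh.addTagRange()` (later MongoDB versions call tags "zones").

```python
import math

# Hypothetical compound shard key: (country_code, user_id).
# Each tag range is [low, high), like MongoDB tag ranges on the shard key.
TAG_RANGES = [
    (("JP", -math.inf), ("JP", math.inf), "ASIA"),
    (("SG", -math.inf), ("SG", math.inf), "ASIA"),
    (("US", -math.inf), ("US", math.inf), "US"),
]

# Hypothetical shards and the tags assigned to each of them.
SHARD_TAGS = {
    "shard-us-east": {"US"},
    "shard-asia": {"ASIA"},
}


def tag_for(key):
    """Return the tag whose range contains key, or None if untagged."""
    for low, high, tag in TAG_RANGES:
        if low <= key < high:
            return tag
    return None


def allowed_shards(key):
    """Shards that may hold the chunk containing key."""
    tag = tag_for(key)
    if tag is None:
        # Untagged ranges may be balanced onto any shard.
        return sorted(SHARD_TAGS)
    return sorted(s for s, tags in SHARD_TAGS.items() if tag in tags)


print(allowed_shards(("JP", 42)))  # ['shard-asia']
print(allowed_shards(("US", 7)))   # ['shard-us-east']
print(allowed_shards(("DE", 1)))   # ['shard-asia', 'shard-us-east']
```

With a primary for `shard-asia` located in ASIA, both reads and writes for Japanese and Singaporean users stay in the region, which is the advantage listed below.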

Setting this up is, however, not easy, as it requires a good understanding of sharding in MongoDB. There is a really nice introduction at http://www.kchodorow.com/blog/2012/07/25/controlling-collection-distribution/

Pros: Allows you to keep data localized to one specific location for both reads and writes
Cons: Not easy to set up; you need at least two data nodes, config servers, and proxy (mongos) servers.

Hope this helps!

score 1 · Accepted Answer


If your application is read-heavy, I would use MongoDB's replica sets:

answered 2013-01-31T14:46:58.637
