I've been browsing the net trying to find a solution that will allow us to generate unique IDs in a regionally distributed environment.
I looked at the following options (among others):
SNOWFLAKE (by Twitter)
- It seems like a great solutions, but I just don't like the added complexity of having to manage another software just to create IDs;
- It lacks documentation at this stage, so I don't think it will be a good investment;
- The nodes need to be able to communicate to one another using Zookeeper (what about latency / communication failure?)
UUID
- Just look at it: 550e8400-e29b-41d4-a716-446655440000;
- Its a 128 bit ID;
- There has been some known collisions (depending on the version I guess) see this post.
AUTOINCREMENT IN RELATIONAL DATABASE LIKE MYSQL
- This seems safe, but unfortunately, we are not using relational databases (scalability preferences);
- We could deploy a MySQL server for this like what Flickr does, but again, this introduces another point of failure / bottleneck. Also added complexity.
AUTOINCREMENT IN A NON-RELATIONAL DATABASE LIKE COUCHBASE
- This could work since we are using Couchbase as our database server, but;
- This will not work when we have more than one clusters in different regions, latency issues, network failures: At some point, IDs will collide depending on the amount of traffic;
MY PROPOSED SOLUTION (this is what I need help with)
Lets say that we have clusters consisting of 10 Couchbase Nodes and 10 Application nodes in 5 different regions (Africa, Europe, Asia, America and Oceania). This is to ensure that content is served from a location closest to the user (to boost speed) and to ensure redundancy in case of disasters etc.
Now, the task is to generate IDs that wont collide when the replication (and balancing) occurs and I think this can be achieved in 3 steps:
Step 1
All regions will be assigned integer IDs (unique identifiers):
- 1 - Africa;
- 2 - America;
- 3 - Asia;
- 4 - Europe;
- 5 - Ociania.
Step 2
Assign an ID to every Application node that is added to the cluster keeping in mind that there may be up to 99 999 servers in one cluster (even though I doubt: just as a safely precaution). This will look something like this (fake IPs):
- 00001 - 192.187.22.14
- 00002 - 164.254.58.22
- 00003 - 142.77.22.45
- and so forth.
Please note that all of these are in the same cluster, so that means you can have node 00001 per region.
Step 3
For every record inserted into the database, an incremented ID will be used to identify it, and this is how it will work:
Couchbase offers an increment feature that we can use to create IDs internally within the cluster. To ensure redundancy, 3 replicas will be created within the cluster. Since these are in the same place, I think it should be safe to assume that unless the whole cluster is down, one of the nodes responsible for this will be available, otherwise a number of replicas can be increased.
Bringing it all together
Say a user is signing up from Europe: The application node serving the request will grab the region code (4 in this case), get its own ID (say 00005) and then get an incremented ID (1) from Couchbase (from the same cluster).
We end up with 3 components: 4, 00005,1
. Now, to create an ID from this, we can just join these components into 4.00005.1
. To make it even better (I'm not too sure about this), we can concatenate (not add them up) the components to end up with: 4000051
.
In code, this will look something like this:
$id = '4'.'00005'.'1';
NB: Not $id = 4+00005+1;
.
Pros
- IDs look better than UUIDs;
- They seem unique enough. Even if a node in another region generated the same incremented ID and has the same node ID as the one above, we always have the region code to set them apart;
- They can still be stored as integers (probably Big Unsigned integers);
- It's all part of the architecture, no added complexities.
Cons
- No sorting (or is there)?
- This is where I need your input (most)
I know that every solution has flaws, and possibly more that what we see on the surface. Can you spot any issues with this whole approach?
Thank you in advance for your help :-)
EDIT
As @DaveRandom suggested, we can add the 4th step:
Step 4
We can just generate a random number and append it to the ID to prevent predictability. Effectively, you end up with something like this:
4000051357
instead of just 4000051
.