load-balancing - How do I achieve high availability?

Question

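My boss wants a system that takes continent-wide catastrophic events into account. He wants two servers in the US and two servers in Asia (one login server and one worker server on each continent).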

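If an earthquake destroys the link between the two continents, each side should operate on its own. When the connection is restored, they should sync with each other and return to normal.
No external cloud systems are allowed, because he does not trust them.
The system should take scalability into account, meaning that adding new servers should be easy to configure.
The servers should be load-balanced.
Connections between servers should be very secure (encrypted and sent over SSL, although SSL already takes care of the encryption).
The system should let one and only one user log in with a given account at a time. (Mind the latency between continents: two users sharing an account could reach the two login servers at the same time.)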

Please help. I'm at my wits' end. Thanks in advance.

score 6 · Accepted Answer

I imagine that these requirements (if properly analysed) are essentially incompatible, in that they cannot all be satisfied at once under the CAP theorem.

If you have several datacentres, even if they are close by, partitions WILL happen. If a partition happens, either availability OR consistency MUST be lost, because either:

you have a pre-determined "master", which keeps working, and other "slave" DCs, which fail (or go read-only). This keeps consistency at the expense of availability.
OR you lose consistency for the duration of the partition (this means that operations which depend on immediate consistency are also unavailable).
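
To make the first option concrete, here is a minimal sketch of a login path that keeps consistency and gives up availability. The `MasterLogin` and `SecondaryLogin` classes and the `link_up` flag are hypothetical names invented for illustration, and the in-memory dict stands in for whatever session store would really be used.

```python
class MasterLogin:
    """Hypothetical login server in the pre-determined master datacentre."""

    def __init__(self):
        self.sessions = {}  # account -> session id (single source of truth)

    def login(self, account, session_id):
        # Enforce "one user per account": reject if a session already exists.
        if account in self.sessions:
            return False
        self.sessions[account] = session_id
        return True


class SecondaryLogin:
    """Hypothetical login server on the other continent."""

    def __init__(self, master):
        self.master = master
        self.link_up = True  # assumed to be maintained by some health check

    def login(self, account, session_id):
        if not self.link_up:
            # Partitioned: refuse rather than risk a duplicate login.
            return False
        # Link is up: the master decides, so both sides stay consistent.
        return self.master.login(account, session_id)
```

While the link is up, both continents see the same answer for every account. Once it goes down, the secondary side simply cannot log anyone in, which is exactly the availability loss described above.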

This is incompatible with your requirements, as far as I can see. What your boss wants is clearly impossible. He needs to understand the CAP theorem.

Now, in YOUR application case, you may decide that you can bend the rules and redefine what consistency or availability means, for convenience, and have a system which degrades into an inconsistent but temporarily acceptable state.
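
As one possible illustration of such a degraded mode, the sketch below lets both continents accept logins during a partition and then reconciles once the link returns. The `Session` tuple, the `reconcile` function, and the "earliest login wins" rule are all assumptions made for this example, not something the requirements dictate. It also assumes the two sides' clocks are roughly comparable, which would need care in practice.

```python
from typing import NamedTuple


class Session(NamedTuple):
    started_at: float  # login time, seconds (assumed comparable across sites)
    region: str
    session_id: str


def reconcile(us, asia):
    """Merge two account -> Session maps after a partition heals.

    Returns (merged, revoked): the surviving session per account and the
    sessions that must be logged out to restore "one user per account".
    """
    merged = {}
    revoked = []
    for account in us.keys() | asia.keys():
        candidates = [s for s in (us.get(account), asia.get(account)) if s is not None]
        # Tuple ordering: earliest start time wins, region name breaks ties.
        winner = min(candidates)
        merged[account] = winner
        revoked.extend(s for s in candidates if s != winner)
    return merged, revoked
```

During the partition the single-login rule can be violated (the same account may be active on both continents), and the conflict is only cleaned up afterwards. Whether that temporary inconsistency is acceptable is a business decision.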

You probably want to get product management to have a look at the business case for these requirements. Dropping some of them is probably OK. Consistency is a good requirement to keep, as it makes things behave as people expect, which means dropping availability or partition tolerance. Keeping consistency is definitely easier from an engineering perspective.

score 4 · Accepted Answer

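This is another case of an employer not understanding the benefits of using an off-the-shelf solution. If you, as the programmer, don't even know where to start, rolling your own could cost a great deal of money and time. There's nothing wrong with not knowing this stuff, either. Highly available, fail-safe networks that account for catastrophic failure of key components are a big problem domain that many people have put a lot of effort and money into. Why not take advantage of what the vendors offer?

Try again to talk to your boss about using an existing cloud provider.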

score 1 · Accepted Answer

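You could contact one of the reliable and experienced hosting providers that have data centres in different regions of the world (we use Rackspace) and get their advice based on your requirements.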

score 0 · Accepted Answer

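This will require expert help and a large budget, as well as serious planning.

Your better option is to contact a reputable vendor with a global presence, choose a quality solution backed by a solid SLA, and have them tailor something that comes close to your needs.

Just be aware that even the likes of Google, Yahoo, Microsoft, and Amazon (to name just a few) have had one problem or another that took parts of their systems offline for some users.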
