synchronization - the way to synchronize state between distributed clients

Question

Here is my problem:

My application is a distributed real-time message broker for web applications. Clients connect from web browsers to one of the application nodes. The nodes are connected to each other by the ZeroMQ PUB/SUB mechanism. When a client sends a message, its node publishes it to a PUB socket; the other nodes receive that message on their SUB sockets and send it to their own connected clients.

But now I need presence and history functionality. Presence: provide a list describing all clients connected to any node. History: provide a list of the last several messages sent. In other words, I need to get the entire state of the application. I am considering several ways to achieve it:

1) Send all information about connected clients to a central server. Then, when a client asks for presence, query the central server and return its response to the client.

2) Keep all information on every node. When a client connects to any node, send information about it to the other nodes using the PUBLISH operation. Then, when a client asks for presence, I can return a response immediately.

3) Gather information on demand from all nodes. I can't really imagine how to program this at the moment, but it avoids duplicating information, which reduces memory consumption. In this case I don't need to worry about fitting all the information in memory.

4) Use some distributed data store, something like Doozerd. But I don't like this idea because of the extra dependency.

A client needs presence information every time it connects to a node. Presence information changes on every client connect/disconnect, and history information changes on every message.

This is an open-source application, so I don't know how many connected clients it must support. Load tests will eventually give me that number.

There is no strong reliability requirement for the presence and history data.

I really need your advice on which of these options is the right way to solve my problem. Or maybe there is another, better way?

score 2 · Accepted Answer

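Presence and history data are quite naturally partitioned by the channel they belong to.

So, have you considered distributing channels across the app servers? Each application node would have a few channels that it knows about. Queries about other channels are sent to the specific node that can answer them.

This is probably closest to option 3 in your list.

This way, the state data for each channel becomes a manageable chunk of data, probably small enough to keep in memory. History data can also be cached in memory on the server responsible for the channel. Use some sort of eviction algorithm to decide which history data is no longer worth caching. Evicted data is dropped from memory and can be retrieved from storage when it is needed again.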
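As a rough illustration of the in-memory history cache with eviction described above, here is a minimal sketch. It assumes least-recently-used eviction of whole channels, which is only one possible eviction algorithm; the class name `ChannelHistoryCache` and the limits `max_channels` and `per_channel` are made up for this example.

```python
from collections import OrderedDict, deque


class ChannelHistoryCache:
    """Keeps the last `per_channel` messages for at most `max_channels` channels."""

    def __init__(self, max_channels=1000, per_channel=50):
        self.max_channels = max_channels  # hypothetical limit
        self.per_channel = per_channel    # hypothetical limit
        self._channels = OrderedDict()    # channel name -> deque of messages

    def append(self, channel, message):
        history = self._channels.get(channel)
        if history is None:
            # deque(maxlen=...) drops the oldest message once it is full.
            history = deque(maxlen=self.per_channel)
            self._channels[channel] = history
        history.append(message)
        self._touch(channel)

    def get(self, channel):
        """Return cached history (oldest first), or None on a cache miss."""
        history = self._channels.get(channel)
        if history is None:
            return None  # the caller would fall back to storage here
        self._touch(channel)
        return list(history)

    def _touch(self, channel):
        # Mark the channel as most recently used, then evict the stalest ones.
        self._channels.move_to_end(channel)
        while len(self._channels) > self.max_channels:
            self._channels.popitem(last=False)
```

A `None` result from `get` is the signal to load the channel's history from wherever it is persisted; that storage layer is outside the scope of this sketch.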

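Another idea for you to consider: do you know about 0MQ's Clustered Hashmap Protocol (CHP)? I think you could use it (or be inspired by it) to push presence and history data for the channels a client is connected to, instead of having clients pull it from the app server.

Edit: I read up on the CHP protocol; it had been a while since I read the guide.

The CHP server publishes all changes in the hashmap data to all subscribing clients. Subscribers filter the data. That is how subscribing to 0MQ topics works in general, not just in CHP. But if the servers host many channels while clients are typically interested in only a few of them, your clients may have a lot of data to chew through.

I suppose you are already facing this issue, so I wonder: how do you organize this right now?

The snapshot is retrieved by clients when they join, and it is filtered by subtree. The user guide has some interesting detail on how published updates are kept in a queue until the snapshot has arrived, and how updates that precede the snapshot are discarded: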

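So we will do the synchronization in the client, as follows:

The client first subscribes to updates and then makes a state request. This guarantees that the state is going to be newer than the oldest update it has.

The client waits for the server to reply with state, and meanwhile queues all updates. It does this simply by not reading them: ØMQ keeps them queued on the socket queue.

When the client receives its state update, it begins once again to read updates. However, it discards any updates that are older than the state update. So if the state update includes updates up to 200, the client will discard every update before 201.

The client then applies updates to its own state snapshot.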
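The last two steps can be sketched without any sockets. This is only an illustration, not the guide's code: it assumes each update carries a sequence number, a key, and a value, that a value of `None` means the key was removed, and that the snapshot reports the sequence number it is current as of. The function name `sync_state` is made up for this example.

```python
def sync_state(queued_updates, snapshot, snapshot_seq):
    """Merge a state snapshot with updates queued while waiting for it.

    queued_updates: iterable of (sequence, key, value) tuples in arrival
                    order; a value of None means "delete this key".
    snapshot:       dict holding the state as of `snapshot_seq`.
    Returns the merged state and the last sequence number applied.
    """
    state = dict(snapshot)
    last_seq = snapshot_seq
    for seq, key, value in queued_updates:
        if seq <= last_seq:
            continue  # already reflected in the snapshot, or a duplicate
        if value is None:
            state.pop(key, None)
        else:
            state[key] = value
        last_seq = seq
    return state, last_seq


# Presence example: the snapshot is current as of update 200.
snapshot = {"alice": "node1", "bob": "node2"}
queued = [
    (199, "alice", "node1"),  # older than the snapshot: discarded
    (200, "bob", "node2"),    # included in the snapshot: discarded
    (201, "bob", None),       # bob disconnected
    (202, "carol", "node3"),  # carol connected
]
state, last_seq = sync_state(queued, snapshot, 200)
```

After the call, `state` holds alice and carol, and `last_seq` is 202, so later updates can be filtered the same way.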

I think you would definitely be interested in this.

score 2 · Accepted Answer

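Basically, all of your options are valid in certain cases.

Without specific requirements, I would go for the simplest solution.

I think the simplest solution is to use something like Redis. It is stable, used by many companies (including SO, as far as I know), very fast, and very flexible, and it makes capped lists for history easy to implement. Iterating on your requirements would be very easy, because you can change functionality quickly.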
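A capped history list is usually built from the Redis commands LPUSH and LTRIM. The sketch below assumes `client` is a redis-py `redis.Redis` instance (or anything with the same `lpush`, `ltrim`, and `lrange` methods); the `history:<channel>` key scheme and the cap of 50 are made up for this example.

```python
HISTORY_SIZE = 50  # hypothetical cap on messages kept per channel


def add_to_history(client, channel, message, size=HISTORY_SIZE):
    """Push a message and trim the list so it never exceeds `size` entries."""
    key = f"history:{channel}"      # hypothetical key naming scheme
    client.lpush(key, message)      # newest message goes to the head
    client.ltrim(key, 0, size - 1)  # keep only the `size` newest entries


def get_history(client, channel):
    """Return the stored messages, newest first."""
    return client.lrange(f"history:{channel}", 0, -1)
```

With redis-py this could be used as `add_to_history(redis.Redis(decode_responses=True), "news", "hello")`. The push and the trim are two separate commands here; if the list must never be observed above the cap, they could be sent together in a transaction.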

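If you don't want an extra dependency/deployment, another option is to partition the information between servers (using hash partitioning or consistent hashing), so that you know where to store and retrieve information about a specific client or other entity.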
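To make the consistent hashing variant concrete, here is a minimal hash ring sketch. The class name `HashRing`, the node names, and the choice of MD5 with 100 virtual points per node are all assumptions for illustration; any stable hash function would do.

```python
import bisect
import hashlib


def _hash(key):
    # Stable across processes, unlike the built-in hash() for strings.
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


class HashRing:
    """Maps keys (client IDs, channel names, ...) to the node that owns them."""

    def __init__(self, nodes, replicas=100):
        if not nodes:
            raise ValueError("at least one node is required")
        # Each node gets `replicas` virtual points to even out the load.
        self._ring = sorted(
            (_hash(f"{node}#{i}"), node)
            for node in nodes
            for i in range(replicas)
        )
        self._points = [point for point, _ in self._ring]

    def node_for(self, key):
        # First ring point clockwise from the key's hash, wrapping around.
        index = bisect.bisect_right(self._points, _hash(key)) % len(self._ring)
        return self._ring[index][1]


ring = HashRing(["node-a", "node-b", "node-c"])
owner = ring.node_for("channel-42")  # the node to ask about this channel
```

The point of the ring over plain `hash(key) % node_count` is that removing or adding a node only moves the keys that node owned or takes over; every other key keeps its owner.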

HTH
