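Given the following facts, is there an existing open-source Java API (possibly part of some larger product) that implements an algorithm for reproducible ordering of events in a clustered environment: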
1) There are N sources of events, each with a unique ID.
2) Each event produced has an ID/timestamp, which, together with
its source ID, makes it uniquely identifiable.
3) The IDs can be used to sort the events.
4) There are M application servers receiving those events.
M is normally 3.
5) The events can arrive at any one or more of the application
servers, in no specific order.
6) The events are processed in batches.
7) The servers have to agree for each batch on the list of events
to process.
8) Each event has an earliest and a latest batch ID between which
it must be processed.
9) They must not be processed earlier, and are "failed" if they
cannot be processed before the deadline.
10) The batches are based on the real clock time. For example,
one batch per second.
11) The events of a batch are processed when 2 of the 3 servers
agree on the list of events to process for that batch (quorum).
12) The "third" server then has to wait until it possesses all the
required events before it can process that batch too.
13) Once an event has been processed or has failed, the source has
to be informed.
14) [EDIT] Events from one source must be processed (or failed) in
the order of their ID/timestamp, but there is no causality
between different sources.
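
To make the requirements above more concrete, here is a minimal Java sketch (Java 16+ for records) of the deterministic part as I picture it: a total order over events (points 3 and 14), the per-batch eligibility window (points 8 and 9), and a simple quorum check (point 11). All names (`BatchOrdering`, `Event`, `proposal`, `agreed`) are hypothetical and only for illustration. The sketch assumes each server passes in only its not-yet-processed events, and it ignores networking, failure notification, and late arrivals, which are the hard parts I am asking about.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

final class BatchOrdering {

    /** Hypothetical event model: (sourceId, id) uniquely identifies an event. */
    record Event(String sourceId, long id, long earliestBatch, long latestBatch) {}

    /**
     * Total order every server can compute locally: by ID/timestamp first,
     * ties broken by source ID. Events of one source stay in ID order.
     */
    static final Comparator<Event> ORDER =
            Comparator.comparingLong(Event::id).thenComparing(Event::sourceId);

    /** Sorted list of known events whose batch window contains batchId. */
    static List<Event> proposal(Collection<Event> known, long batchId) {
        List<Event> result = new ArrayList<>();
        for (Event e : known) {
            if (e.earliestBatch() <= batchId && batchId <= e.latestBatch()) {
                result.add(e);
            }
        }
        result.sort(ORDER);
        return result;
    }

    /** Returns the first list proposed by at least `quorum` servers, if any. */
    static Optional<List<Event>> agreed(Collection<List<Event>> proposals, int quorum) {
        Map<List<Event>, Integer> votes = new HashMap<>();
        for (List<Event> p : proposals) {
            // merge() returns the updated vote count for this exact list
            if (votes.merge(p, 1, Integer::sum) >= quorum) {
                return Optional.of(p);
            }
        }
        return Optional.empty();
    }
}
```

With M = 3, each server would call `proposal(...)` for the current batch and the batch would go ahead once `agreed(proposals, 2)` returns a list. A server whose own proposal differs would then have to fetch the missing events before processing (point 12).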
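Less formally, I have servers that receive events. They start from the same initial state and should stay in sync by agreeing on which events to process and in which order. Luckily for me, the events do not have to be processed as soon as possible but "a bit later", which gives the servers some time to reach agreement before the deadline. I am not sure, though, whether that makes any real difference to the algorithm. If all servers agree on all batches, they will always stay in sync and therefore present a consistent view when queried.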
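While I would be happiest with a Java API, I would accept something else if I can call it from Java. If there is no open-source API but there is a clear algorithm, I would accept that as an answer too and try to implement it myself.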