I have been assigned to design software that shall triggerCatastrophicEvent() if X.X.X.X is not reachable.
Constraints:

  1. It has to be deployed on 2 separate servers for availability.
  2. At any moment, only one of the software instances may call triggerCatastrophicEvent(), since it is a catastrophic event!

OS: RHEL 8.2
Language: Java 8

I have been thinking about this for quite a while, but I have some doubts about my system design.
Here is my initial design plan:
  • Have a state.cfg file in a common NFS-mounted area, which states which instance is PRIME at that moment.
  • When the instances start, default one of them to PRIME and the other to redundant.
  • Have an individual status.file per instance (in the NFS common area), to which each instance writes its most recent health every K seconds.
  • Each instance checks the other's status.file; if it has not been written in the last X seconds, it declares the other instance dead and makes itself PRIME, marking the other as redundant by modifying state.cfg.
  • Both state.cfg and the status.file files are available to each instance, since the area is mounted on both servers, so either instance can read or write the other's files as necessary.
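
To make the status.file part of the plan concrete, here is a minimal sketch of the heartbeat write and the staleness check. The names `HeartbeatFile`, `PeerState`, `writeHeartbeat`, and `check` are hypothetical, and it assumes the clocks of both servers are kept in sync (for example by NTP), since the timestamp written by one server is compared against the clock of the other. It is only an illustration of the idea, not a complete failover implementation.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.time.Duration;
import java.time.Instant;

public class HeartbeatFile {

    /** Result of inspecting the peer's status file (hypothetical names). */
    public enum PeerState { ALIVE, STALE, UNKNOWN }

    /**
     * Writes the given time (epoch millis) as the heartbeat.
     * The value goes to a sibling temp file first and is then renamed over
     * the status file, so a reader is less likely to see a half-written file.
     */
    public static void writeHeartbeat(Path statusFile, Instant now) throws IOException {
        Path tmp = statusFile.resolveSibling(statusFile.getFileName() + ".tmp");
        byte[] data = Long.toString(now.toEpochMilli()).getBytes(StandardCharsets.UTF_8);
        Files.write(tmp, data);
        Files.move(tmp, statusFile, StandardCopyOption.REPLACE_EXISTING);
    }

    /**
     * Classifies the peer from its status file.
     * ALIVE:   last heartbeat is at most maxAge old.
     * STALE:   file is readable but the heartbeat is older than maxAge.
     * UNKNOWN: file is missing, unreadable, or not a number. This is kept
     *          separate from STALE because it may mean the shared area itself
     *          is the problem, in which case taking over could be unsafe.
     */
    public static PeerState check(Path statusFile, Duration maxAge, Instant now) {
        try {
            String text = new String(Files.readAllBytes(statusFile), StandardCharsets.UTF_8).trim();
            Instant last = Instant.ofEpochMilli(Long.parseLong(text));
            boolean tooOld = Duration.between(last, now).compareTo(maxAge) > 0;
            return tooOld ? PeerState.STALE : PeerState.ALIVE;
        } catch (IOException | NumberFormatException e) {
            return PeerState.UNKNOWN;
        }
    }
}
```

With this shape, an instance would call `writeHeartbeat` every K seconds and `check` with `maxAge` set to X seconds. Returning `UNKNOWN` instead of `STALE` when the file cannot be read is a deliberate choice in this sketch: if both instances lose access to the shared area at the same time, neither of them concludes that the other one is dead.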

Doubts:

  • If an NFS issue occurs (network, storage, etc.), this design fails.
  • Note that I gave up on the idea of sharing a heartbeat over a TCP socket or message queue between the two instances, since that would also be impacted by network loss.

I am aware that designing a distributed system is a big undertaking, but I gave it a go anyway, and it seems my idea is falling short.

I would be highly obliged if the community could provide me with some leads or ideas.
