I have been asked to design software that must call triggerCatastrophicEvent() if X.X.X.X is not reachable.
Constraints:
- It has to be deployed on two separate servers for availability.
- At any moment, only one of the software instances may call triggerCatastrophicEvent(), since it is a catastrophic event!
OS: RHEL 8.2, Language: Java 8
I have been thinking about this for quite a while, but I have some doubts about my system design.
My initial design plan:
- Keep a state.cfg file in a common NFS-mounted area that records which instance is PRIME at that moment.
- When the instances start, default one of them to PRIME and the other to REDUNDANT.
- Each instance writes its most recent health to its own status.file (also in the common NFS area) every K seconds.
- Each instance checks the other's status.file; if it has not been written in the last X seconds, the checking instance declares the other dead, then makes itself PRIME and the other REDUNDANT by modifying state.cfg.
- Both state.cfg and the status.file files are available to each instance, since the area is mounted on both servers, so either instance can read or write the other's files as necessary.
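To make the plan concrete, here is a minimal sketch of the heartbeat part as I currently picture it. The class and method names (HeartbeatMonitor, peerLooksDead, and so on) are hypothetical, and it rests on assumptions I have not settled: each status.file holds a single epoch-millisecond timestamp, the clocks on both servers are synchronized, and an unreadable file is treated as "do not take over" rather than as a dead peer.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

class HeartbeatMonitor {

    /** True if the last heartbeat is more than maxAgeSeconds (the "X" above) old. */
    static boolean isStale(long lastHeartbeatMillis, long nowMillis, long maxAgeSeconds) {
        return nowMillis - lastHeartbeatMillis > maxAgeSeconds * 1000L;
    }

    /** Called by each instance every K seconds on its own status.file. */
    static void writeHeartbeat(Path ownStatusFile, long nowMillis) throws IOException {
        // Assumption: the file content is just the timestamp, in epoch milliseconds.
        Files.write(ownStatusFile, Long.toString(nowMillis).getBytes(StandardCharsets.UTF_8));
    }

    /** Reads the timestamp the peer last wrote to its status.file. */
    static long readHeartbeat(Path peerStatusFile) throws IOException {
        byte[] raw = Files.readAllBytes(peerStatusFile);
        return Long.parseLong(new String(raw, StandardCharsets.UTF_8).trim());
    }

    /** Decides whether this instance should consider the peer dead. */
    static boolean peerLooksDead(Path peerStatusFile, long nowMillis, long maxAgeSeconds) {
        try {
            return isStale(readHeartbeat(peerStatusFile), nowMillis, maxAgeSeconds);
        } catch (IOException | NumberFormatException e) {
            // Assumption: if the shared file cannot be read or parsed, the problem may be
            // NFS itself rather than the peer, so this sketch does not claim PRIME.
            return false;
        }
    }
}
```

Each instance would call writeHeartbeat on its own file every K seconds and peerLooksDead on the other's file, and only rewrite state.cfg when peerLooksDead returns true. The catch block is exactly where my doubts below come in: from inside one instance, a dead peer and a broken NFS mount look the same.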
Doubts:
- If NFS becomes unavailable because of a network, storage, or similar problem, this design fails.
- Note that I gave up the idea of sharing a heartbeat over a TCP socket or message queue between the two instances, since that would also be affected by network loss.
I am aware that designing a distributed system is a huge undertaking. I gave it a go anyway, but it seems my idea is falling short.
I would be grateful if the community could provide some leads or ideas.