I have n (typically n < 10 but it should scale) processes running on different machines and communicating through amqp using RabbitMQ. Processes are typically long running and may be implemented in any language (though most are java/python).
Each process requires a number of inputs (numbers/strings) and produces a number of outputs (also just numbers or strings). Executing a process happens asynchronously: sending a message on its input queue and waiting for a callback to be triggered by the output queue.
Ideally the user specifies some inputs and desired outputs and the system should:
- detect which processes are needed and generate the dependency graph
- topologically sort the graph and execute it, node transitions will need to be event driven
A node should fire if its input is ready, allowing parallelism per branch. I can assume no cycles for now, but eventually there will be cycles (e.g., two processes may need to iterate until the output no longer changes).
This should be a known problem from (data)flow programming (discussed here before) and I want to avoid re-inventing the wheel. I would prefer a python solution and a search leads to Trellis and Pypes. Trellis is no longer developed but seems to support cycles, while pypes does not. Also not sure how actively developed pypes is.
Further searches reveal a whole list of event based programming frameworks, none of which I am particularly knowledgeable about. There are of course workflow environments like Taverna and KNIME, but that seems overkill.
Does anybody have any experience tackling this type of problem or with the libraries mentioned?
Edit: Other libraries I found are: