Let me summarize what I understood of your requirements:
- you have a data tree where each single node can be altered asynchronously by a set of operations (
capitalize_d
and remove_a
in your example),
- you want to make sure every single node has been subjected to a given operation before allowing the next one.
I have spent 10 years or more designing real-time embedded software, and believe me, the requirements in this area are meaner and scarrier than anything most network programmers will experience in their entire life. This makes me warn you that you seem to be headed the seriously wrong way here.
As I can imagine it, your problem is to organize a set of individual data in some sort of meaningful structure. Some process collects random bits of information (what you call 'nodes' in your example), and at some point you want to put all those nodes into a consistent, monolithic data structure (a hierarchical tree in your example).
In other words, you have three tasks at hand:
- a data acquisition process that will collect nodes asynchronously
- a data production process that will present a consistent, refined data tree
- a controller process that will synchronize data acquisition and production (possibly directly the user interface if the two above processes are smart enough, but don't count on it too much).
My advice: don't try to do acquisition and production at the same time.
Just to give you an idea of the nightmare you're headed for:
depending on how the operations are triggered, there is a possibility that the tree will never be completely processed by a given operation. Let's say the controlling software forgets to call capitalize_d
on a few nodes, remove_a
will simply never get the green light
conversely, if you fire at the tree at random, it is very likely that some nodes will get processed multiple times, unless you keep track of the operation coverage to prevent applying the same transformation twice to a given node
if you want remove_a
processing to ever start, you might have to prevent the controlling software from sending any more capitalize_d
requests, or else the light might stay stuck to red forever. You will end up doing flow control over your requests one way or another (or worse : you will not do any, and your system will be likely to freeze to death should the operation flow wander away from the sweet spot you happened to hit by chance).
if an operation alters the structure of the tree (as remove_a
obviously does), you have to prevent concurrent accesses. At the very least, you should lock the subtree starting from the node remove_a
is working on, or else you would allow processing a subtree that is likely to be asynchronously altered and/or destroyed.
Well it's doable. I've seen fine youg men earning big money doing variations on this theme. They usually spent a couple of evenings each week eating pizzas in front of their computers too, but hey, that's how you can tell hardboiled hackers from the crowd of quiche eaters, right?...
I assume you being posting this question here means you don't really want to do this. Now if you boss does, well, to quote a famous android, I can't lie to you about your chances, but... you have my sympathies.
Now seriously folks.. This is the way I would tackle the problem.
1) take a snapshot of your data at a given point in time
you can weed out raw data using as many criteria as you can (last data acquisition too old, incorrect input, whatever allows you to build the smallest possible tree).
2) build the tree with the snapshot, and then apply whatever capitalize_d, remove_a and camelize_z operations sequentially on this given snapshot.
In parallel, the data acquisition process will continue collecting new nodes or updating existing ones, ready to take the next snapshot.
Besides, you can move some of your processing forward. Obviously capitalize_d
does not take any advantage of the tree structure, so you can apply capitalize_d
to each node in the snapshot before the tree is even built. You might even be able to apply some of the transformations earlier, i.e. on each collected sample.
This can save you a lot of processing time and code complexity.
To end with a bit of theroretical babble,
- your approach is to consider the data tree a shared object that should support concurrent acces from the data acquisition and data production processes,
- my approach is to make the data acquisition process serve (asynchronously) consistent sets of data to a data production process, which can then handle sequentially the said set of data.
the data production process could be triggered on demand (say when the end user clicks the "show me something" button), in which case the reactivity would be rather poor: the user would be stuck watching an hourglass or whatever Web2.0 sexy spinning wheel for the time needed to build and process the tree (lets's say 7-8 seconds).
or you could activate the data production process periodically (feed it with a new snapshot every 10 seconds, a period safely above the mean processing time of a data set). The "show me something" button would then just present the last set of completed data. An immediate answer, but with data that could be 10 seconds older than the last recieved samples.
I rarely saw cases where this was not considered acceptable, especially when you produce a bunch of complex data an operator needs a few dozen seconds to digest.
In theory, my approach will lose some reactivity, since the data processed will be slightly outdated, but the concurrent access approach would probably result in an even slower software (and most certainly 5-10 times bigger and buggier).