I've built a similar system; it's called collective.transmogrifier. One of these days I'll make it more generic (it is currently tied to the CMF, one of the underpinnings of Plone).
Decoupling
What you need is a way to decouple the component registration for your pipeline. In Transmogrifier, I use the Zope Component Architecture (embodied in the zope.component package). The ZCA lets me register components that implement a given interface and later look up those components either as a sequence or by name. There are other ways of doing this too; for example, Python eggs have the concept of entry points.
The point is that each component in the pipeline is referable by a text-only name, dereferenced at construction time. Third-party components can be slotted in for re-use by registering their own components independently from your pipeline package.
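To illustrate name-based registration without pulling in zope.component, here is a minimal sketch that uses a plain dictionary as the registry. The `register` and `lookup` helpers and the `example.upper` name are hypothetical; in Transmogrifier the ZCA plays this role, and entry points would be another option.

```python
from typing import Callable, Dict

# Hypothetical registry mapping text-only names to component factories.
# (Transmogrifier uses zope.component for this; a dict stands in here.)
_REGISTRY: Dict[str, Callable] = {}


def register(name: str):
    """Decorator that registers a component factory under a text name."""
    def decorator(factory: Callable) -> Callable:
        if name in _REGISTRY:
            raise KeyError(f"component {name!r} already registered")
        _REGISTRY[name] = factory
        return factory
    return decorator


def lookup(name: str) -> Callable:
    """Dereference a name to its factory at pipeline construction time."""
    try:
        return _REGISTRY[name]
    except KeyError:
        raise LookupError(f"no component registered as {name!r}") from None


# A third-party package could register its own component like this,
# without the pipeline package ever importing it directly.
@register("example.upper")
class Upper:
    def __init__(self, previous):
        self.previous = previous

    def __iter__(self):
        for item in self.previous:
            yield item.upper()


factory = lookup("example.upper")
print(list(factory(iter(["a", "b"]))))  # ['A', 'B']
```

The pipeline only ever sees the string `"example.upper"`; which class answers to that name is decided by whoever registered it.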
Configuration
Transmogrifier pipelines are configured using a textual format based on the Python ConfigParser module, where the different components of the pipeline are named, configured, and slotted together. When constructing the pipeline, each section is thus given a configuration object. Sections don't have to look up configuration centrally; they are configured on instantiation.
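A rough sketch of the idea using the standard-library `configparser`. The section layout below (a `[pipeline]` section listing `sections`, and a `blueprint` option naming each component) is a hypothetical format loosely modelled on the Transmogrifier style, not its exact syntax.

```python
import configparser

# Hypothetical pipeline configuration: one section lists the pipeline
# order, every other section names a component plus its own options.
CONFIG = """
[pipeline]
sections =
    source
    prefixer

[source]
blueprint = example.source
items = alpha beta

[prefixer]
blueprint = example.prefix
prefix = item-
"""


def parse_pipeline(text):
    parser = configparser.ConfigParser()
    parser.read_string(text)
    # Multi-line value: one section name per indented line.
    names = parser.get("pipeline", "sections").split()
    # Each stage receives only its own section as a plain dict, so it
    # never has to look anything up centrally later on.
    return [(name, dict(parser.items(name))) for name in names]


for name, options in parse_pipeline(CONFIG):
    print(name, options)
```

Each `(name, options)` pair would then be handed to the factory found by looking up `options["blueprint"]`.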
Central state
I also pass in a central 'transmogrifier' instance, which represents the pipeline. If any component needs to share per-pipeline state (such as caching a database connection for re-use between components), they can do so on that central instance. So in my case, each section does have a reference to the central pipeline.
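A minimal sketch of such a central object, assuming a hypothetical `Pipeline` class with a `shared` dict and a lazily cached SQLite connection (standing in for whatever resource your stages need to share):

```python
import sqlite3


class Pipeline:
    """Hypothetical central object shared by every stage of one pipeline run."""

    def __init__(self):
        # Per-pipeline scratch space; stages agree on key names among themselves.
        self.shared = {}

    def connection(self):
        # Open the database lazily and cache it so later stages reuse it.
        if "db" not in self.shared:
            self.shared["db"] = sqlite3.connect(":memory:")
        return self.shared["db"]


class Stage:
    def __init__(self, pipeline, name, options, previous):
        self.pipeline = pipeline  # reference to the central pipeline object
        self.name = name
        self.options = options
        self.previous = previous


pipeline = Pipeline()
first = Stage(pipeline, "first", {}, iter(()))
second = Stage(pipeline, "second", {}, first)
print(first.pipeline.connection() is second.pipeline.connection())  # True
```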
Individual components and behaviour
Transmogrifier pipeline components are generators that consume elements from the preceding component in the pipeline, then yield the results of their own processing. Components thus generally have a reference to the previous stage, but have no knowledge of what consumes their output. I say 'generally' because in Transmogrifier some pipeline elements can produce elements from an external source instead of using a previous element.
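A small sketch of that chaining with plain generator functions. The `source` and `prefixer` stages are hypothetical; in this sketch the source stage passes any earlier items through before adding its own from an "external" list.

```python
from typing import Dict, Iterable, Iterator

Item = Dict[str, object]


def source(previous: Iterable[Item], names) -> Iterator[Item]:
    # A source stage: pass earlier items through, then add items from an
    # external source (a plain list here).
    yield from previous
    for name in names:
        yield {"name": name}


def prefixer(previous: Iterable[Item], prefix: str) -> Iterator[Item]:
    # A transforming stage: knows only the stage before it, never the consumer.
    for item in previous:
        item = dict(item)
        item["name"] = prefix + str(item["name"])
        yield item


stages = source(iter(()), ["alpha", "beta"])
stages = prefixer(stages, "item-")
for item in stages:
    print(item)
# {'name': 'item-alpha'}
# {'name': 'item-beta'}
```

Because every stage is a generator, nothing runs until the last stage is iterated, and items flow through one at a time.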
If you do need to alter the behaviour of a pipeline component based on individual items to be processed, mark those items themselves with extra information for each component to discover. In Transmogrifier, items are dictionaries, and you can add extra keys to a dictionary that use the name of a component so each component can look for this extra info and alter behaviour as needed.
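One way this could look, using a hypothetical convention where a key of the form `_<stage name>_skip` on an item tells that particular stage to leave the item alone:

```python
from typing import Dict, Iterable, Iterator


def prefixer(previous: Iterable[Dict], name: str, prefix: str) -> Iterator[Dict]:
    # Hypothetical convention: "_<stage name>_skip" on an item tells this
    # stage (and only this stage) to pass the item through unchanged.
    skip_key = f"_{name}_skip"
    for item in previous:
        if not item.get(skip_key):
            item = {**item, "title": prefix + item["title"]}
        yield item


items = [
    {"title": "alpha"},
    {"title": "beta", "_prefixer_skip": True},
]
print(list(prefixer(iter(items), "prefixer", "item-")))
# [{'title': 'item-alpha'}, {'title': 'beta', '_prefixer_skip': True}]
```

Keying the marker on the stage's configured name means two instances of the same component in one pipeline can be steered independently.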
Summary
Decouple your pipeline components by using an indirect lookup of elements based on a configuration.
When you instantiate your components, configure them at the same time and give them what they need to do their job. That could include a central object to keep track of pipeline-specific state.
When running your pipeline, only pass through items to process, and let each component base its behaviour on that individual item only.