I'm trying to build a system that generates a set of "configurations". These configurations are simple key/value pairs stored in a Python dict.

Each configuration is the result of transforming a dict with a series of functions; that series is what I call a workflow.

Here is a simple example of what I ended up with:

import copy
from functools import reduce  # builtin on Python 2; this import works on both 2 and 3

global_data = [dict()]

def workflow_step1(data):
    results = []
    for i in range(1,4):
        data['key'] = i
        results.append(copy.deepcopy(data))
    return results

def workflow_step2(data):
    results = []
    for i in range(1,3):
        data['otherkey'] = i
        results.append(copy.deepcopy(data))
    return results

def workflow_step3(data):
    data['yetanotherkey'] = 42
    return [copy.deepcopy(data)]

def list_workflow():
    return [workflow_step1, workflow_step2, workflow_step3]

def merge(lhs,rhs):
    return lhs+rhs

def run(data):
    for step in list_workflow():
        # apply the step to every dict so far, then flatten the per-dict lists
        data = reduce(merge, [step(d) for d in data])
    return data

print(run(global_data))

This works fine, and I get:

[{'yetanotherkey': 42, 'otherkey': 1, 'key': 1},
 {'yetanotherkey': 42, 'otherkey': 2, 'key': 1},
 {'yetanotherkey': 42, 'otherkey': 1, 'key': 2},
 {'yetanotherkey': 42, 'otherkey': 2, 'key': 2},
 {'yetanotherkey': 42, 'otherkey': 1, 'key': 3},
 {'yetanotherkey': 42, 'otherkey': 2, 'key': 3}]

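As you can see, the goal is to get every possible combination of dicts. Each workflow step returns a set of possibilities, and each of those should open a new branch of possibilities for the steps that follow.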

The problem I'm facing is that users are now creating more and more workflow steps, which leads to a combinatorial explosion.

The flaw in my naive design is that I generate the entire tree of possibilities at once.

I was hoping to solve this with yield and generators, producing one possibility at a time so that everything is not stored at once.

I was of course able to rewrite the workflow steps using yield:

def workflow_step1(data):
    for i in range(1,4):
        data['key'] = i
        yield copy.deepcopy(data)

def workflow_step2(data):
    for i in range(1,3):
        data['otherkey'] = i
        yield copy.deepcopy(data)

def workflow_step3(data):
    data['yetanotherkey'] = 42
    yield copy.deepcopy(data)

def list_workflow():
    yield workflow_step1
    yield workflow_step2
    yield workflow_step3

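But I just can't wrap my head around how to rewrite the run function so that it processes each step in turn. I'm lost in a mental maze of yields and generators.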

Any ideas are more than welcome!

3 Answers

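I think itertools.product will do what you want. Here is an approach that returns a generator, which yields one combination of the three steps at a time. Even if a step has many more options, it won't take a lot of time or memory.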

import itertools

def step1():
    return [("key", i) for i in range(1,4)]

def step2():
    return [("otherkey", i) for i in range(1,3)]

def step3():
    return [("yetanotherkey", 42)]

def workflow_generator():
    return (dict(p) for p in itertools.product(step1(), step2(), step3()))

If you want to handle a variable number of steps, you can modify it slightly:

def workflow_generator(steps):
    return (dict(p) for p in itertools.product(*(step() for step in steps)))

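Calling this version as workflow_generator([step1, step2, step3]) gives the same results as the previous version, but if you like you can assemble the list of steps some other way (for example, by returning it from a function).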
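
As a rough illustration of the laziness, here is a minimal Python 3 sketch that pulls only a couple of configurations out of the generator and then counts the rest without storing them. The names first_two and remaining are just placeholders for this example. One caveat worth knowing: itertools.product reads each step's option list completely before it starts, so only the combinations are produced lazily, not the per-step options.

```python
import itertools

def step1():
    return [("key", i) for i in range(1, 4)]

def step2():
    return [("otherkey", i) for i in range(1, 3)]

def step3():
    return [("yetanotherkey", 42)]

def workflow_generator(steps):
    # product() yields one tuple of (key, value) pairs per combination
    return (dict(p) for p in itertools.product(*(step() for step in steps)))

configs = workflow_generator([step1, step2, step3])

# Take just the first two combinations; the remaining ones are not built yet.
first_two = list(itertools.islice(configs, 2))

# Count what is left without keeping any of it in memory.
remaining = sum(1 for _ in configs)
```

Here first_two holds the two configurations with key 1, and remaining is 4, since 3 * 2 * 1 = 6 combinations exist in total.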

answered 2012-11-02T20:35:23.313

Yes, your data structures are messed up. The following code is only meant to give an idea (it is not fully operational with your current structures). Basically, you should use trees and write a class, something like a workflow manager, that registers steps. Steps are trees of steps. Use real IDs instead of numbers.

Two suggestions

1.

import copy

global_data = [dict()]

class workflowManager:

    def __init__(self):
        self.steps = []
        self.data = list()

    def registerStep(self, step, stepNumber=1):
        # register the same step stepNumber times
        for i in range(stepNumber):
            self.steps.append(step)

    def registerSubStep(self, step, substep):
        # left without a body in this sketch
        raise NotImplementedError

    def hookToStep(self, step, hook):
        # insert hook in front of every occurrence of step
        hooked = []
        for s in self.steps:
            if s == step:
                hooked.append(hook)
            hooked.append(s)
        self.steps = hooked

    def performOnData(self):
        print('self.data {}'.format(self.data))
        for step in self.steps:
            print('performing step {}'.format(step))
            print('data {}'.format(self.data))
            self.data = step(self.data)

    def __str__(self):
        return str(self.data)

def step1(data):
    # continue counting from the last entry if it has a 'key'
    lastn = 0
    try:
        lastn = data[-1]['key']
    except (IndexError, KeyError):
        pass
    data.append({'key': lastn+1})
    return data

def step2(data):
    # continue counting from the last entry if it has an 'otherkey'
    lastn = 0
    try:
        lastn = data[-1]['otherkey']
    except (IndexError, KeyError):
        pass

    data.append({'otherkey': lastn+1})
    return data

def step3(data):
    data.append({'yetanotherkey': 42})
    return data


w = workflowManager()
w.registerStep(step1,4)
#w.registerStep(step2,3)
#w.registerStep(step3,1)
w.hookToStep(step1,step3)

print(w)

w.performOnData()

print(w)

2.

class Step:

    def __init__(self,name,extra=None):
        self.steps = []        # child steps
        self.name = name
        self.extra = extra     # optional value attached to the step, e.g. 42

    def addChild(self,child,repeat=1):
        # attach the same child repeat times
        for j in range(repeat):
            self.steps.append(child)

    def __str__(self):
        s = self.name + "\n"
        for sub in self.steps:
            s+=str(sub)
        return s


step1 = Step("yetanotherkey",42) #root
step2 = Step("otherkey")
step3 = Step("key")

step2.addChild(step3,2)
step1.addChild(step2,3)

print(step1)
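
To connect the tree idea back to generating configurations one at a time, here is a rough Python 3 sketch of how such a tree might be walked lazily. It repeats the Step class so that it runs on its own. The walk function, its index and prefix parameters, and the choice to use a child's position among its siblings as the value are all assumptions made for illustration; they are not part of the code above.

```python
class Step:
    def __init__(self, name, extra=None):
        self.steps = []      # child steps
        self.name = name
        self.extra = extra   # fixed value for this step, if any

    def addChild(self, child, repeat=1):
        for _ in range(repeat):
            self.steps.append(child)

def walk(step, index=1, prefix=None):
    """Yield one dict per root-to-leaf path (hypothetical helper).

    Assumption: a step without a fixed `extra` value takes its
    1-based position among its siblings as the value.
    """
    config = dict(prefix or {})
    config[step.name] = step.extra if step.extra is not None else index
    if not step.steps:
        # Leaf reached: this path is one complete configuration.
        yield config
        return
    for i, child in enumerate(step.steps, start=1):
        yield from walk(child, i, config)

root = Step("yetanotherkey", 42)
middle = Step("otherkey")
leaf = Step("key")

middle.addChild(leaf, 2)
root.addChild(middle, 3)

for config in walk(root):
    print(config)
```

With this tree the loop prints six dicts, one per path, and only the dicts along the current path exist at any moment.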
answered 2012-11-02T20:44:17.553

I suggest you take the loops out of the workflow_step functions and use itertools.product like this:

import copy
import itertools

def workflow_step1(data, param):
    data['key'] = param

def workflow_step2(data, param):
    data['otherkey'] = param

def workflow_step3(data, param):
    data['yetanotherkey'] = param

def list_workflow():
    return ([workflow_step1, workflow_step2, workflow_step3],
            [range(1,4),     range(1,3),     [42]])

def run(data):
    steps, param_lists = list_workflow()
    for params in itertools.product(*param_lists):
        d = copy.deepcopy(data)
        for step, param in zip(steps, params):
            step(d,param)
        yield d

for result in run({}):
    print(result)
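
One limitation of the product approach is that the parameter lists are fixed up front, so a step cannot choose its options based on what earlier steps put in the dict. If that matters, an alternative might be to keep generator-style steps like those in the question and chain them lazily. The following is a minimal Python 3 sketch under that assumption; apply_step and run_lazy are hypothetical names, and each step copies its input before modifying it so that branches do not share state.

```python
import copy

def workflow_step1(data):
    for i in range(1, 4):
        d = copy.deepcopy(data)
        d['key'] = i
        yield d

def workflow_step2(data):
    for i in range(1, 3):
        d = copy.deepcopy(data)
        d['otherkey'] = i
        yield d

def workflow_step3(data):
    d = copy.deepcopy(data)
    d['yetanotherkey'] = 42
    yield d

def apply_step(step, configs):
    # Lazily feed every incoming config through one step.
    for config in configs:
        yield from step(config)

def run_lazy(initial, steps):
    # Stack one lazy layer per step; nothing runs until iteration starts.
    configs = iter([initial])
    for step in steps:
        configs = apply_step(step, configs)
    return configs

steps = [workflow_step1, workflow_step2, workflow_step3]
results = list(run_lazy({}, steps))
print(len(results))  # 6
```

Iterating over run_lazy pulls one configuration through the whole chain at a time, depth first, so memory use grows with the number of steps rather than with the number of combinations. Calling list() here is only to show the count.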
answered 2012-11-03T06:58:15.387