15

我正在尝试确定我的问题是否可以使用内置的 sorted() 函数解决,或者我是否需要自己解决 - 使用 cmp 的老学校会相对容易。

我的数据集如下所示:

x = [
('企业', Set('车队', '地址'))
('device', Set('business','model','status','pack'))
('txn', Set('device','business','operator'))
……

排序规则应该基本上适用于 N & Y 的所有值,其中 Y > N,x[N][0] 不在 x[Y][1] 中

尽管我使用的是仍然可以使用 cmp 参数的 Python 2.6,但我正在尝试使这个 Python 3 安全。

那么,这可以使用一些 lambda 魔术和 key 参数来完成吗?

-== 更新 ==-

谢谢伊莱和温斯顿!我真的不认为使用 key 会起作用,或者如果可以的话,我怀疑这将是一个不理想的鞋拔解决方案。

因为我的问题是数据库表依赖项,所以我不得不对 Eli 的代码做一个小的补充,以从其依赖项列表中删除一个项目(在一个设计良好的数据库中,这不会发生,但谁生活在那个神奇的完美世界里?)

我的解决方案:

def topological_sort(source):
    """perform topo sort on elements.

    :arg source: list of ``(name, set(names of dependancies))`` pairs
    :returns: list of names, with dependancies listed first
    """
    pending = [(name, set(deps)) for name, deps in source]        
    emitted = []
    while pending:
        next_pending = []
        next_emitted = []
        for entry in pending:
            name, deps = entry
            deps.difference_update(set((name,)), emitted) # <-- pop self from dep, req Py2.6
            if deps:
                next_pending.append(entry)
            else:
                yield name
                emitted.append(name) # <-- not required, but preserves original order
                next_emitted.append(name)
        if not next_emitted:
            raise ValueError("cyclic dependancy detected: %s %r" % (name, (next_pending,)))
        pending = next_pending
        emitted = next_emitted
4

4 回答 4

17

您想要的称为拓扑排序。虽然可以使用 builtin 来实现sort(),但比较笨拙,最好直接在 python 中实现拓扑排序。

为什么会很尴尬?如果您在 wiki 页面上研究这两种算法,它们都依赖于一组正在运行的“标记节点”,这是一个很难扭曲成表格的概念,sort()可以使用,因为key=xxx(甚至cmp=xxx)最适用于无状态比较函数,特别是因为timsort 不保证检查元素的顺序。我(非常)确定任何使用的解决方案最终都会sort()为每次调用 key/cmp 函数冗余计算一些信息,以便绕过无国籍问题。

以下是我一直在使用的 alg(对一些 javascript 库依赖项进行排序):

编辑:根据 Winston Ewert 的解决方案对其进行了极大的修改

def topological_sort(source):
    """perform topo sort on elements.

    :arg source: list of ``(name, [list of dependancies])`` pairs
    :returns: list of names, with dependancies listed first
    """
    pending = [(name, set(deps)) for name, deps in source] # copy deps so we can modify set in-place       
    emitted = []        
    while pending:
        next_pending = []
        next_emitted = []
        for entry in pending:
            name, deps = entry
            deps.difference_update(emitted) # remove deps we emitted last pass
            if deps: # still has deps? recheck during next pass
                next_pending.append(entry) 
            else: # no more deps? time to emit
                yield name 
                emitted.append(name) # <-- not required, but helps preserve original ordering
                next_emitted.append(name) # remember what we emitted for difference_update() in next pass
        if not next_emitted: # all entries have unmet deps, one of two things is wrong...
            raise ValueError("cyclic or missing dependancy detected: %r" % (next_pending,))
        pending = next_pending
        emitted = next_emitted

旁注:可以将一个cmp()函数插入到key=xxx中,如此 python 错误跟踪器消息中所述。

于 2012-07-19T15:37:30.957 回答
6

我做了这样的拓扑排序:

def topological_sort(items):
    provided = set()
    while items:
         remaining_items = []
         emitted = False

         for item, dependencies in items:
             if dependencies.issubset(provided):
                   yield item
                   provided.add(item)
                   emitted = True
             else:
                   remaining_items.append( (item, dependencies) )

         if not emitted:
             raise TopologicalSortFailure()

         items = remaining_items

我觉得它比 Eli 的版本简单一点,我不知道效率。

于 2012-07-19T16:00:27.203 回答
5

看着糟糕的格式和这种奇怪的Set类型......(我将它们保存为元组并正确分隔列表项......)......并使用networkx库使事情变得方便......

x = [
    ('business', ('fleet','address')),
    ('device', ('business','model','status','pack')),
    ('txn', ('device','business','operator'))
]

import networkx as nx

g = nx.DiGraph()
for key, vals in x:
    for val in vals:
        g.add_edge(key, val)

print nx.topological_sort(g)
于 2012-07-19T09:08:38.753 回答
0

这是 Winston 的建议,带有一个文档字符串和一个微小的调整,dependencies.issubset(provided)provided.issuperset(dependencies). 该更改允许您将dependencies每个输入对中set

我的用例涉及dict其键是项目字符串,每个键的值是list该键所依赖的项目名称中的一个。一旦我确定dict是非空的,我可以将其传递iteritems()给修改后的算法。

再次感谢温斯顿。

def topological_sort(items):
    """
    'items' is an iterable of (item, dependencies) pairs, where 'dependencies'
    is an iterable of the same type as 'items'.

    If 'items' is a generator rather than a data structure, it should not be
    empty. Passing an empty generator for 'items' (zero yields before return)
    will cause topological_sort() to raise TopologicalSortFailure.

    An empty iterable (e.g. list, tuple, set, ...) produces no items but
    raises no exception.
    """
    provided = set()
    while items:
         remaining_items = []
         emitted = False

         for item, dependencies in items:
             if provided.issuperset(dependencies):
                   yield item
                   provided.add(item)
                   emitted = True
             else:
                   remaining_items.append( (item, dependencies) )

         if not emitted:
             raise TopologicalSortFailure()

         items = remaining_items
于 2016-01-26T15:56:26.213 回答