python - A faster nested tuple to list and back

Question

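I'm trying to perform tuple-to-list and list-to-tuple conversions on nested sequences of unknown depth and shape. The calls are made hundreds of thousands of times, which is why I'm trying to squeeze out as much speed as possible.

Any help is much appreciated.

Here's what I have so far...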

def listify(self, seq, was, toBe):
  temp = []
  a = temp.append
  for g in seq:
    if type(g) == was:
      a(self.listify(g, was, toBe))
    else:
      a(g)
  return toBe(temp)

A call to convert tuples to lists looks like this:

self.listify((...), tuple, list)
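For experimenting outside a class, here is a minimal standalone sketch of the same function. It assumes Python 3 and drops `self`; the parameter name `to_be` and the sample data are illustrative only.

```python
def listify(seq, was, to_be):
    """Recursively rebuild seq, converting every nested `was` into `to_be`."""
    temp = []
    append = temp.append  # cache the bound method, as in the original
    for item in seq:
        if type(item) == was:
            append(listify(item, was, to_be))
        else:
            append(item)
    return to_be(temp)


nested = (1, (2, 3), (4, (5, 6)))
as_list = listify(nested, tuple, list)
print(as_list)                        # [1, [2, 3], [4, [5, 6]]]
print(listify(as_list, list, tuple))  # (1, (2, 3), (4, (5, 6)))
```

The same function handles both directions because the source and target types are passed in as arguments.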

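EDIT: Yes, I totally missed the enumerate (from an old implementation) and forgot to type the else part.

Thanks for the help, both of you. I'll probably go with the coroutines.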

score 6 · Accepted Answer

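I have been working with coroutines quite a lot lately. The advantage would be that you reduce the overhead of the method calls. Sending a new value into a coroutine is faster than calling a function. You cannot make a recursive coroutine (it throws ValueError: generator already executing), but you can make a pool of coroutine workers: you need one worker for every level of the tree. I have made some test code that works, but haven't looked at the timing issues yet.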
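The "generator already executing" error can be reproduced with a small sketch. It assumes Python 3, and the coroutine `recursive_tuple2list` is a hypothetical example written only to trigger the error: it is handed a reference to itself and tries to `send()` to itself while it is still running.

```python
def recursive_tuple2list():
    """Coroutine that (wrongly) tries to recurse by sending to itself."""
    me = yield  # the first send() delivers a reference to this generator
    result = None
    while True:
        tup = yield result
        # For a nested tuple this calls me.send() while `me` is still running.
        result = [me.send(x) if isinstance(x, tuple) else x for x in tup]


co = recursive_tuple2list()
next(co)     # prime the coroutine
co.send(co)  # hand it a reference to itself

flat = co.send((1, 2))  # no nesting, so no re-entrant send
print(flat)             # [1, 2]

try:
    co.send((1, (2, 3)))  # the nested tuple triggers the re-entrant send
except ValueError as exc:
    error_message = str(exc)
    print("ValueError:", error_message)  # ValueError: generator already executing
```

A generator has a single frame that is either suspended or running, which is why each nesting level needs its own worker.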

def coroutine(func):
    """ A helper function decorator from Beazley"""
    def start(*args, **kwargs):
        g = func(*args, **kwargs)
        next(g)  # prime the generator; works on Python 2.6+ and 3
        return g
    return start

@coroutine
def cotuple2list():
    """This does the work"""
    result = None
    while True:
        (tup, co_pool) = (yield result)
        result = list(tup)
        # I don't like using append. So I am changing the data in place.
        for (i,x) in enumerate(result):
            # consider using "if hasattr(x,'__iter__')"
            if isinstance(x,tuple):
                result[i] = co_pool[0].send((x, co_pool[1:]))


@coroutine
def colist2tuple():
    """This does the work"""
    result = None
    while True:
        (lst, co_pool) = (yield result)
        # I don't like using append so I am changing the data in place...
        for (i,x) in enumerate(lst):
            # consider using "if hasattr(x,'__iter__')"
            if isinstance(x,list):
                lst[i] = co_pool[0].send((x, co_pool[1:]))
        result = tuple(lst)

Pure Python alternative from HYRY's post:

def list2tuple(a):
    return tuple((list2tuple(x) if isinstance(x, list) else x for x in a))
def tuple2list(a):
    return list((tuple2list(x) if isinstance(x, tuple) else x for x in a))

Make a pool of coroutines. It is a hack of a pool, but it works:

# Make Coroutine Pools
colist2tuple_pool = [colist2tuple() for i in xrange(20) ]
cotuple2list_pool = [cotuple2list() for i in xrange(20) ]
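Since each worker handles one level of nesting, the pool size caps the depth that can be converted. The sketch below assumes Python 3, so it uses `next(g)` and `range`. `convert` is a hypothetical helper, not part of the original answer, that builds a fresh pool for each call to show what happens when the pool is too shallow.

```python
def coroutine(func):
    """Prime a generator so it is ready to receive values via send()."""
    def start(*args, **kwargs):
        g = func(*args, **kwargs)
        next(g)
        return g
    return start


@coroutine
def colist2tuple():
    result = None
    while True:
        lst, co_pool = yield result
        for i, x in enumerate(lst):
            if isinstance(x, list):
                # Delegate the nested list to the next worker in the pool.
                lst[i] = co_pool[0].send((x, co_pool[1:]))
        result = tuple(lst)


def convert(data, pool_size):
    """Hypothetical helper: build a fresh pool and convert data with it."""
    pool = [colist2tuple() for _ in range(pool_size)]
    return pool[0].send((data, pool[1:]))


print(convert([1, [2, 3]], pool_size=2))  # (1, (2, 3))

try:
    convert([1, [2, [3]]], pool_size=2)  # three levels, only two workers
except IndexError as exc:
    print("pool too shallow:", exc)
```

An exception that escapes a generator finishes it, so a pool that has failed this way needs to be rebuilt before reuse.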

Now do some timing to compare:

def make_test(m, n):
    # Test data function taken from HYRY's post!
    return [[range(m), make_test(m, n-1)] for i in range(n)]
import timeit
t = make_test(20, 8)
%timeit list2tuple(t)
%timeit colist2tuple_pool[0].send((t, colist2tuple_pool[1:]))

Results - notice the "u" next to the "s" in the second line :-)

1 loops, best of 3: 1.32 s per loop
1 loops, best of 3: 4.05 us per loop

It really seems too fast to be true. Does anyone know if timeit works with coroutines? Here is the old-fashioned way:

import time

tic = time.time()
t1 = colist2tuple_pool[0].send((t, colist2tuple_pool[1:]))
toc = time.time()
print toc - tic

Results:

0.000446081161499

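Newer versions of IPython's %timeit give a warning:

The slowest run took 9.04 times longer than the fastest. This could mean that an intermediate result is being cached
1000000 loops, best of 3: 317 ns per loop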

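After some further investigation: Python generators are not magic, and send is still a function call. The reason my generator-based method appeared to be faster is that I was doing an in-place operation on the lists, which resulted in fewer function calls. Because the input is modified in place, after the first run its nested lists are already tuples, so the later timing loops have almost nothing left to convert.

I wrote this all up with a lot of extra detail in a recent talk.

Hope this helps someone looking to play with generators.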
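The in-place effect is easy to see on a small input. This is a sketch assuming Python 3, with the `coroutine` decorator inlined as a plain `next()` call and a made-up sample list. After one `send()`, the input no longer contains any lists, so a repeated call, such as a later timing loop, does no recursive work.

```python
def colist2tuple():
    result = None
    while True:
        lst, co_pool = yield result
        for i, x in enumerate(lst):
            if isinstance(x, list):
                lst[i] = co_pool[0].send((x, co_pool[1:]))  # rewrites the input
        result = tuple(lst)


pool = [colist2tuple() for _ in range(20)]
for worker in pool:
    next(worker)  # prime each coroutine

t = [[1, 2], [3, [4, 5]]]
first = pool[0].send((t, pool[1:]))
print(first)  # ((1, 2), (3, (4, 5)))
print(t)      # [(1, 2), (3, (4, 5))]  <- the input itself was modified

lists_left = sum(isinstance(x, list) for x in t)
print(lists_left)  # 0, so a second send() has nothing left to recurse into

second = pool[0].send((t, pool[1:]))
print(second == first)  # True
```

For a fair benchmark, each timed run would need a fresh copy of the input, for example one made with `copy.deepcopy`.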

score 3 · Accepted Answer

Define two separate functions:

def list2tuple(a):
    return tuple((list2tuple(x) if isinstance(x, list) else x for x in a))

def tuple2list(a):
    return list((tuple2list(x) if isinstance(x, tuple) else x for x in a))

Some tests:

t = [1, 2, [3, 4], [5, [7, 8]], 9]
t2 = list2tuple(t)
t3 = tuple2list(t2)
print t2
print t3

Result:

(1, 2, (3, 4), (5, (7, 8)), 9)
[1, 2, [3, 4], [5, [7, 8]], 9]

Edit: for a fast version:

def list2tuple2(a, tuple=tuple, type=type, list=list):
    return tuple([list2tuple2(x) if type(x)==list else x for x in a])

def tuple2list2(a, tuple=tuple, type=type):
    return [tuple2list2(x) if type(x)==tuple else x for x in a]
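Besides caching the builtins as default arguments, the fast version swaps `isinstance` for an exact `type(x) == ...` check, which changes behaviour for subclasses. The sketch below assumes Python 3 and uses a hypothetical `Point` namedtuple to show the difference: `isinstance` also converts tuple subclasses, while the exact check leaves them alone.

```python
from collections import namedtuple

Point = namedtuple("Point", "x y")  # a tuple subclass, for illustration only


def tuple2list(a):
    return list(tuple2list(x) if isinstance(x, tuple) else x for x in a)


def tuple2list2(a, tuple=tuple, type=type):
    return [tuple2list2(x) if type(x) == tuple else x for x in a]


data = (1, Point(2, 3), (4, 5))
print(tuple2list(data))   # [1, [2, 3], [4, 5]]
print(tuple2list2(data))  # [1, Point(x=2, y=3), [4, 5]]
```

Which behaviour is preferable depends on whether subclass instances in the data should survive the conversion.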

For comparison, I also include a Cython version:

%%cython

def list2tuple3(a):
    return tuple([list2tuple3(x) if type(x)==list else x for x in a])

def tuple2list3(a):
    return [tuple2list3(x) if type(x)==tuple else x for x in a]

Create some nested lists:

def make_test(m, n):
    return [[range(m), make_test(m, n-1)] for i in range(n)]

t = make_test(20, 8)
t2 = list2tuple2(t)

Then compare the speed:

%timeit listify(t, list, tuple)
%timeit listify(t2, tuple, list)
%timeit list2tuple(t)
%timeit tuple2list(t2)
%timeit list2tuple2(t)
%timeit tuple2list2(t2)
%timeit list2tuple3(t)
%timeit tuple2list3(t2)

The results are:

listify
1 loops, best of 3: 828 ms per loop
1 loops, best of 3: 912 ms per loop

list2tuple generator expression version
1 loops, best of 3: 1.49 s per loop
1 loops, best of 3: 1.67 s per loop

list2tuple2 list comprehension with local cache
1 loops, best of 3: 623 ms per loop
1 loops, best of 3: 566 ms per loop

list2tuple3 cython
1 loops, best of 3: 212 ms per loop
10 loops, best of 3: 232 ms per loop
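To repeat part of this comparison outside IPython, a rough sketch using the standard `timeit` module follows. It assumes Python 3, so `make_test` wraps `range` in `list` to get real lists, and it uses a smaller data set than above. The absolute numbers will differ from those reported here and are not shown.

```python
import timeit


def list2tuple(a):
    return tuple(list2tuple(x) if isinstance(x, list) else x for x in a)


def list2tuple2(a, tuple=tuple, type=type, list=list):
    return tuple([list2tuple2(x) if type(x) == list else x for x in a])


def make_test(m, n):
    # Python 3: range() is lazy, so materialise it as a list.
    return [[list(range(m)), make_test(m, n - 1)] for i in range(n)]


t = make_test(10, 5)
assert list2tuple(t) == list2tuple2(t)  # both versions agree on the result

runs = 20
for func in (list2tuple, list2tuple2):
    seconds = timeit.timeit(lambda: func(t), number=runs)
    print("%s: %.3f ms per call" % (func.__name__, seconds / runs * 1e3))
```

Neither function modifies `t`, so repeated runs measure the same work each time.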

score 0 · Accepted Answer

Since the answers above don't handle tuples or lists inside dictionary values, I'm posting my own code:

def tuple2list(data):
    if isinstance(data, dict):
        return {
            key: tuple2list(value)
            for key, value in data.items()
        }
    elif isinstance(data, (list, tuple)):
        return [
            tuple2list(item)
            for item in data
        ]
    return data

def list2tuple(data):
    if isinstance(data, dict):
        return {
            key: list2tuple(value)
            for key, value in data.items()
        }
    elif isinstance(data, (list, tuple)):
        return tuple(
            list2tuple(item)
            for item in data
        )
    return data
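A short usage sketch for the dictionary-aware functions, assuming Python 3 and a made-up `config` structure. The two functions are repeated in compact form so the example runs on its own.

```python
def tuple2list(data):
    if isinstance(data, dict):
        return {key: tuple2list(value) for key, value in data.items()}
    elif isinstance(data, (list, tuple)):
        return [tuple2list(item) for item in data]
    return data


def list2tuple(data):
    if isinstance(data, dict):
        return {key: list2tuple(value) for key, value in data.items()}
    elif isinstance(data, (list, tuple)):
        return tuple(list2tuple(item) for item in data)
    return data


config = {"size": (640, 480), "layers": [("conv", (3, 3)), {"pad": (1, 1)}]}

as_lists = tuple2list(config)
print(as_lists)
# {'size': [640, 480], 'layers': [['conv', [3, 3]], {'pad': [1, 1]}]}

as_tuples = list2tuple(as_lists)
print(as_tuples)
# {'size': (640, 480), 'layers': (('conv', (3, 3)), {'pad': (1, 1)})}
```

The round trip is not an identity: every sequence ends up as the target type, so the original `layers` list comes back as a tuple. Dictionary keys are passed through unchanged.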
