python - Efficient tensor contraction in Python

Question

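I have a list L of tensors (ndarray objects), each with several indices. I need to contract these indices according to a connectivity graph.

The connections are encoded in a list of tuples of the form ((m,i),(n,j)), meaning "contract the i-th index of the tensor L[m] with the j-th index of the tensor L[n]".

How can I handle non-trivial connectivity graphs? The first problem is that as soon as I contract a pair of indices, the result is a new tensor that does not belong to the list L. But even if I solved this (e.g. by giving a unique identifier to every index of every tensor), there is the issue that the contractions can be performed in any order, and some choices produce unnecessarily enormous beasts in mid-computation (even if the final result is small). Suggestions?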

score 5 · Accepted Answer

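Memory considerations aside, I believe you can do the contractions in a single call to einsum, although you'll need some preprocessing. I'm not entirely sure what you mean by "as soon as I contract a pair of indices, the result is a new tensor that does not belong to the list L", but I think doing the contraction in a single step would solve this problem altogether.

I suggest using the following alternative, numerically indexed syntax of einsum: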

einsum(op0, sublist0, op1, sublist1, ..., [sublistout])

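So what you need to do is encode the indices as integer sequences. First set up a range of unique indices, and keep another copy to be used as sublistout. Then, iterating over your connectivity graph, set each pair of contracted indices to the same integer, and at the same time remove the contracted indices from sublistout.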

import numpy as np

def contract_all(tensors,conns):
    '''
    Contract the tensors inside the list tensors
    according to the connectivities in conns

    Example input:
    tensors = [np.random.rand(2,3),np.random.rand(3,4,5),np.random.rand(3,4)]
    conns = [((0,1),(2,0)), ((1,1),(2,1))]
    returned shape in this case is (2,3,5)
    '''

    ndims = [t.ndim for t in tensors]
    totdims = sum(ndims)
    dims0 = np.arange(totdims)
    # keep track of sublistout throughout
    sublistout = set(dims0.tolist())
    # cut up the index array according to tensors
    # (throw away empty list at the end)
    inds = np.split(dims0,np.cumsum(ndims))[:-1]
    # we also need to convert to a list, otherwise einsum chokes
    inds = [ind.tolist() for ind in inds]

    # if there were no contractions, we'd call
    # np.einsum(*zip(tensors,inds),sublistout)

    # instead we need to loop over the connectivity graph
    # and manipulate the indices
    for (m,i),(n,j) in conns:
        # tensors[m][i] contracted with tensors[n][j]

        # remove the old indices from sublistout which is a set
        sublistout -= {inds[m][i],inds[n][j]}

        # contract the indices
        inds[n][j] = inds[m][i]

    # zip and flatten the tensors and indices
    args = [subarg for arg in zip(tensors,inds) for subarg in arg]

    # assuming there are no multiple contractions, we're done here;
    # pass the remaining indices as a sorted list so the output order is explicit
    return np.einsum(*args, sorted(sublistout))

A trivial example:

>>> tensors = [np.random.rand(2,3), np.random.rand(4,3)]
>>> conns = [((0,1),(1,1))]
>>> contract_all(tensors,conns)
array([[ 1.51970003,  1.06482209,  1.61478989,  1.86329518],
       [ 1.16334367,  0.60125945,  1.00275992,  1.43578448]])
>>> np.einsum('ij,kj',tensors[0],tensors[1])
array([[ 1.51970003,  1.06482209,  1.61478989,  1.86329518],
       [ 1.16334367,  0.60125945,  1.00275992,  1.43578448]])

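If there are multiple contractions, the logic in the loop becomes a bit more complex, because we need to handle all the duplicates. The logic, however, is the same. Furthermore, the above is obviously missing checks to ensure that the corresponding indices can actually be contracted.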
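The missing checks could look roughly like the sketch below. The helper name check_conns is hypothetical (it is neither part of the code above nor of NumPy); it assumes the same ((m,i),(n,j)) encoding, and it rejects index pairs of unequal length as well as any index that is used in more than one contraction.

```python
import numpy as np

def check_conns(tensors, conns):
    """Hypothetical helper: validate conns before contracting."""
    seen = set()
    for (m, i), (n, j) in conns:
        for slot in ((m, i), (n, j)):
            # each (tensor, axis) slot may take part in only one contraction
            if slot in seen:
                raise ValueError(f"index {slot} is contracted more than once")
            seen.add(slot)
        # contracted axes must have equal length
        if tensors[m].shape[i] != tensors[n].shape[j]:
            raise ValueError(
                f"size mismatch: {(m, i)} has length {tensors[m].shape[i]}, "
                f"{(n, j)} has length {tensors[n].shape[j]}"
            )

tensors = [np.random.rand(2, 3), np.random.rand(3, 4, 5), np.random.rand(3, 4)]
check_conns(tensors, [((0, 1), (2, 0)), ((1, 1), (2, 1))])  # passes silently
```

An out-of-range tensor or axis number is not handled specially here; it surfaces as the usual IndexError.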

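In hindsight I realized that sublistout does not have to be specified: by default einsum uses that order anyway. I decided to leave the variable in the code, because if we want a non-trivial output index order we will have to handle this variable appropriately, and it might come in handy.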
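A small illustration of that point, using the integer-sublist form of einsum on two arbitrary arrays (the array names and shapes are made up for the example): leaving sublistout off gives the uncontracted indices in ascending order, and supplying a different sublistout permutes the axes of the result.

```python
import numpy as np

a = np.random.rand(2, 3)
b = np.random.rand(4, 3)

# index 1 is shared, so it is summed over; indices 0 and 2 remain
implicit = np.einsum(a, [0, 1], b, [2, 1])          # default output order
explicit = np.einsum(a, [0, 1], b, [2, 1], [0, 2])  # same order, spelled out
swapped = np.einsum(a, [0, 1], b, [2, 1], [2, 0])   # custom output order

print(implicit.shape, swapped.shape)
print(np.allclose(implicit, explicit), np.allclose(swapped, implicit.T))
```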

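As for optimizing the contraction order, np.einsum can do this internally as of version 1.12 (as noted by @hpaulj in a now-deleted comment). That version introduced the optional optimize keyword argument of np.einsum, which allows choosing a contraction order that cuts down on computation time at the cost of memory. Passing 'greedy' or 'optimal' as the optimize keyword makes numpy choose a contraction order in roughly decreasing order of the sizes of the dimensions.

The options available for the optimize keyword come from the apparently undocumented (as far as online documentation goes; help() fortunately works) function np.einsum_path: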
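As a minimal sketch of the keyword in use (the arrays and their shapes are arbitrary choices for illustration), the optimized calls return the same values as the default call; only the internal contraction order and the intermediates differ. The keyword is accepted with the integer-sublist syntax as well.

```python
import numpy as np

a = np.random.rand(8, 50)
b = np.random.rand(50, 60)
c = np.random.rand(60, 4)

default = np.einsum('ij,jk,kl->il', a, b, c)                    # no optimization
greedy = np.einsum('ij,jk,kl->il', a, b, c, optimize='greedy')
optimal = np.einsum('ij,jk,kl->il', a, b, c, optimize='optimal')

# same contraction written with integer sublists
listed = np.einsum(a, [0, 1], b, [1, 2], c, [2, 3], [0, 3], optimize='greedy')

print(np.allclose(default, greedy), np.allclose(default, optimal),
      np.allclose(default, listed))
```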

einsum_path(subscripts, *operands, optimize='greedy')

Evaluates the lowest cost contraction order for an einsum expression by
considering the creation of intermediate arrays.

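The contraction path returned by np.einsum_path can also be passed as the optimize argument of np.einsum. In your question you were worried about too much memory being used, so I suspect the default of no optimization (with a potentially longer runtime and a smaller memory footprint) is what you want.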
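A short sketch of that round trip, with arbitrary example arrays: np.einsum_path returns the path together with a printable report, and the path can then be reused for repeated einsum calls on operands of the same shapes.

```python
import numpy as np

a = np.random.rand(8, 50)
b = np.random.rand(50, 60)
c = np.random.rand(60, 4)

path, report = np.einsum_path('ij,jk,kl->il', a, b, c, optimize='greedy')
print(path)    # a list starting with 'einsum_path', followed by operand pairs
print(report)  # text summary, including the largest intermediate

# reuse the precomputed path instead of searching for it again
result = np.einsum('ij,jk,kl->il', a, b, c, optimize=path)
print(np.allclose(result, a @ b @ c))
```

The report is a convenient place to check the size of the largest intermediate before committing to a path when memory is the main concern.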

score 1 · Accepted Answer

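This may help: have a look at https://arxiv.org/abs/1402.0939, which describes an efficient framework for contracting so-called tensor networks in a single function, ncon(...). As far as I can see, implementations are directly available for Matlab (found via the link) and Python 3 (https://github.com/mhauru/ncon).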
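For orientation, the linked paper describes a network by giving each tensor a list of integer labels: a positive label shared by two tensors marks a contraction, and negative labels -1, -2, ... mark the free indices in output order. The sketch below converts the ((m,i),(n,j)) format from the question into such label lists. The function name conns_to_ncon_labels is hypothetical, and the code does not call ncon itself; check the ncon README for the exact call signature before passing the result on.

```python
def conns_to_ncon_labels(ndims, conns):
    """Hypothetical helper: ndims[k] is the number of indices of tensor k."""
    labels = [[None] * nd for nd in ndims]
    # each contracted pair shares one positive label: 1, 2, 3, ...
    for label, ((m, i), (n, j)) in enumerate(conns, start=1):
        labels[m][i] = label
        labels[n][j] = label
    # the remaining (free) indices get -1, -2, ... in order of appearance
    free = 0
    for tensor_labels in labels:
        for pos, lab in enumerate(tensor_labels):
            if lab is None:
                free += 1
                tensor_labels[pos] = -free
    return labels

# tensors with 2, 3 and 2 indices, connected as in the docstring example
labels = conns_to_ncon_labels([2, 3, 2], [((0, 1), (2, 0)), ((1, 1), (2, 1))])
print(labels)  # [[-1, 1], [-2, 2, -3], [1, 2]]
```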
