python - PyTorch - What does contiguous() do?

Question

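I was going through this example of an LSTM language model on GitHub (link). What it does in general is pretty clear to me. But I'm still struggling to understand what calling contiguous() does, which occurs several times in the code.

For example, in lines 74/75 of the code, the input and target sequences of the LSTM are created. The data (stored in ids) is 2-dimensional, where the first dimension is the batch size.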

for i in range(0, ids.size(1) - seq_length, seq_length):
    # Get batch inputs and targets
    inputs = Variable(ids[:, i:i+seq_length])
    targets = Variable(ids[:, (i+1):(i+1)+seq_length].contiguous())

As a simple example, with batch size 1 and seq_length 10, inputs and targets look like this:

inputs Variable containing:
0     1     2     3     4     5     6     7     8     9
[torch.LongTensor of size 1x10]

targets Variable containing:
1     2     3     4     5     6     7     8     9    10
[torch.LongTensor of size 1x10]

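So in general my question is: what does contiguous() do, and why do I need it?

Further, I don't understand why the method is called for the target sequence but not for the input sequence, as both variables consist of the same data.

How could targets be non-contiguous while inputs is still contiguous?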

EDIT:

I tried to leave out the call to contiguous(), but this leads to an error message when computing the loss.

RuntimeError: invalid argument 1: input is not contiguous at .../src/torch/lib/TH/generic/THTensor.c:231

So obviously calling contiguous() is necessary in this example.

score 276 · Accepted Answer

There are a few operations on Tensors in PyTorch that do not change the contents of a tensor, but change the way the data is organized. These operations include:

narrow(), view(), expand() and transpose()

For example: when you call transpose(), PyTorch doesn't generate a new tensor with a new layout, it just modifies meta information in the Tensor object so that the offset and stride describe the desired new shape. In this example, the transposed tensor and original tensor share the same memory:

x = torch.randn(3,2)
y = torch.transpose(x, 0, 1)
x[0, 0] = 42
print(y[0,0])
# prints 42

This is where the concept of contiguous comes in. In the example above, x is contiguous but y is not, because its memory layout is different from that of a tensor of the same shape made from scratch. Note that the word "contiguous" is a bit misleading: it's not that the content of the tensor is spread out around disconnected blocks of memory. Here the bytes are still allocated in one block of memory, but the order of the elements is different!
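To make the difference visible, here is a minimal sketch that inspects the metadata of both tensors. The variable names shares_memory, x_stride and y_stride are just illustrative; data_ptr(), stride() and is_contiguous() are regular tensor methods.

```python
import torch

x = torch.randn(3, 2)
y = torch.transpose(x, 0, 1)  # shape (2, 3), no data is copied

# Both tensors start at the same address in the same block of memory.
shares_memory = x.data_ptr() == y.data_ptr()

# Strides are counted in elements.
x_stride = x.stride()  # (2, 1): the row-major layout for shape (3, 2)
y_stride = y.stride()  # (1, 2): not the row-major layout for shape (2, 3)

print(shares_memory, x.is_contiguous(), y.is_contiguous())
print(x_stride, y_stride)
```

A freshly created tensor of shape (2, 3) would have stride (3, 1), which is why y is reported as non-contiguous.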

When you call contiguous() on a non-contiguous tensor, it actually makes a copy of the tensor such that the order of its elements in memory is the same as if it had been created from scratch with the same data.
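The copy can be observed directly. This is a small sketch (the names z, same_values and new_memory are only for illustration) showing that the result of contiguous() holds the same values in new memory, so later writes to x no longer show up in it:

```python
import torch

x = torch.arange(6.0).reshape(3, 2)
y = x.transpose(0, 1)   # non-contiguous view of x
z = y.contiguous()      # row-major copy of y

same_values = torch.equal(y, z)             # same logical content
new_memory = z.data_ptr() != x.data_ptr()   # but stored elsewhere

x[0, 0] = 42.0
# y is still a view of x and sees the write; z is an independent copy.
print(y[0, 0].item(), z[0, 0].item())
```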

Normally you don't need to worry about this. You're generally safe to assume everything will work, and you only need to add a call to contiguous() once you get a RuntimeError: input is not contiguous, which is raised where PyTorch expects a contiguous tensor.

score 39 · Accepted Answer

From the PyTorch documentation:

contiguous() → Tensor
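Returns a contiguous tensor containing the same data as self tensor. If self tensor is contiguous, this function returns the self tensor.

Here contiguous means not only contiguous in memory, but also stored in the same order in memory as the index order. For example, a transposition doesn't change the data in memory, it simply changes the map from indices to memory pointers. If you then apply contiguous(), it will change the data in memory so that the map from indices to memory locations is the canonical one.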
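Both halves of that documentation sentence can be checked with a short sketch. The names no_copy, copied and c are illustrative, and comparing data_ptr() values is just one way to tell whether the data was copied:

```python
import torch

a = torch.arange(6).reshape(2, 3)   # created from scratch, so contiguous

# Already contiguous: contiguous() hands back the same memory.
no_copy = a.contiguous().data_ptr() == a.data_ptr()

b = a.t()                           # shape (3, 2), stride (1, 3)
c = b.contiguous()

# Not contiguous: the data is rewritten in canonical (row-major) order.
copied = c.data_ptr() != b.data_ptr()
print(no_copy, copied, b.stride(), c.stride())
```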

score 18 · Accepted Answer

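tensor.contiguous() creates a copy of the tensor (unless it is already contiguous) whose elements are stored contiguously in memory. The contiguous() function is usually required when we first transpose() a tensor and then reshape (view) it. First, let's create a contiguous tensor: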

aaa = torch.Tensor( [[1,2,3],[4,5,6]] )
print(aaa.stride())
print(aaa.is_contiguous())
#(3,1)
#True

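stride() returning (3,1) means: for each step along the first dimension (row by row), we need to move 3 steps in memory. For each step along the second dimension (column by column), we need to move 1 step in memory. This indicates that the elements of the tensor are stored contiguously.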
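The stride arithmetic can be written out by hand. The two helpers below, contiguous_strides and memory_offset, are hypothetical names introduced only for this sketch; they are not part of PyTorch:

```python
import torch

def contiguous_strides(shape):
    """Row-major strides (in elements) for a tensor of the given shape."""
    strides = []
    step = 1
    for size in reversed(shape):
        strides.append(step)
        step *= size
    return tuple(reversed(strides))

def memory_offset(index, strides):
    """Position in the flat storage of the element at the given index."""
    return sum(i * s for i, s in zip(index, strides))

aaa = torch.tensor([[1., 2., 3.], [4., 5., 6.]])
flat = aaa.flatten()  # for a contiguous tensor this follows the memory order

offset = memory_offset((1, 2), aaa.stride())  # 1*3 + 2*1 = 5
print(contiguous_strides((2, 3)), offset, flat[offset].item())
```

A tensor is contiguous in the PyTorch sense when its actual strides match what contiguous_strides would give for its shape.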

Now let's apply some functions to the tensor:

bbb = aaa.transpose(0,1)
print(bbb.stride())
print(bbb.is_contiguous())

#(1, 3)
#False


ccc = aaa.narrow(1,1,2)   ## equivalent to matrix slicing aaa[:,1:3]
print(ccc.stride())
print(ccc.is_contiguous())

#(3, 1)
#False


ddd = aaa.repeat(2,1)   # repeat twice along the first dimension, once along the second
print(ddd.stride())
print(ddd.is_contiguous())

#(3, 1)
#True


## expand is different from repeat.
## if a tensor has a shape [d1,d2,1], it can only be expanded using "expand(d1,d2,d3)", which
## means the singleton dimension is repeated d3 times
eee = aaa.unsqueeze(2).expand(2,3,3)
print(eee.stride())
print(eee.is_contiguous())

#(3, 1, 0)
#False


fff = aaa.unsqueeze(2).repeat(1,1,8).view(2,-1,2)
print(fff.stride())
print(fff.is_contiguous())

#(24, 2, 1)
#True

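OK, we can see that transpose(), narrow() and tensor slicing, and expand() all make the resulting tensor non-contiguous. Interestingly, repeat() and view() do not. So now the question is: what happens if I use a non-contiguous tensor?

The answer is that view() generally cannot be applied to a non-contiguous tensor. This is probably because view() requires the tensor to be stored contiguously so that it can reshape it quickly in memory, without copying. For example: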

bbb.view(-1,3)

we will get the error:

---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
<ipython-input-63-eec5319b0ac5> in <module>()
----> 1 bbb.view(-1,3)

RuntimeError: invalid argument 2: view size is not compatible with input tensor's size and stride (at least one dimension spans across two contiguous subspaces). Call .contiguous() before .view(). at /pytorch/aten/src/TH/generic/THTensor.cpp:203

To fix this, simply call contiguous() on the non-contiguous tensor to create a contiguous copy, and then apply view():

bbb.contiguous().view(-1,3)
#tensor([[1., 4., 2.],
#        [5., 3., 6.]])
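Put together as one runnable snippet, the failure and the fix look roughly like this. The flag view_failed and the name fixed are only for illustration:

```python
import torch

aaa = torch.tensor([[1., 2., 3.], [4., 5., 6.]])
bbb = aaa.transpose(0, 1)   # shape (3, 2), stride (1, 3)

try:
    bbb.view(-1, 3)
    view_failed = False
except RuntimeError as err:
    # The message suggests calling .contiguous() or using .reshape(...).
    view_failed = True
    print("view failed:", err)

fixed = bbb.contiguous().view(-1, 3)
print(fixed)
```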

score 11 · Accepted Answer

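As the previous answers say, contiguous() allocates a contiguous chunk of memory. This is helpful when we pass a tensor to C or C++ backend code, where tensors are passed as pointers.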
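To see what code reading straight from a pointer would encounter, one can look at the buffer in address order. This is only a sketch: raw_memory is a hypothetical helper built on torch.as_strided, and it assumes the tensor starts at the beginning of its buffer, which holds for the tensors used here.

```python
import torch

x = torch.arange(6).reshape(2, 3)
y = x.t()   # shape (3, 2), shares x's memory

def raw_memory(t):
    """Hypothetical helper: the first t.numel() elements of t's buffer, in address order."""
    return torch.as_strided(t, (t.numel(),), (1,)).tolist()

logical = y.flatten().tolist()       # the order the indices describe
before = raw_memory(y)               # what a linear read of the pointer sees
after = raw_memory(y.contiguous())   # same read after contiguous()
print(logical, before, after)
```

Before the call, a linear read returns x's order rather than y's. After contiguous(), the memory order matches the logical order.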

score 7 · Accepted Answer

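The accepted answer is great, and I tried to reproduce the effect of the transpose() function. I created two functions, samestorage() and contiguous(), which check whether two tensors share the same storage and whether a tensor is contiguous.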

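def samestorage(x, y):
    # Two tensors share memory if their storages start at the same address.
    if x.storage().data_ptr() == y.storage().data_ptr():
        print("same storage")
    else:
        print("different storage")

def contiguous(y):
    # Report whether y is laid out in row-major (C) order.
    if y.is_contiguous():
        print("contiguous")
    else:
        print("non contiguous")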

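I checked and got this result as a table:

You can review the checker code below, but first let's look at one example where the tensor is non-contiguous. We cannot simply call view() on that tensor; we need to reshape() it, or we can call .contiguous().view().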

x = torch.randn(3,2)
y = x.transpose(0, 1)
y.view(6) # RuntimeError: view size is not compatible with input tensor's size and stride (at least one dimension spans across two contiguous subspaces). Use .reshape(...) instead.
  
x = torch.randn(3,2)
y = x.transpose(0, 1)
y.reshape(6)

x = torch.randn(3,2)
y = x.transpose(0, 1)
y.contiguous().view(6)
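The difference between the two fixes is when a copy happens. As a rough sketch (a_is_view, b_is_copy and same_result are illustrative names), comparing data pointers shows that reshape() returns a view when it can and only copies when it has to:

```python
import torch

x = torch.arange(6.0).reshape(3, 2)
y = x.transpose(0, 1)

a = x.reshape(6)   # x is contiguous, so reshape can return a view
b = y.reshape(6)   # y is not, so reshape has to copy

a_is_view = a.data_ptr() == x.data_ptr()
b_is_copy = b.data_ptr() != y.data_ptr()
same_result = torch.equal(b, y.contiguous().view(6))
print(a_is_view, b_is_copy, same_result)
```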

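It is also worth noting that some methods return contiguous tensors and others return non-contiguous ones. Some methods operate on the same storage, while others, such as flip(), create a new storage (read: clone the tensor) before returning.

Checker code: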

import torch
x = torch.randn(3,2)
y = x.transpose(0, 1) # flips two axes
print("\ntranspose")
print(x)
print(y)
contiguous(y)
samestorage(x,y)

print("\nnarrow")
x = torch.randn(3,2)
y = x.narrow(0, 1, 2) #dim, start, len  
print(x)
print(y)
contiguous(y)
samestorage(x,y)

print("\npermute")
x = torch.randn(3,2)
y = x.permute(1, 0) # sets the axis order
print(x)
print(y)
contiguous(y)
samestorage(x,y)

print("\nview")
x = torch.randn(3,2)
y=x.view(2,3)
print(x)
print(y)
contiguous(y)
samestorage(x,y)

print("\nreshape")
x = torch.randn(3,2)
y = x.reshape(6,1)
print(x)
print(y)
contiguous(y)
samestorage(x,y)

print("\nflip")
x = torch.randn(3,2)
y = x.flip(0)
print(x)
print(y)
contiguous(y)
samestorage(x,y)

print("\nexpand")
x = torch.randn(3,2)
y = x.expand(2,-1,-1)
print(x)
print(y)
contiguous(y)
samestorage(x,y)

score 6 · Accepted Answer

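A one-dimensional array [0, 1, 2, 3, 4] is contiguous if its items are laid out in memory next to each other.

It is not contiguous if its items are not adjacent in the region of memory where it is stored.

For arrays with two or more dimensions, items must also be next to each other, but the order follows different conventions. Let's consider the 2D array below: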

>>> t = torch.tensor([[0, 1, 2, 3], [4, 5, 6, 7], [8, 9, 10, 11]])

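The memory allocation is C contiguous if the rows are stored next to each other, one after another (that is, in the order 0, 1, 2, ..., 11).

This is what PyTorch considers contiguous.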

>>> t.is_contiguous()
True

The stride associated with the array gives the number of elements (not bytes) to skip to get to the next element in each dimension:

>>> t.stride()
(4, 1)

We need to skip 4 elements to go to the next row, but only one element to go to the next element in the same row.
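PyTorch reports strides in elements. If byte strides are wanted (the convention NumPy uses), they can be derived with element_size(). A minimal sketch, where byte_strides is just an illustrative name:

```python
import torch

t = torch.tensor([[0, 1, 2, 3], [4, 5, 6, 7], [8, 9, 10, 11]])  # int64 by default

element_strides = t.stride()   # (4, 1), counted in elements
byte_strides = tuple(s * t.element_size() for s in element_strides)

print(t.dtype, t.element_size(), element_strides, byte_strides)
```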

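As said in other answers, some PyTorch operations do not change the memory allocation, only the metadata.

Take the transpose method, for instance. If we transpose the tensor, the memory allocation does not change, but the stride does: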

>>> t.T.stride()
(1, 4)

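We need to skip 1 element to go to the next row, and 4 elements to go to the next element in the same row. The tensor is not C contiguous anymore (it is in fact Fortran contiguous: the columns are stored next to each other).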

>>> t.T.is_contiguous()
False

contiguous() will rearrange the memory allocation so that the tensor is C contiguous:

>>> t.T.contiguous().stride()
(3, 1)

score 3 · Accepted Answer

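A tensor whose values are laid out in the storage starting from the rightmost dimension onward (that is, moving along rows for a 2D tensor) is defined as contiguous. Contiguous tensors are convenient because we can visit them efficiently in order without jumping around in the storage (improving data locality improves performance because of the way memory access works on modern CPUs). This advantage of course depends on the way algorithms visit the data.

Some tensor operations in PyTorch only work on contiguous tensors, such as view, [...]. In that case, PyTorch will throw an informative exception and require us to call contiguous explicitly. It's worth noting that calling contiguous will do nothing (and will not hurt performance) if the tensor is already contiguous.

Deep Learning with PyTorch

Note that this is a more specific meaning than the general use of the word "contiguous" in computer science (i.e. contiguous and ordered).

For example, given a tensor: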

[[1, 2]
 [3, 4]]

Storage in memory	PyTorch `contiguous`?	Generally "contiguous" in memory space?
`1 2 3 4 0 0 0`	✅	✅
`1 3 2 4 0 0 0`	❌	✅
`1 0 2 0 3 0 4`	❌	❌
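The three rows of the table can be reproduced with torch.as_strided, which builds a view over a flat buffer with explicit strides. This is a sketch; the strides listed for each row are my own reading of how each storage maps to [[1, 2], [3, 4]]:

```python
import torch

expected = torch.tensor([[1, 2], [3, 4]])

rows = [
    # (storage, stride in elements that maps it to [[1, 2], [3, 4]])
    ([1, 2, 3, 4, 0, 0, 0], (2, 1)),
    ([1, 3, 2, 4, 0, 0, 0], (1, 2)),
    ([1, 0, 2, 0, 3, 0, 4], (4, 2)),
]

results = []
for storage, stride in rows:
    view = torch.as_strided(torch.tensor(storage), (2, 2), stride)
    assert torch.equal(view, expected)   # same logical tensor every time
    results.append(view.is_contiguous())

print(results)
```

Only the first layout, where the stride is the row-major one for shape (2, 2), is contiguous in the PyTorch sense.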

score 0 · Accepted Answer

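From what I understand, this is a more summarized answer:

Contiguous is the term used to indicate that the memory layout of a tensor aligns with its advertised metadata or shape information; a non-contiguous tensor is one whose layout does not.

In my opinion the word contiguous is a confusing/misleading term, since in normal contexts it means that memory is not spread around in disconnected blocks (i.e. it is "contiguous/connected/continuous").

Some operations might need this contiguous property for some reason (most likely efficiency on the GPU, etc.).

Note that transpose is not the only operation that might cause this issue. Look at the following code, which I fixed by simply calling contiguous (instead of the typical transpose issue, this is an example where it is caused by expand(), and where an RNN is not happy with its input):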

        # normal lstm([loss, grad_prep, train_err]) = lstm(xn)
        n_learner_params = xn_lstm.size(1)
        (lstmh, lstmc) = hs[0] # previous hx from first (standard) lstm i.e. lstm_hx = (lstmh, lstmc) = hs[0]
        if lstmh.size(1) != xn_lstm.size(1): # only true when prev lstm_hx is equal to decoder/controllers hx
            # make sure that h, c from decoder/controller has the right size to go into the meta-optimizer
            expand_size = torch.Size([1,n_learner_params,self.lstm.hidden_size])
            lstmh, lstmc = lstmh.squeeze(0).expand(expand_size).contiguous(), lstmc.squeeze(0).expand(expand_size).contiguous()
        lstm_out, (lstmh, lstmc) = self.lstm(input=xn_lstm, hx=(lstmh, lstmc))

The error I used to get:

RuntimeError: rnn: hx is not contiguous
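A stripped-down version of the same situation might look like this. The sizes and names are made up for illustration, and whether the non-contiguous state is actually rejected depends on the PyTorch version and backend, so the sketch only shows why expand() yields a non-contiguous tensor and that contiguous() produces one the LSTM accepts:

```python
import torch
import torch.nn as nn

hidden_size, batch = 3, 2
lstm = nn.LSTM(input_size=4, hidden_size=hidden_size)
x = torch.randn(5, batch, 4)   # (seq_len, batch, input_size)

# expand() repeats the batch dimension without copying: its stride is 0.
h = torch.zeros(1, 1, hidden_size).expand(1, batch, hidden_size)
c = torch.zeros(1, 1, hidden_size).expand(1, batch, hidden_size)
expanded_stride = h.stride()
was_contiguous = h.is_contiguous()

# contiguous() materializes the repeated values into real memory.
h, c = h.contiguous(), c.contiguous()
out, (hn, cn) = lstm(x, (h, c))
print(expanded_stride, was_contiguous, h.stride(), out.shape)
```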

