
I am trying to implement a Self-Attention GAN with Keras on Google Colab. When I test my attention layer, I get an OOM error. So, am I doing something wrong in the matrix multiplications, or is this simply too expensive an operation for the Colab GPU at higher resolutions (> 64 x 64)?

def hw_flatten(x):
   # Input shape x: [BATCH, HEIGHT, WIDTH, CHANNELS]
   # Flatten the feature volume across the height and width dimensions

   x = Reshape((x.shape[1]*x.shape[2], x.shape[3]))(x) # in the Reshape layer, batch is implicit

   return x # returns [BATCH, H*W, CHANNELS]



def matmul(couple_t):
  tensor_1 = couple_t[0]
  tensor_2 = couple_t[1]
  transpose = couple_t[2] # boolean

  return tf.matmul(tensor_1, tensor_2, transpose_b=transpose)



class SelfAttention(Layer):

  def __init__(self, ch, **kwargs):
    super(SelfAttention, self).__init__(**kwargs)
    self.ch = ch

  
  def attentionMap(self, feature_map):

    f = Conv2D(filters=feature_map.shape[3]//8, kernel_size=(1,1), strides=1, padding='same')(feature_map) # [bs, h, w, c']
    g = Conv2D(filters=feature_map.shape[3]//8, kernel_size=(1,1), strides=1, padding='same')(feature_map) # [bs, h, w, c']
    h = Conv2D(filters=feature_map.shape[3], kernel_size=(1,1), strides=1, padding='same')(feature_map)    # [bs, h, w, c]

    s = Lambda(matmul)([hw_flatten(g), hw_flatten(f), True]) # [bs, N, N]
    beta = Activation("softmax")(s)

    o = Lambda(matmul)([beta, hw_flatten(h), False]) # [bs, N, C]


    gamma = self.add_weight(name='gamma', shape=[1], initializer='zeros', trainable=True)

    o = Reshape((feature_map.shape[1:]))(o) # [bs, h, w, C]

    x = gamma * o + feature_map

    print(x.shape)

    return x

Here is the test:

tensor = np.random.normal(0, 1, size=(32, 64, 64, 512)).astype('float64')
attention_o = SelfAttention(64)
a = attention_o.attentionMap(tensor)

And here is the error:

OOM when allocating tensor with shape[32,4096,4096] and type double

Thank you very much for your attention :D


1 Answer


Your tensor of shape 32x4096x4096 has 536,870,912 entries! Multiplied by the number of bytes in a double (8), that is about 4.3 GB for this single tensor, and with the intermediate buffers (e.g. the softmax output) and the gradients that training adds on top, this easily exhausts the Colab GPU's memory. You may want to add some max pooling layers to reduce the dimensionality of the data before applying self-attention.
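The arithmetic above, and the effect of pooling before attention, can be checked with a short NumPy sketch (a minimal illustration with toy shapes, not the original Keras code; the 2x2-pooled key branch here is an assumption about how you might apply the suggestion):

```python
import numpy as np

# Memory taken by the [32, 4096, 4096] float64 attention map from the error
entries = 32 * 4096 * 4096          # 536,870,912 entries
size_bytes = entries * 8            # float64 = 8 bytes per entry
print(size_bytes / 1024**3)         # 4.0 (GiB) -- gigabytes, not terabytes

# Toy shapes: if the key branch is downsampled by 2x2 max pooling before
# hw_flatten, N shrinks from h*w to h*w/4, so the attention matrix shrinks
# from [bs, N, N] to [bs, N, N/4] -- a 4x reduction for this tensor alone.
bs, h, w, c = 2, 8, 8, 16
N = h * w
q = np.random.randn(bs, N, c // 8)        # query branch: full resolution
k = np.random.randn(bs, N // 4, c // 8)   # key branch: after 2x2 pooling
s = np.matmul(q, k.transpose(0, 2, 1))    # attention logits: [bs, N, N/4]
print(s.shape)                            # (2, 64, 16)
```

Casting the input to float32 instead of float64 halves the footprint again, and is what Keras layers default to anyway.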

Answered 2020-09-18T02:59:02.283