
Currently my model produces 3 output tensors, and I would like two of them to work together: I want to pass a combination of self.dropout1(hs) and self.dropout2(cls_hs) through the self.entity_out linear layer. The problem is that the two tensors have different shapes.

Current code

import torch.nn as nn
import transformers

import config  # project-local module providing BASE_MODEL


class NLUModel(nn.Module):
    def __init__(self, num_entity, num_intent, num_scenarios):
        super(NLUModel, self).__init__()
        self.num_entity = num_entity
        self.num_intent = num_intent
        self.num_scenario = num_scenarios

        self.bert = transformers.BertModel.from_pretrained(config.BASE_MODEL)

        self.dropout1 = nn.Dropout(0.3)
        self.dropout2 = nn.Dropout(0.3)
        self.dropout3 = nn.Dropout(0.3)

        self.entity_out = nn.Linear(768, self.num_entity)
        self.intent_out = nn.Linear(768, self.num_intent)
        self.scenario_out = nn.Linear(768, self.num_scenario)

    def forward(self, ids, mask, token_type_ids):
        out = self.bert(input_ids=ids, attention_mask=mask,
                        token_type_ids=token_type_ids)

        hs, cls_hs = out['last_hidden_state'], out['pooler_output']

        entity_hs = self.dropout1(hs)
        intent_hs = self.dropout2(cls_hs)
        scenario_hs = self.dropout3(cls_hs)

        entity_hs = self.entity_out(entity_hs)
        intent_hs = self.intent_out(intent_hs)
        scenario_hs = self.scenario_out(scenario_hs)

        return entity_hs, intent_hs, scenario_hs

Required

def forward(self, ids, mask, token_type_ids):
    out = self.bert(input_ids=ids, attention_mask=mask,
                    token_type_ids=token_type_ids)

    hs, cls_hs = out['last_hidden_state'], out['pooler_output']

    entity_hs = self.dropout1(hs)
    intent_hs = self.dropout2(cls_hs)
    scenario_hs = self.dropout3(cls_hs)

    entity_hs = self.entity_out(concat(entity_hs, intent_hs))  # concatenation (pseudocode)
    intent_hs = self.intent_out(intent_hs)
    scenario_hs = self.scenario_out(scenario_hs)

    return entity_hs, intent_hs, scenario_hs

Assuming I manage to concatenate them successfully... will backpropagation still work?


1 Answer


entity_hs (last_hidden_state) has shape [batch_size, sequence_length, hidden_size], while intent_hs (pooler_output) has shape [batch_size, hidden_size], so putting them together directly may not make sense. It depends on what you want to do.
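
For reference, here is a quick, self-contained way to verify those two shapes (a sketch that uses the public bert-base-uncased checkpoint as a stand-in for config.BASE_MODEL):

import torch
import transformers

bert = transformers.BertModel.from_pretrained('bert-base-uncased')
ids = torch.randint(0, 30000, (2, 16))  # batch_size=2, sequence_length=16
out = bert(input_ids=ids, attention_mask=torch.ones_like(ids))

print(out['last_hidden_state'].shape)  # torch.Size([2, 16, 768])
print(out['pooler_output'].shape)      # torch.Size([2, 768])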

If for some reason you want an output of shape [batch_size, sequence_length, channels], you can tile the intent_hs tensor along the sequence dimension:

intent_hs = torch.tile(intent_hs[:, None, :], (1, sequence_length, 1))  # [batch, seq_len, hidden]
... = torch.cat([entity_hs, intent_hs], dim=2)  # [batch, seq_len, 2 * hidden]
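
Integrated into the question's forward, that could look roughly like the sketch below. Note that the concatenated features are 2 * 768 wide, so self.entity_out would have to be declared as nn.Linear(768 * 2, self.num_entity) in __init__ instead of nn.Linear(768, self.num_entity):

def forward(self, ids, mask, token_type_ids):
    out = self.bert(input_ids=ids, attention_mask=mask,
                    token_type_ids=token_type_ids)

    hs, cls_hs = out['last_hidden_state'], out['pooler_output']

    entity_hs = self.dropout1(hs)        # [batch, seq_len, 768]
    intent_hs = self.dropout2(cls_hs)    # [batch, 768]
    scenario_hs = self.dropout3(cls_hs)  # [batch, 768]

    # Broadcast the pooled vector across the sequence, then concatenate
    # along the feature dimension -> [batch, seq_len, 1536].
    seq_len = entity_hs.shape[1]
    tiled = torch.tile(intent_hs[:, None, :], (1, seq_len, 1))
    combined = torch.cat([entity_hs, tiled], dim=2)

    entity_hs = self.entity_out(combined)  # requires nn.Linear(768 * 2, num_entity)
    intent_hs = self.intent_out(intent_hs)
    scenario_hs = self.scenario_out(scenario_hs)

    return entity_hs, intent_hs, scenario_hs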

If you want [batch_size, channels] instead, you can reduce the entity_hs tensor, for example by averaging over the sequence dimension:

entity_hs = torch.mean(entity_hs, dim=1)        # [batch, hidden]
... = torch.cat([entity_hs, intent_hs], dim=1)  # [batch, 2 * hidden]
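
Keep in mind that this mean-pooled variant collapses the sequence dimension, so the entity head would produce one prediction per example rather than one per token; that only makes sense if your entity target is sequence-level. Either way, self.entity_out must be widened to accept 768 * 2 input features once you concatenate.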

And yes, backpropagation will propagate gradients through the concatenation (and through the rest of the graph).
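
If you want to convince yourself, here is a minimal standalone check (toy tensors standing in for the BERT outputs):

import torch

a = torch.randn(2, 4, requires_grad=True)
b = torch.randn(2, 4, requires_grad=True)

# torch.cat is differentiable: the gradient flows back into both inputs.
torch.cat([a, b], dim=1).sum().backward()

print(a.grad is not None, b.grad is not None)  # True True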

Answered 2021-11-26T16:38:06.120