
I have the following function:


def msfe(ys, ts):
    ys=ys.detach().numpy() #output from the network
    ts=ts.detach().numpy() #Target (true labels)
    pred_class = (ys>=0.5) 
    n_0 = sum(ts==0) #Number of true negatives
    n_1 = sum(ts==1) #Number of true positives
    FPE = sum((ts==0)[[bool(p) for p in (pred_class==1)]])/n_0 #False positive error
    FNE = sum((ts==1)[[bool(p) for p in (pred_class==0)]])/n_1 #False negative error
    loss= FPE**2+FNE**2

    loss=torch.tensor(loss,dtype=torch.float64,requires_grad=True)


    return loss

I want to know whether autograd in PyTorch will work properly here, since ys and ts do not have the grad flag.

So my question is: do all the variables (FPE, FNE, ys, ts, n_1, n_0) have to be tensors before optimizer.step() works, or is it enough that only the final function (loss) is?


1 Answer


All the variables you want to optimize via optimizer.step() need to have gradients.

In your case that is y, since it is predicted by the network, so you shouldn't detach it (from the graph).

Usually you don't change your targets, so those don't need gradients. You shouldn't detach them either, though; by default tensors don't require gradients and won't be backpropagated.

Loss will have a gradient if at least one of its ingredients has a gradient.

Overall, you rarely need to take care of this manually.
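For instance, here is a minimal sketch (the tensors are made up, purely for illustration) of how the requires_grad flag propagates from the parameters through the prediction into the loss:

import torch

# Illustrative example: only the prediction path carries gradients
weights = torch.randn(3, requires_grad=True)   # parameters -> need gradients
inputs = torch.randn(3)                        # data -> no gradient needed
targets = torch.tensor([1.0, 0.0, 1.0])        # targets -> no gradient needed

prediction = torch.sigmoid(inputs * weights)   # depends on weights -> requires_grad=True
loss = ((prediction - targets) ** 2).mean()    # one ingredient has grad -> loss has grad

print(prediction.requires_grad)  # True
print(targets.requires_grad)     # False
print(loss.requires_grad)        # True

loss.backward()                  # gradients end up in weights.grad
print(weights.grad)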

BTW, don't use numpy together with PyTorch; there is rarely a reason to do so. You can perform most of the operations you would do on a numpy array directly on a PyTorch tensor.
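For example, the numpy-style constructs from the question have direct tensor counterparts (illustrative values, not your actual data):

import torch

ts = torch.tensor([0.0, 1.0, 1.0, 0.0])    # targets
ys = torch.tensor([0.2, 0.8, 0.4, 0.9])    # outputs

n_0 = torch.sum(ts == 0)                   # instead of sum(ts == 0)
pred_class = ys >= 0.5                     # stays a torch.bool tensor, no .numpy() needed
fp = torch.sum((ts == 0) & pred_class)     # boolean masking without Python list comprehensions

print(n_0.item(), fp.item())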

BTW, there is no such thing as Variable in pytorch anymore, only tensors which require gradients and tensors which don't.
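If you are migrating older code that wrapped tensors in Variable, the modern equivalent is simply a tensor created with requires_grad=True:

import torch

# Old, deprecated style: torch.autograd.Variable(torch.ones(3), requires_grad=True)
x = torch.ones(3, requires_grad=True)    # modern way: a plain tensor with the grad flag
y = torch.ones(3)                        # default: requires_grad=False

print(x.requires_grad, y.requires_grad)  # True False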

Non-differentiability

1.1 Problems with the existing code

Indeed, you are using functions which are non-differentiable (namely >= and ==). These only cause trouble for your outputs, because those require gradients (you can still use == and >= for the targets).
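You can check this yourself: comparison operators return boolean tensors that are cut off from the graph (a quick illustrative snippet):

import torch

outputs = torch.rand(4, requires_grad=True)
pred_class = outputs >= 0.5       # comparison -> boolean tensor
print(pred_class.dtype)           # torch.bool
print(pred_class.requires_grad)   # False, so no gradient can flow through it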

Below I've attached your loss function with its problems outlined in the comments:

import torch

# Gradient can't propagate if you detach and work in another framework
# Most Python constructs should be fine, detaching will ruin it though.
def msfe(outputs, targets):
    # outputs=outputs.detach().numpy() # Do not detach, no need to do that
    # targets=targets.detach().numpy() # No need for numpy either
    pred_class = outputs >= 0.5  # This one is non-differentiable
    # n_0 = sum(targets==0) # Do not use sum, there is pytorch function for that
    # n_1 = sum(targets==1)

    n_0 = torch.sum(targets == 0)  # Those are not differentiable, but...
    n_1 = torch.sum(targets == 1)  # It does not matter as those are targets

    # FPE = sum((targets==0)[[bool(p) for p in (pred_class==1)]])/n_0 # Do not use Python bools
    # FNE = sum((targets==1)[[bool(p) for p in (pred_class==0)]])/n_1 # Stay within PyTorch
    # Those two below are non-differentiable due to == sign as well
    FPE = torch.sum((targets == 0.0) * (pred_class == 1.0)).float() / n_0
    FNE = torch.sum((targets == 1.0) * (pred_class == 0.0)).float() / n_1
    # This is obviously fine
    loss = FPE ** 2 + FNE ** 2

    # Loss should be a tensor already, don't do things like that
    # Gradient will not be propagated, you will have a new tensor
    # Always returning gradient of `1` and that's all
    # loss = torch.tensor(loss, dtype=torch.float64, requires_grad=True)

    return loss

1.2 Possible solution

So, you need to get rid of these 3 non-differentiable parts. In principle, you can try to approximate them with the continuous outputs of your network (provided you use sigmoid as the activation). Here is my take:

def msfe_approximation(outputs, targets):
    n_0 = torch.sum(targets == 0)  # Gradient does not flow through it, it's okay
    n_1 = torch.sum(targets == 1)  # Same as above
    FPE = torch.sum((targets == 0) * outputs).float() / n_0
    FNE = torch.sum((targets == 1) * (1 - outputs)).float() / n_1

    return FPE ** 2 + FNE ** 2

Notice that to minimize FPE, outputs will try to be zero at the indices where targets are zero. Similarly for FNE: where targets are 1, the network will try to output 1 as well.

Notice the similarity of this idea to BCELoss (Binary Cross-Entropy).
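For comparison, here is a sketch of how the built-in BCELoss would be applied to the same kind of sigmoid outputs and targets (random data, only to show the shapes and the call, not part of the msfe approach above):

import torch

criterion = torch.nn.BCELoss()
outputs = torch.rand(8, 1, requires_grad=True)         # stand-in for sigmoid outputs
targets = torch.randint(high=2, size=(8, 1)).float()   # BCELoss expects float targets

loss = criterion(outputs, targets)
loss.backward()
print(loss.item())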

Finally, an example you can run, just as a sanity check:

if __name__ == "__main__":
    model = torch.nn.Sequential(
        torch.nn.Linear(30, 100),
        torch.nn.ReLU(),
        torch.nn.Linear(100, 200),
        torch.nn.ReLU(),
        torch.nn.Linear(200, 1),
        torch.nn.Sigmoid(),
    )
    optimizer = torch.optim.Adam(model.parameters())
    targets = torch.randint(high=2, size=(64, 1)) # random targets
    inputs = torch.rand(64, 30) # random data
    for _ in range(1000):
        optimizer.zero_grad()
        outputs = model(inputs)
        loss = msfe_approximation(outputs, targets)
        print(loss)
        loss.backward()
        optimizer.step()

    print(((model(inputs) >= 0.5) == targets).float().mean())
answered 2019-10-25T14:47:50.900