
The error message says [torch.cuda.FloatTensor [256, 1, 4, 4]] is at version 2; expected version 1 instead, and execution breaks on d_loss.backward(), i.e. the backward call for my discriminator.

Update: OK, I've tracked it down to the generator's optimizer.step() being run before .backward() is called on my discriminator.

Update 2: So once I got the model running on PyTorch 1.5 (by moving G's optimizer step to after the d_loss.backward() call, as above), I noticed that the losses were suddenly much higher during training. I let the model run for a few epochs and the images were basically noise. So, out of curiosity, I switched back to my PyTorch 1.4 environment and ran the original version for a few epochs, and the images were good again. This is a ClusterGAN I'm training (so not the standard procedure), and I'm wondering why this change is so detrimental to the output. Also, how can I get the model to run in PyTorch 1.5 without degrading performance? Presumably I have to keep the optimizer update where it originally was (right after ge_loss.backward(retain_graph=True)), but somehow avoid the error that PyTorch 1.5 reports when we later hit d_loss.backward() in the code. Presumably some kind of clone() thing, but I'm not clear on what...?
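
For what it's worth, here is a minimal standalone sketch of the pattern that seems to trigger the error (not my actual model; G, D, opt_G, opt_D and the toy Linear layers are just placeholders). As far as I can tell, PyTorch 1.5 optimizers update parameters in place in a way autograd's version counter now tracks, so a later backward through the same retained graph sees modified tensors; 1.4 apparently ran the same code silently with the already-updated weights.

import torch

# Toy stand-ins, NOT the ClusterGAN model: a two-layer "generator" and a
# one-layer "discriminator" are enough to show the ordering problem.
G = torch.nn.Sequential(torch.nn.Linear(3, 3), torch.nn.Linear(3, 3))
D = torch.nn.Linear(3, 1)
opt_G = torch.optim.SGD(G.parameters(), lr=0.1)
opt_D = torch.optim.SGD(D.parameters(), lr=0.1)

z = torch.randn(8, 3)
fake = G(z)
d_fake = D(fake)

g_loss = -d_fake.mean()
g_loss.backward(retain_graph=True)   # like ge_loss.backward(retain_graph=True)
opt_G.step()                         # in-place update of G's weights

d_loss = d_fake.mean()
d_loss.backward()                    # backprop runs through D and back into G, which needs
                                     # the weight G saved before opt_G.step() modified it:
                                     # RuntimeError on PyTorch 1.5, runs without complaint on 1.4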

[...]
# main training block
for epoch in range(n_epochs):
        for i, (imgs, itruth_label) in enumerate(dataloader):
            iter_count += 1
            # Ensure generator/encoder are trainable
            generator.train()
            encoder.train()
            # Zero gradients for models
            generator.zero_grad()
            encoder.zero_grad()
            discriminator.zero_grad()

            # Configure input
            real_imgs = Variable(imgs.type(Tensor))

            # ---------------------------
            #  Train Generator + Encoder
            # ---------------------------

            optimizer_GE.zero_grad()

            # Sample random latent variables
            zn, zc, zc_idx = sample_z(shape=imgs.shape[0],
                                      latent_dim=latent_dim,
                                      n_c=n_c)

            # Generate a batch of images
            gen_imgs = generator(zn, zc)

            # Discriminator output from real and generated samples
            D_gen = discriminator(gen_imgs)
            D_real = discriminator(real_imgs)

            # Step for Generator & Encoder, n_skip_iter times less than for discriminator
            did_update = False
            if (i % n_skip_iter == 0):
                # Encode the generated images
                enc_gen_zn, enc_gen_zc, enc_gen_zc_logits = encoder(gen_imgs)

                # Calculate losses for z_n, z_c
                zn_loss = mse_loss(enc_gen_zn, zn)
                zc_loss = xe_loss(enc_gen_zc_logits, zc_idx)

                # additional top-k step (from Sinha et al, 2020)
                if top_k <= D_gen.size()[0]:
                    top_k_gen = torch.topk(D_gen, top_k, 0)
                else:
                    top_k_gen = torch.topk(D_gen, D_gen.size()[0], 0) 

                # Check requested metric
                if wass_metric:
                    # Wasserstein GAN loss

                    ge_loss = torch.mean(top_k_gen[0]) + betan * zn_loss + betac * zc_loss
                else:
                    # Vanilla GAN loss
                    valid = Variable(Tensor(gen_imgs.size(0), 1).fill_(1.0), requires_grad=False)
                    v_loss = bce_loss(D_gen, valid)
                    ge_loss = v_loss + betan * zn_loss + betac * zc_loss

                ge_loss.backward(retain_graph=True)
                # ---- ORIGINAL OPTIMIZER UPDATE ---- #
                optimizer_GE.step()
                scheduler.step(epoch + i / iters)
                did_update = True

            # ---------------------
            #  Train Discriminator
            # ---------------------

            optimizer_D.zero_grad()

            # Measure discriminator's ability to classify real from generated samples
            if wass_metric:
                # Gradient penalty term
                grad_penalty = calc_gradient_penalty(discriminator, real_imgs, gen_imgs)

                # Wasserstein GAN loss w/gradient penalty
                d_loss = torch.mean(D_real) - torch.mean(D_gen) + grad_penalty

            else:
                # Vanilla GAN loss
                fake = Variable(Tensor(gen_imgs.size(0), 1).fill_(0.0), requires_grad=False)
                real_loss = bce_loss(D_real, valid)
                fake_loss = bce_loss(D_gen, fake)
                d_loss = (real_loss + fake_loss) / 2

            d_loss.backward()
            # --- REVISED OPTIMIZER UPDATE FOR PyTorch 1.5 ------ #
            # if did_update:
            #     optimizer_GE.step()
            optimizer_D.step()
            # scheduler.step(epoch + i / iters)
[...]

1 Answer


If I understand correctly, the error occurs on your second call to .backward(). The problem is caused by backpropagating through D_gen and D_real twice. I don't know what you are doing with this model, but I guess you don't need to backpropagate through and update the discriminator's parameters while training the generator, right?

So, try this:

1. Set requires_grad of D.parameters() to False in the Train Generator + Encoder stage

2. Set requires_grad of D.parameters() to True in the Train Discriminator stage
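
Roughly like this, as a sketch against the loop in your question (discriminator, optimizer_GE and optimizer_D are your names; set_requires_grad is just a small helper I'm adding, and the elided computations stay as in your code):

def set_requires_grad(module, flag):
    # Toggle gradient tracking for every parameter of a module
    for p in module.parameters():
        p.requires_grad = flag

# --- Train Generator + Encoder ---
set_requires_grad(discriminator, False)   # D's parameters collect no gradients here
optimizer_GE.zero_grad()
# ... sample z, compute gen_imgs, D_gen, ge_loss as in your loop ...
# ge_loss.backward(retain_graph=True)
# optimizer_GE.step()

# --- Train Discriminator ---
set_requires_grad(discriminator, True)    # re-enable gradients for D's own update
optimizer_D.zero_grad()
# ... compute d_loss as in your loop ...
# d_loss.backward()
# optimizer_D.step()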

Answered 2020-06-09T02:07:23.280