
Whenever I export a fastai model and reload it, I get this error (or a very similar one) when I try to use the reloaded model to generate predictions on a new test set:

RuntimeError: Input type (torch.cuda.FloatTensor) and weight type (torch.cuda.HalfTensor) should be the same

Minimal reproducible code example below; you just need to update your FILES_DIR variable to wherever the MNIST data gets deposited on your system:

from fastai import *
from fastai.vision import *

# download data for reproducible example
untar_data(URLs.MNIST_SAMPLE)
FILES_DIR = '/home/mepstein/.fastai/data/mnist_sample'  # this is where command above deposits the MNIST data for me


# Create FastAI databunch for model training
tfms = get_transforms()
tr_val_databunch = ImageDataBunch.from_folder(path=FILES_DIR,  # location of downloaded data shown in log of prev command
                                train = 'train',
                                valid_pct = 0.2,
                                ds_tfms = tfms).normalize()

# Create Model
conv_learner = cnn_learner(tr_val_databunch, 
                           models.resnet34, 
                           metrics=[error_rate]).to_fp16()

# Train Model
conv_learner.fit_one_cycle(4)

# Export Model
conv_learner.export()  # saves model as 'export.pkl' in path associated with the learner

# Reload Model and use it for inference on new hold-out set
reloaded_model = load_learner(path = FILES_DIR,
                              test = ImageList.from_folder(path = f'{FILES_DIR}/valid'))

preds = reloaded_model.get_preds(ds_type=DatasetType.Test)

Output:

"RuntimeError: Input type (torch.cuda.FloatTensor) and weight type (torch.cuda.HalfTensor) should be the same"

Stepping through the code statement by statement, everything works fine until the last line, preds = ..., which is where the torch error above pops up.

Relevant software versions:

Python 3.7.3
fastai 1.0.57
torch 1.2.0
torchvision 0.4.0
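The same kind of mismatch can be reproduced in plain PyTorch, without fastai or a GPU: a layer whose weights were converted to float16 with .half() is fed an ordinary float32 tensor. A minimal sketch, assuming a recent PyTorch CPU build; the helper forward_raises is hypothetical, and since the exact error wording differs between versions and devices, only the fact that a RuntimeError is raised is relied on.

```python
import torch
import torch.nn as nn

# A conv layer whose weights are half precision, as after training with .to_fp16()
conv = nn.Conv2d(1, 4, kernel_size=3).half()

# A regular float32 input; per the error message, the reloaded learner feeds float32 batches
x = torch.randn(1, 1, 28, 28)

def forward_raises(module: nn.Module, inp: torch.Tensor):
    """Hypothetical helper: return the error message if the forward pass fails, else None."""
    try:
        module(inp)
    except RuntimeError as err:
        return str(err)
    return None

msg = forward_raises(conv, x)
print(msg)  # a dtype-mismatch message; wording varies across PyTorch versions

# Casting the weights back to float32 makes weights and input agree again
conv.float()
print(forward_raises(conv, x))  # None
```

The point of the sketch is only that the weights and the input must share a dtype; which side gets converted is the design choice the answers below address.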


2 Answers


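So the answer to this question turned out to be relatively simple:

1) As mentioned in my comment, training in mixed-precision mode (calling to_fp16() on conv_learner) is what causes the error with the exported/reloaded model.

2) To train in mixed-precision mode (which is faster than regular training) and still be able to export and reload the model without errors, simply set the model back to default precision before exporting.

...In code, just change the example above from: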

# Export Model
conv_learner.export()

to:

# Export Model (after converting back to default precision for safe export/reload)
conv_learner = conv_learner.to_fp32()
conv_learner.export()

...and now the full (reproducible) code example above runs without errors, including the predictions made after the model is reloaded.
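Before calling export(), it can help to confirm that every parameter really is back in float32. A minimal sketch in plain PyTorch; the helper name param_dtypes is hypothetical, the small nn.Sequential is only a stand-in for the CNN that cnn_learner builds, and it assumes the learner's network is reachable as an nn.Module (conv_learner.model in fastai v1).

```python
import torch
import torch.nn as nn

def param_dtypes(model: nn.Module) -> set:
    """Hypothetical helper: collect the distinct dtypes of a model's parameters."""
    return {p.dtype for p in model.parameters()}

# Small stand-in for the CNN built by cnn_learner
model = nn.Sequential(nn.Conv2d(3, 8, 3), nn.BatchNorm2d(8), nn.ReLU())

model.half()                 # simulate a model left in half precision
print(param_dtypes(model))   # {torch.float16}

model.float()                # plain-PyTorch counterpart of what to_fp32() does to the weights
print(param_dtypes(model))   # {torch.float32}

# Guard to run right before exporting
assert param_dtypes(model) == {torch.float32}, "convert to float32 before exporting"
```

With a fastai learner, the equivalent guard would be a check on param_dtypes(conv_learner.model) just before conv_learner.export().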

answered 2019-09-10T18:49:47.103

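If you have .to_fp16, your model is in half precision, just as if you had called model.half() in PyTorch.

In fact, if you trace the code, .to_fp16 calls model.half(), but there is a catch: if the batch norm layers are also converted to half precision, you may run into convergence problems.

That's why you would normally do this in PyTorch: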

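import torch

model.half()  # convert every floating-point parameter and buffer to half precision
for module in model.modules():
    # _BatchNorm is the base class of BatchNorm1d/2d/3d; keep those layers in float32
    if isinstance(module, torch.nn.modules.batchnorm._BatchNorm):
        module.float()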

This converts every layer to half precision except the batch norm layers.
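One intuition for why statistics such as batch norm's running mean and variance are risky in half precision is that float16 silently drops small increments. The sketch below uses only the standard library: struct's 'e' format code packs a Python float as IEEE 754 binary16, so a pack/unpack round trip rounds to the nearest float16 value. It illustrates float16 rounding in general, not PyTorch's batch norm code; the helper name round_to_half is hypothetical.

```python
import struct

def round_to_half(x: float) -> float:
    """Hypothetical helper: round a Python float to the nearest IEEE 754 binary16 value."""
    return struct.unpack('<e', struct.pack('<e', x))[0]

# float16 has a 10-bit mantissa, so values near 1.0 are spaced 2**-10 (about 0.000977) apart
print(round_to_half(1.0 + 1e-4))  # 1.0: the small increment is lost entirely
print(round_to_half(1.0 + 1e-3))  # 1.0009765625: snapped to the nearest representable step

# Accumulating many small updates in float16 never moves the value;
# in Python's float64 the same loop would end at about 1.1
acc16 = 1.0
for _ in range(1000):
    acc16 = round_to_half(acc16 + 1e-4)
print(acc16)  # 1.0
```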

Note that the code from the PyTorch forums also works, but only for nn.BatchNorm2d.
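The difference between checking against the private base class _BatchNorm and checking against nn.BatchNorm2d shows up as soon as a model mixes batch norm variants, for example a BatchNorm1d in a linear classifier head. A minimal sketch; half_keep_bn and make_model are hypothetical names, and the toy model is only an assumption standing in for a real network.

```python
import torch
import torch.nn as nn
from torch.nn.modules.batchnorm import _BatchNorm  # private base class of BatchNorm1d/2d/3d

def half_keep_bn(model: nn.Module, bn_types) -> nn.Module:
    """Hypothetical helper: convert to float16 but keep layers matching bn_types in float32."""
    model.half()
    for module in model.modules():
        if isinstance(module, bn_types):
            module.float()
    return model

def make_model() -> nn.Module:
    # Conv body with BatchNorm2d, then a linear head with BatchNorm1d (index 5)
    return nn.Sequential(
        nn.Conv2d(3, 8, 3), nn.BatchNorm2d(8), nn.ReLU(),
        nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        nn.BatchNorm1d(8), nn.Linear(8, 2),
    )

only_2d = half_keep_bn(make_model(), nn.BatchNorm2d)
all_bn = half_keep_bn(make_model(), _BatchNorm)

print(only_2d[5].weight.dtype)  # torch.float16: the BatchNorm1d layer was missed
print(all_bn[5].weight.dtype)   # torch.float32
```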

Then make sure your input is in half precision, using to() like this:

import torch
t = torch.tensor(10.)
print(t)
print(t.dtype)
t=t.to(dtype=torch.float16)
print(t)
print(t.dtype)
# tensor(10.)
# torch.float32
# tensor(10., dtype=torch.float16)
# torch.float16
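Rather than hard-coding float16, the input can be cast to whatever dtype the model's first parameter has, which covers both a half-precision model and one converted back to float32. A minimal sketch; match_input_dtype is a hypothetical helper, and it assumes the first parameter belongs to the layer that receives the input (not true if, say, a float32 batch norm layer comes first). It only checks dtypes, since half-precision convolutions are normally run on a GPU.

```python
import torch
import torch.nn as nn

def match_input_dtype(model: nn.Module, x: torch.Tensor) -> torch.Tensor:
    """Hypothetical helper: cast a floating-point input to the dtype of the model's first parameter."""
    target = next(model.parameters()).dtype
    # Integer tensors (e.g. class indices) are passed through unchanged
    return x.to(dtype=target) if x.is_floating_point() else x

model = nn.Sequential(nn.Conv2d(1, 4, 3)).half()
x = torch.randn(1, 1, 28, 28)  # float32, like the batches in the question

print(match_input_dtype(model, x).dtype)          # torch.float16
print(match_input_dtype(model.float(), x).dtype)  # torch.float32
```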
answered 2019-08-23T09:16:51.040