I am running this in a Colab runtime, but even after waiting more than 5 hours, the epoch count does not go up. Is there anything I should check?
!nvidia-smi
> NVIDIA-SMI 455.32.00
Driver Version: 418.67
CUDA Version: 10.1
Tesla V100-SXM2...
24W / 300W |0MiB / **16130MiB** | 0% Default |
%cat /proc/meminfo | grep MemTotal
> MemTotal: **26751732 kB**
%cat /proc/sys/vm/overcommit_memory
> 1
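To put the memory figures in this post on a common scale, here is a small sketch that only does unit conversion. It takes the `MemTotal` value shown above and the 4294967296-byte size that the tcmalloc messages in the training log report, and converts both to GiB. The helper name `parse_memtotal_kib` is made up for illustration.

```python
# Unit conversion only; both numbers are copied from this post.
KIB_PER_GIB = 1024 ** 2
BYTES_PER_GIB = 1024 ** 3

def parse_memtotal_kib(meminfo_text):
    """Return the MemTotal value in kB, as printed by /proc/meminfo."""
    for line in meminfo_text.splitlines():
        if line.startswith("MemTotal:"):
            return int(line.split()[1])
    raise ValueError("MemTotal not found")

sample = "MemTotal:       26751732 kB"
mem_total_gib = parse_memtotal_kib(sample) / KIB_PER_GIB
alloc_gib = 4294967296 / BYTES_PER_GIB  # size from the tcmalloc lines

print(f"Host RAM: {mem_total_gib:.1f} GiB")
print(f"One tcmalloc large alloc: {alloc_gib:.0f} GiB")
```

Each reported allocation is therefore 4 GiB on a host with roughly 25.5 GiB of RAM. The tcmalloc lines are reports about large allocations rather than errors in themselves.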
%tensorflow_version 1.x
> **TensorFlow 1.x selected.**
from google.colab import drive
drive.mount('/content/drive')
> Mounted at /content/drive
%cd '/content/drive/My Drive/colab-sg2-ada/stylegan2-ada'
> /content/drive/My Drive/colab-sg2-ada/stylegan2-ada
zip_path = '/content/drive/My\ Drive/glitchv2.zip'
!unzip {zip_path} -d /content/
> unzip~~~~~
The dataset consists of 1200 images, each **256x256** pixels.
dataset_path = '/content/dog'
dataset_name = 'glitchv2'
!python dataset_tool.py create_from_images ./datasets/{dataset_name} {dataset_path}
> Loading images from "/content/dog"
Creating dataset "./datasets/glitchv2"
Added 1200 images.
snapshot_count = 4
augs = 'bg'
!python train.py --outdir ./results --cfg=11gb-gpu --snap={snapshot_count} --data=./datasets/{dataset_name}
>
tcmalloc: large alloc 4294967296 bytes == 0x885c000 @ ~~~
tcmalloc: large alloc 4294967296 bytes == 0x885c000 @ ~~~
tcmalloc: large alloc 4294967296 bytes == 0x885c000 @ ~~~
Training options:{
"G_args": {
"func_name": "training.networks.G_main",
"fmap_base": 16384,
"fmap_max": 512,
"mapping_layers": 8,
"num_fp16_res": 4,
"conv_clamp": 256
},
"D_args": {
"func_name": "training.networks.D_main",
"mbstd_group_size": 4,
"fmap_base": 16384,
"fmap_max": 512,
"num_fp16_res": 4,
"conv_clamp": 256
},
"G_opt_args": {
"beta1": 0.0,
"beta2": 0.99,
"learning_rate": 0.002
},
"D_opt_args": {
"beta1": 0.0,
"beta2": 0.99,
"learning_rate": 0.002
},
"loss_args": {
"func_name": "training.loss.stylegan2",
"r1_gamma": 10
},
"augment_args": {
"class_name": "training.augment.AdaptiveAugment",
"tune_heuristic": "rt",
"tune_target": 0.6,
"apply_func": "training.augment.augment_pipeline",
"apply_args": {
"xflip": 1,
"rotate90": 1,
"xint": 1,
"scale": 1,
"rotate": 1,
"aniso": 1,
"xfrac": 1,
"brightness": 1,
"contrast": 1,
"lumaflip": 1,
"hue": 1,
"saturation": 1
}
},
"num_gpus": 1,
"image_snapshot_ticks": 4,
"network_snapshot_ticks": 4,
"train_dataset_args": {
"path": "./datasets/glitchv2",
"max_label_size": 0,
"use_raw": false,
"resolution": 256,
"mirror_augment": false,
"mirror_augment_v": false
},
"metric_arg_list": [
{
"name": "fid50k_full",
"class_name": "metrics.frechet_inception_distance.FID",
"max_reals": null,
"num_fakes": 50000,
"minibatch_per_gpu": 8,
"force_dataset_args": {
"shuffle": false,
"max_images": null,
"repeat": false,
"mirror_augment": false
}
}
],
"metric_dataset_args": {
"path": "./datasets/glitchv2",
"max_label_size": 0,
"use_raw": false,
"resolution": 256,
"mirror_augment": false,
"mirror_augment_v": false
},
"total_kimg": 25000,
"minibatch_size": 4,
"minibatch_gpu": 4,
"G_smoothing_kimg": 10,
"G_smoothing_rampup": null,
"run_dir": "./results/00001-glitchv2-11gb-gpu"
}
Output directory: ./results/00001-glitchv2-11gb-gpu
Training data: ./datasets/glitchv2
Training length: 25000 kimg
Resolution: 256
Number of GPUs: 1
Creating output directory...
Loading training set...
tcmalloc: large alloc 4294967296 bytes == 0x7f97addd0000 @ 0x7f9b908a6001 ~~~
tcmalloc: large alloc 4294967296 bytes == 0x7f96ad5d0000 @ 0x7f9b908a41e7 ~~~
tcmalloc: large alloc 4294967296 bytes == 0x7f96ad5d0000 @ 0x7f9b908a41e7 ~~~
Image shape: [3, 256, 256]
Label shape: [0]
Constructing networks...
Setting up TensorFlow plugin "fused_bias_act.cu": Compiling... Loading... Done.
Setting up TensorFlow plugin "upfirdn_2d.cu": Compiling... Loading... Done.
G Params OutputShape WeightShape
--- --- --- ---
latents_in - (?, 512) -
labels_in - (?, 0) -
G_mapping/Normalize - (?, 512) -
G_mapping/Dense0 262656 (?, 512) (512, 512)
G_mapping/Dense1 262656 (?, 512) (512, 512)
G_mapping/Dense2 262656 (?, 512) (512, 512)
G_mapping/Dense3 262656 (?, 512) (512, 512)
G_mapping/Dense4 262656 (?, 512) (512, 512)
G_mapping/Dense5 262656 (?, 512) (512, 512)
G_mapping/Dense6 262656 (?, 512) (512, 512)
G_mapping/Dense7 262656 (?, 512) (512, 512)
G_mapping/Broadcast - (?, 14, 512) -
dlatent_avg - (512,) -
Truncation/Lerp - (?, 14, 512) -
G_synthesis/4x4/Const 8192 (?, 512, 4, 4) (1, 512, 4, 4)
G_synthesis/4x4/Conv 2622465 (?, 512, 4, 4) (3, 3, 512, 512)
G_synthesis/4x4/ToRGB 264195 (?, 3, 4, 4) (1, 1, 512, 3)
G_synthesis/8x8/Conv0_up 2622465 (?, 512, 8, 8) (3, 3, 512, 512)
G_synthesis/8x8/Conv1 2622465 (?, 512, 8, 8) (3, 3, 512, 512)
G_synthesis/8x8/Upsample - (?, 3, 8, 8) -
G_synthesis/8x8/ToRGB 264195 (?, 3, 8, 8) (1, 1, 512, 3)
G_synthesis/16x16/Conv0_up 2622465 (?, 512, 16, 16) (3, 3, 512, 512)
G_synthesis/16x16/Conv1 2622465 (?, 512, 16, 16) (3, 3, 512, 512)
G_synthesis/16x16/Upsample - (?, 3, 16, 16) -
G_synthesis/16x16/ToRGB 264195 (?, 3, 16, 16) (1, 1, 512, 3)
G_synthesis/32x32/Conv0_up 2622465 (?, 512, 32, 32) (3, 3, 512, 512)
G_synthesis/32x32/Conv1 2622465 (?, 512, 32, 32) (3, 3, 512, 512)
G_synthesis/32x32/Upsample - (?, 3, 32, 32) -
G_synthesis/32x32/ToRGB 264195 (?, 3, 32, 32) (1, 1, 512, 3)
G_synthesis/64x64/Conv0_up 2622465 (?, 512, 64, 64) (3, 3, 512, 512)
G_synthesis/64x64/Conv1 2622465 (?, 512, 64, 64) (3, 3, 512, 512)
G_synthesis/64x64/Upsample - (?, 3, 64, 64) -
G_synthesis/64x64/ToRGB 264195 (?, 3, 64, 64) (1, 1, 512, 3)
G_synthesis/128x128/Conv0_up 1442561 (?, 256, 128, 128) (3, 3, 512, 256)
G_synthesis/128x128/Conv1 721409 (?, 256, 128, 128) (3, 3, 256, 256)
G_synthesis/128x128/Upsample - (?, 3, 128, 128) -
G_synthesis/128x128/ToRGB 132099 (?, 3, 128, 128) (1, 1, 256, 3)
G_synthesis/256x256/Conv0_up 426369 (?, 128, 256, 256) (3, 3, 256, 128)
G_synthesis/256x256/Conv1 213249 (?, 128, 256, 256) (3, 3, 128, 128)
G_synthesis/256x256/Upsample - (?, 3, 256, 256) -
G_synthesis/256x256/ToRGB 66051 (?, 3, 256, 256) (1, 1, 128, 3)
--- --- --- ---
Total 30034338
D Params OutputShape WeightShape
--- --- --- ---
images_in - (?, 3, 256, 256) -
labels_in - (?, 0) -
256x256/FromRGB 512 (?, 128, 256, 256) (1, 1, 3, 128)
256x256/Conv0 147584 (?, 128, 256, 256) (3, 3, 128, 128)
256x256/Conv1_down 295168 (?, 256, 128, 128) (3, 3, 128, 256)
256x256/Skip 32768 (?, 256, 128, 128) (1, 1, 128, 256)
128x128/Conv0 590080 (?, 256, 128, 128) (3, 3, 256, 256)
128x128/Conv1_down 1180160 (?, 512, 64, 64) (3, 3, 256, 512)
128x128/Skip 131072 (?, 512, 64, 64) (1, 1, 256, 512)
64x64/Conv0 2359808 (?, 512, 64, 64) (3, 3, 512, 512)
64x64/Conv1_down 2359808 (?, 512, 32, 32) (3, 3, 512, 512)
64x64/Skip 262144 (?, 512, 32, 32) (1, 1, 512, 512)
32x32/Conv0 2359808 (?, 512, 32, 32) (3, 3, 512, 512)
32x32/Conv1_down 2359808 (?, 512, 16, 16) (3, 3, 512, 512)
32x32/Skip 262144 (?, 512, 16, 16) (1, 1, 512, 512)
16x16/Conv0 2359808 (?, 512, 16, 16) (3, 3, 512, 512)
16x16/Conv1_down 2359808 (?, 512, 8, 8) (3, 3, 512, 512)
16x16/Skip 262144 (?, 512, 8, 8) (1, 1, 512, 512)
8x8/Conv0 2359808 (?, 512, 8, 8) (3, 3, 512, 512)
8x8/Conv1_down 2359808 (?, 512, 4, 4) (3, 3, 512, 512)
8x8/Skip 262144 (?, 512, 4, 4) (1, 1, 512, 512)
4x4/MinibatchStddev - (?, 513, 4, 4) -
4x4/Conv 2364416 (?, 512, 4, 4) (3, 3, 513, 512)
4x4/Dense0 4194816 (?, 512) (8192, 512)
Output 513 (?, 1) (512, 1)
--- --- --- ---
Total 28864129
Exporting sample images...
Replicating networks across 1 GPUs...
Initializing augmentations...
Setting up optimizers...
Constructing training graph...
Finalizing training ops...
Initializing metrics...
Training for 25000 kimg...
tick 0 kimg 0.0 time 1m 56s sec/tick 15.3 sec/kimg 954.12 maintenance 100.9 gpumem 5.7 augment 0.000
Evaluating metrics...
Downloading https://nvlabs-fi-cdn.nvidia.com/stylegan2ada/pretrained/metrics/inception_v3_features.pkl ... done
Calculating real image statistics for fid50k_full...
tcmalloc: large alloc 4294967296 bytes == 0x7f93471cc000 @ 0x7f9b908a6001 ~~~
tcmalloc: large alloc 4294967296 bytes == 0x7f92471cc000 @ 0x7f9b908a41e7 ~~~
tcmalloc: large alloc 4294967296 bytes == 0x7f92471cc000 @ 0x7f9b908a41e7 ~~~
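The log stops while "Calculating real image statistics for fid50k_full" is in progress. The `metric_arg_list` entry in the training options shows what this metric asks for: 50000 generated images at a `minibatch_per_gpu` of 8, and, with `max_reals` set to null, every real image as well. The sketch below counts the batches involved using only values from the log; the function name `num_batches` is hypothetical, and the count says nothing about how long each batch takes.

```python
import math

def num_batches(num_images, batch_size):
    """Number of minibatches needed to process num_images."""
    return math.ceil(num_images / batch_size)

num_reals = 1200        # "Added 1200 images."
num_fakes = 50000       # "num_fakes": 50000
minibatch_per_gpu = 8   # "minibatch_per_gpu": 8
num_gpus = 1            # "num_gpus": 1

batch = minibatch_per_gpu * num_gpus
real_batches = num_batches(num_reals, batch)
fake_batches = num_batches(num_fakes, batch)
print(f"real batches: {real_batches}, fake batches: {fake_batches}")
```

In the NVlabs `stylegan2-ada` repository, `train.py` documents a `--metrics` option that accepts `none`. If the copy of `train.py` used here has the same option (an assumption worth checking with `python train.py --help`), passing `--metrics=none` would skip this step and show whether training itself proceeds.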
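The results folder contains only the first result (000000), and nothing else has appeared after 5 hours. The runtime, however, is still running.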
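To check whether training ever gets past tick 0, the `tick` status lines can be parsed instead of read by eye. The sketch below parses the single status line from the log above; `parse_tick_line` and the regular expression are illustrative and not part of StyleGAN2-ADA. It also multiplies `sec/kimg` by the configured 25000 kimg as a naive projection. Tick 0 covers very few images and includes start-up overhead, so that figure is unlikely to be representative.

```python
import re

# Matches the fields of interest in a StyleGAN2-ADA status line.
TICK_RE = re.compile(
    r"tick\s+(?P<tick>\d+)\s+kimg\s+(?P<kimg>[\d.]+)\s+.*?"
    r"sec/tick\s+(?P<sec_per_tick>[\d.]+)\s+"
    r"sec/kimg\s+(?P<sec_per_kimg>[\d.]+)"
)

def parse_tick_line(line):
    """Return the parsed fields as a dict, or None if the line is not a tick line."""
    m = TICK_RE.search(line)
    if m is None:
        return None
    return {
        "tick": int(m.group("tick")),
        "kimg": float(m.group("kimg")),
        "sec_per_tick": float(m.group("sec_per_tick")),
        "sec_per_kimg": float(m.group("sec_per_kimg")),
    }

# Status line copied from the log in this post.
line = ("tick 0 kimg 0.0 time 1m 56s sec/tick 15.3 sec/kimg 954.12 "
        "maintenance 100.9 gpumem 5.7 augment 0.000")
status = parse_tick_line(line)

total_kimg = 25000  # "total_kimg": 25000
projected_days = status["sec_per_kimg"] * total_kimg / 86400
print(status)
print(f"Naive projection from tick 0: {projected_days:.0f} days")
```

Taken at face value, the tick 0 rate works out to about 276 days for the full 25000 kimg. A later tick line, if one ever appears, would give a more meaningful rate.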