Currently, my Hydra configuration is organized as follows:
configs/
├── config.yaml
├── data
│   ├── IMDB.yaml
│   └── REUT.yaml
└── model
    ├── BERT.yaml
    ├── GPT.yaml
    └── loss
        ├── CrossEntropyLoss.yaml
        └── TripletMarginLoss.yaml
config.yaml:
defaults:
  - model: BERT
  - data: IMDB

tasks: [ "fit", "eval" ]
The dataset configs (IMDB.yaml and REUT.yaml) follow this format:
name: IMDB
dir: resource/dataset/imdb_reviews/
folds: [0,1,2,3,4]
max_length: 256
num_classes: 10
The model configs (BERT.yaml and GPT.yaml) follow this format:
defaults:
  - loss: TripletMarginLoss

name: BERT
architecture: bert-base-uncased
lr: 5e-5
tokenizer:
  architecture: ${model.architecture}
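As an aside, the ${model.architecture} line is an OmegaConf interpolation that gets filled in from the composed config. A toy, stdlib-only sketch of what resolution does (resolve_interpolations is a hypothetical helper, not OmegaConf's actual API):

```python
import re

def resolve_interpolations(cfg: dict) -> dict:
    """Replace "${dotted.path}" string values with the value found at that
    path from the root of cfg, loosely mimicking OmegaConf.resolve."""
    def lookup(path: str):
        node = cfg
        for part in path.split("."):
            node = node[part]
        return node

    def walk(node: dict) -> None:
        for key, value in node.items():
            if isinstance(value, dict):
                walk(value)
            elif isinstance(value, str):
                match = re.fullmatch(r"\$\{([^}]+)\}", value)
                if match:
                    node[key] = lookup(match.group(1))

    walk(cfg)
    return cfg

# Mirrors the model config above: the tokenizer reuses the model architecture.
cfg = {
    "model": {
        "architecture": "bert-base-uncased",
        "tokenizer": {"architecture": "${model.architecture}"},
    }
}
resolve_interpolations(cfg)
```

This is why the resolved output below shows the literal string bert-base-uncased under tokenizer.architecture.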
Finally, the loss configs (CrossEntropyLoss.yaml and TripletMarginLoss.yaml) use the following structure:
_target_: source.loss.TripletMarginLoss.TripletMarginLoss
params:
  name: TripletMarginLoss
  margin: 1.0
  eps: 1e-6
  reduction: mean
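For context, the _target_ key names the class to build: Hydra's hydra.utils.instantiate imports the dotted path and calls it. A rough stdlib-only equivalent (instantiate_from_target is a hypothetical name, and the demo uses a standard-library class since the project's source.loss.* modules are not importable here):

```python
import importlib

def instantiate_from_target(target: str, **kwargs):
    """Split "pkg.module.Class" into a module path and an attribute,
    import the module, and call the attribute with the given kwargs."""
    module_path, _, attr = target.rpartition(".")
    cls = getattr(importlib.import_module(module_path), attr)
    return cls(**kwargs)

# Demo with a real stdlib target standing in for a loss class:
delta = instantiate_from_target("datetime.timedelta", seconds=5)
```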
Running the following entry point:
import hydra
from omegaconf import OmegaConf

@hydra.main(config_path="configs/", config_name="config.yaml")
def my_app(params):
    OmegaConf.resolve(params)
    print(f"Params:\n"
          f"{OmegaConf.to_yaml(params)}\n")

if __name__ == '__main__':
    my_app()
with python main.py yields the correct composition:
tasks:
- fit
- eval
model:
  loss:
    _target_: source.loss.TripletMarginLoss.TripletMarginLoss
    params:
      name: TripletMarginLoss
      margin: 1.0
      eps: 1.0e-06
      reduction: mean
  name: BERT
  architecture: bert-base-uncased
  lr: 5.0e-05
  tokenizer:
    architecture: bert-base-uncased
data:
  name: IMDB
  dir: resource/dataset/imdb_reviews/
  folds:
  - 0
  - 1
  - 2
  - 3
  - 4
  max_length: 256
  num_classes: 10
However, overriding the loss from the command line produces an incorrect configuration:

python main.py model.loss=CrossEntropyLoss
tasks:
- fit
- eval
model:
  loss: CrossEntropyLoss
  name: BERT
  architecture: bert-base-uncased
  lr: 5.0e-05
  tokenizer:
    architecture: bert-base-uncased
data:
  name: IMDB
  dir: resource/dataset/imdb_reviews/
  folds:
  - 0
  - 1
  - 2
  - 3
  - 4
  max_length: 256
  num_classes: 10
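One plausible reading of the output above can be mimicked with plain dicts (apply_override and CONFIG_GROUPS are hypothetical stand-ins, not Hydra internals): a dotted override like model.loss=CrossEntropyLoss behaves as a plain value assignment, replacing the whole loss node with a string, whereas selecting a different file from the loss config group is a different kind of operation that swaps in that file's entire content:

```python
# Stand-ins for the files under configs/model/loss/.
CONFIG_GROUPS = {
    "model/loss/TripletMarginLoss": {
        "_target_": "source.loss.TripletMarginLoss.TripletMarginLoss",
        "params": {"name": "TripletMarginLoss", "margin": 1.0},
    },
    "model/loss/CrossEntropyLoss": {
        "_target_": "source.loss.CrossEntropyLoss.CrossEntropyLoss",
        "params": {"name": "CrossEntropyLoss"},
    },
}

def apply_override(cfg: dict, key: str, value: str) -> dict:
    """Toy model of the two override kinds."""
    if "/" in key:
        # Group override: replace the node with the chosen group file's content.
        node = cfg
        parts = key.split("/")
        for part in parts[:-1]:
            node = node[part]
        node[parts[-1]] = CONFIG_GROUPS[f"{key}/{value}"]
    else:
        # Value override: assign the raw value at the dotted path,
        # clobbering whatever subtree was there.
        node = cfg
        parts = key.split(".")
        for part in parts[:-1]:
            node = node[part]
        node[parts[-1]] = value
    return cfg

base = lambda: {"model": {"loss": dict(CONFIG_GROUPS["model/loss/TripletMarginLoss"])}}

# Dotted path: the loss subtree collapses to a bare string, as in the output above.
value_overridden = apply_override(base(), "model.loss", "CrossEntropyLoss")

# Slash path: the loss subtree is swapped for the other group file's content.
group_overridden = apply_override(base(), "model/loss", "CrossEntropyLoss")
```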
So, how do I correctly compose configurations across multiple levels of config groups?