I'm training my NER model with transformers, and I'm not sure why training stops when it does, or why it even runs through this many batches. Here is what my config file looks like (the relevant parts):
[training]
train_corpus = "corpora.train"
dev_corpus = "corpora.dev"
seed = ${system.seed}
gpu_allocator = ${system.gpu_allocator}
dropout = 0.1
accumulate_gradient = 1
patience = 1600
max_epochs = 2
max_steps = 0
eval_frequency = 200
frozen_components = []
before_to_disk = null
[training.batcher]
@batchers = "spacy.batch_by_words.v1"
discard_oversize = false
tolerance = 0.2
get_length = null
[training.batcher.size]
@schedules = "compounding.v1"
start = 100
stop = 1000
compound = 1.001
t = 0.0
[training.optimizer]
@optimizers = "Adam.v1"
beta1 = 0.9
beta2 = 0.999
L2_is_weight_decay = true
L2 = 0.01
grad_clip = 1.0
use_averages = false
eps = 0.00000001
learn_rate = 0.00005
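For context, my mental model of the `compounding.v1` schedule (a minimal re-implementation for illustration only; the real one lives in thinc) is that the batch size starts at `start` and is multiplied by `compound` every step, capped at `stop`:

```python
def compounding(start: float, stop: float, compound: float):
    """Yield start, start * compound, start * compound ** 2, ..., capped at stop."""
    value = start
    while True:
        yield min(value, stop)
        value *= compound

# With my settings the size grows very slowly toward the cap.
sizes = compounding(start=100, stop=1000, compound=1.001)
first_three = [next(sizes) for _ in range(3)]
```

With compound = 1.001 it takes roughly ln(10)/ln(1.001) ≈ 2300 steps for the size to climb from 100 to the 1000 cap.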
Here is the training log:
============================= Training pipeline =============================
ℹ Pipeline: ['transformer', 'ner']
ℹ Initial learn rate: 5e-05
E # LOSS TRANS... LOSS NER ENTS_F ENTS_P ENTS_R SCORE
--- ------ ------------- -------- ------ ------ ------ ------
0 0 398.75 40.97 2.84 3.36 2.46 0.03
0 200 906.30 1861.38 94.51 94.00 95.03 0.95
0 400 230.06 1028.51 98.10 97.32 98.89 0.98
0 600 90.22 1013.38 98.99 98.40 99.58 0.99
0 800 80.64 1131.73 99.02 98.25 99.81 0.99
0 1000 98.50 1260.47 99.50 99.16 99.85 1.00
0 1200 73.32 1414.91 99.49 99.25 99.73 0.99
0 1400 84.94 1529.75 99.70 99.56 99.85 1.00
0 1600 55.61 1697.55 99.75 99.63 99.87 1.00
0 1800 80.41 1936.64 99.75 99.63 99.87 1.00
0 2000 115.39 2125.54 99.78 99.69 99.87 1.00
0 2200 63.06 2395.48 99.80 99.75 99.85 1.00
0 2400 104.14 2574.36 99.87 99.79 99.96 1.00
0 2600 86.07 2308.35 99.88 99.79 99.97 1.00
0 2800 81.05 1853.15 99.90 99.87 99.93 1.00
0 3000 52.67 1462.61 99.96 99.93 99.99 1.00
0 3200 57.99 1154.62 99.94 99.91 99.97 1.00
0 3400 110.74 847.50 99.90 99.85 99.96 1.00
0 3600 90.49 621.99 99.90 99.91 99.90 1.00
0 3800 51.03 378.93 99.87 99.78 99.97 1.00
0 4000 93.40 274.80 99.95 99.93 99.97 1.00
0 4200 138.98 203.28 99.91 99.87 99.96 1.00
0 4400 106.16 127.60 99.70 99.75 99.64 1.00
0 4600 70.28 87.25 99.95 99.94 99.96 1.00
✔ Saved pipeline to output directory
training/model-last
I'm trying to train my model for 2 epochs (max_epochs = 2); my training file has about 123591 examples and the dev file has 2522 examples.
My questions are:
Since my minimum batch size is 100, I would expect training to finish before evaluation batch #2400, right? Reaching eval batch #2400 would mean at least 2400 * 100 = 240000 examples, and in practice even more, since my batch size keeps growing. So why does it go all the way to #4600?
Training ended on its own, but E still reads epoch 0. Why is that?
Edit: following up on my second point, I'd really like to know why training ran all the way to batch 4600: at least 4600 batches would mean 4600 * 100 = 460000 examples, and I only provided 123591 training examples, so I'm clearly well past the first epoch, yet E still reads 0.
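To make my expectation concrete, here is the back-of-the-envelope calculation I'm doing (my own sketch, not spaCy code), assuming each batch holds `size` examples, with `size` compounding the way my schedule does:

```python
def batches_per_pass(total_examples: int, start: float, stop: float, compound: float) -> int:
    """How many batches one pass over the data takes if each batch
    holds `size` examples and size compounds from start toward stop."""
    size, seen, batches = start, 0.0, 0
    while seen < total_examples:
        seen += min(size, stop)
        size *= compound
        batches += 1
    return batches

print(batches_per_pass(123591, start=100, stop=1000, compound=1.001))
```

Under that assumption a single pass over my 123591 examples takes only about 800 batches, so I'd expect to finish well before step 2400, let alone 4600.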
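And for completeness, my reading of the patience setting (a toy sketch of step-based early stopping as I understand it; spaCy's actual logic may differ): training stops once `patience` steps go by without a new best score. With patience = 1600 and eval_frequency = 200, that would be eight evaluations in a row without improvement:

```python
def steps_since_best(scores: list, eval_frequency: int) -> int:
    """Steps elapsed since the evaluation that produced the best score so far."""
    best = max(range(len(scores)), key=scores.__getitem__)
    return (len(scores) - 1 - best) * eval_frequency

# Hypothetical score history: best at the third evaluation, then no improvement.
history = [0.95, 0.99, 1.00] + [0.99] * 8
print(steps_since_best(history, eval_frequency=200))  # 1600 -> patience exhausted
```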