I have a question about the minibatching used in the example train_textcat.py.
The main training loop looks like this:
for i in range(n_iter):
    losses = {}
    # batch up the examples using spaCy's minibatch
    batches = minibatch(train_data, size=compounding(4., 32., 1.001))
    for batch in batches:
        texts, annotations = zip(*batch)
        nlp.update(texts, annotations, sgd=optimizer, drop=0.2,
                   losses=losses)
    with textcat.model.use_params(optimizer.averages):
        # evaluate on the dev data split off in load_data()
        scores = evaluate(nlp.tokenizer, textcat, dev_texts, dev_cats)
I'm wondering why all of the minibatches are consumed within a single iteration, rather than consuming one batch per iteration of the main loop. The following code should illustrate what I mean:
# batch up the examples using spaCy's minibatch
batches = minibatch(train_data, size=compounding(4., 32., 1.001))
for i, batch in zip(range(n_iter), batches):
    losses = {}
    texts, annotations = zip(*batch)
    nlp.update(texts, annotations, sgd=optimizer, drop=0.2, losses=losses)
    with textcat.model.use_params(optimizer.averages):
        # evaluate on the dev data split off in load_data()
        scores = evaluate(nlp.tokenizer, textcat, dev_texts, dev_cats)
Thanks in advance!
Your Environment
- spaCy version: 2.0.12
- Platform: Windows-10-10.0.14393-SP0
- Python version: 3.6.5
- Models: de