Using spaCy v3, I'm trying to train a classifier with camemBERT and I'm running into a CUDA out of memory problem. To solve it, I've read that I should reduce the batch size, but I'm confused about which parameter I should change:
- [nlp] batch_size
- [components.transformer] max_batch_items
- [corpora.train or corpora.dev] max_length
- [training.batcher] size
- [training.batcher] buffer
I'm trying to understand the difference between each of these parameters:
- [nlp] batch_size
Default batch size for the pipeline and for evaluation. Defaults to 1000.
Is this used during the training/evaluation process?
In the quickstart widget (https://spacy.io/usage/training#quickstart), why does the value depend on the hardware? It is 1000 for CPU and 128 for GPU.
During training, will evaluation be slower if this value is low?
- [components.transformer] max_batch_items
Maximum size of a padded batch. Defaults to 4096.
According to the warning message "Token indices sequence length is longer than the specified maximum sequence length for this model (556 > 512). Running this sequence through the model will result in indexing errors",
explained here (https://github.com/explosion/spaCy/issues/6939), the specified maximum sequence length of the camemBERT model is 512.
Is the parameter max_batch_items overridden by this value? Should I change it to 512?
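For reference, this is roughly the transformer model block that the quickstart generates, including the span getter that splits long docs into overlapping windows before they reach the model (a sketch from memory, the architecture version suffix and default values may differ between versions):

[components.transformer.model]
@architectures = "spacy-transformers.TransformerModel.v3"
name = "camembert-base"
tokenizer_config = {"use_fast": true}

[components.transformer.model.get_spans]
# Cuts each Doc into windows of 128 tokens with a stride of 96,
# so individual sequences stay below the model's 512-token limit.
@span_getters = "spacy-transformers.strided_spans.v1"
window = 128
stride = 96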
- [corpora.train or corpora.dev] max_length
In my understanding, this value should be equal to or lower than the maximum sequence length. In the quickstart widget, it is set to 500 for the training set and 0 for the dev set. If it is set to 0, does it fall back to the transformer model's maximum sequence length?
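For comparison, this is roughly what the quickstart generates for the corpora (same values as described above; other keys such as gold_preproc omitted):

[corpora.train]
@readers = "spacy.Corpus.v1"
path = ${paths.train}
# Quickstart value for the training data
max_length = 500

[corpora.dev]
@readers = "spacy.Corpus.v1"
path = ${paths.dev}
# Quickstart value for the dev data
max_length = 0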
- [training.batcher] size (for spacy.batch_by_padded.v1)
The largest padded size to batch sequences into. Can also be a block referencing a schedule, e.g. compounding.
If I don't use compounding, how is this parameter different from max_length?
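As far as I understand, "a block referencing a schedule" means replacing the fixed size with a sub-section like this (a sketch with example values, not what I currently use):

[training.batcher.size]
@schedules = "compounding.v1"
# Start at 100 and grow towards 1000, multiplying by 1.001 each step
# (example values only).
start = 100
stop = 1000
compound = 1.001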
Here are some parts of my config file:
[nlp]
lang = "fr"
pipeline = ["transformer","textcat"]
# Default batch size to use with nlp.pipe and nlp.evaluate
batch_size = 256
...
[components.transformer]
factory = "transformer"
# Maximum size of a padded batch. Defaults to 4096.
max_batch_items = 4096
...
[corpora.dev]
@readers = "spacy.Corpus.v1"
path = ${paths.dev}
# Limitations on training document length
max_length = 512
...
[training.batcher]
@batchers = "spacy.batch_by_padded.v1"
discard_oversize = true
# The largest padded size to batch sequences into. Can also be a block referencing a schedule, e.g. compounding.
size = 500
# The number of sequences to accumulate before sorting by length. A larger buffer will result in more even sizing, but if the buffer is very large, the iteration order will be less random, which can result in suboptimal training.
buffer = 128
get_length = null
...