
In spaCy 2, I used this to add a vocab to an empty spaCy model that has a vector space (created with spacy init):

import spacy

nlp3 = spacy.load('nl_core_news_sm')  # standard model without vectors
nlp = spacy.load("spacyinitnlmodelwithvectorspace", vocab=nlp3.vocab)  # vector-only model, sharing nlp3's vocab

In the spaCy nightly release 3.0.0rc, spacy.load no longer accepts the vocab argument. Does anyone have a suggestion for how to add a vocab to a spaCy model?


1 Answer


This works. It is adapted from "Exporting vectors from fastText to spaCy" and adds a vec file to a spaCy model. I have only tested it on a small dataset.
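The code that follows reads the fastText `.vec` text format: a header line with the number of rows and the vector width, then one line per word containing the word followed by its values, all separated by spaces. To try it on a small dataset first, a tiny file in that format can be generated. This is a minimal standard-library sketch; the helper name `write_vec_file`, the file name `tiny.vec` and the example words are hypothetical.

```python
import os
import tempfile
from pathlib import Path
from typing import Dict, Sequence


def write_vec_file(path: str, vectors: Dict[str, Sequence[float]]) -> None:
    """Write vectors in the fastText .vec text format (hypothetical helper).

    Line 1: "<number of rows> <vector width>"
    Then one line per word: "<word> <v1> <v2> ... <vN>"
    """
    if not vectors:
        raise ValueError("need at least one vector")
    widths = {len(v) for v in vectors.values()}
    if len(widths) != 1:
        raise ValueError("all vectors must have the same width")
    width = widths.pop()
    lines = [f"{len(vectors)} {width}"]
    for word, values in vectors.items():
        if " " in word:
            raise ValueError(f"word contains a space: {word!r}")
        lines.append(word + " " + " ".join(repr(float(x)) for x in values))
    # The loader in this answer opens the file in binary mode and decodes it as UTF-8.
    Path(path).write_text("\n".join(lines) + "\n", encoding="utf8")


# Usage: a two-word, three-dimensional file for a quick test run.
tiny = os.path.join(tempfile.gettempdir(), "tiny.vec")
write_vec_file(tiny, {"hond": [0.1, 0.2, 0.3], "kat": [0.4, 0.5, 0.6]})
print(Path(tiny).read_text(encoding="utf8"))
```

The resulting path could then be passed as `vec_file` to the function below.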

from __future__ import unicode_literals

import numpy
import spacy


def spacy_load_vec(spacy_model, vec_file, spacy_vec_model, print_words=False):
    """
    spaCy model without vectors + vec file becomes a spaCy model with a vector space.
    Based on "Exporting vectors from fastText to spaCy".

    Parameters
    ----------
    spacy_model : str
        spaCy model without a vector space.
    vec_file : str
        .vec file with vectors trained with fastText or word2vec.
    spacy_vec_model : str
        Output path for the spaCy model with a vector space.
    print_words : bool, optional
        Print each word while loading (True/False). The default is False.

    Returns
    -------
    None.

    """
    nlp = spacy.load(spacy_model)
    with open(vec_file, 'rb') as file_:
        # Header line: "<number of rows> <vector width>"
        header = file_.readline()
        nr_row, nr_dim = header.split()
        nlp.vocab.reset_vectors(width=int(nr_dim))
        count = 0
        for line in file_:
            count += 1
            line = line.rstrip().decode('utf8')
            # Split from the right: the last nr_dim fields are the values, the rest is the word
            pieces = line.rsplit(' ', int(nr_dim))
            word = pieces[0]
            if print_words:
                print("{} - {}".format(count, word))
            vector = numpy.asarray([float(v) for v in pieces[1:]], dtype='f')
            nlp.vocab.set_vector(word, vector)  # add the vectors to the vocab
    nlp.to_disk(spacy_vec_model)
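The loop in this answer trusts every line of the file. A stricter reader, which checks that each row carries at least as many values as the header announces and that the row count matches, can catch a truncated file before anything is written to the spaCy vocab. This is a minimal standard-library sketch; `iter_vec_rows` is a hypothetical name, and it keeps the answer's right-split so that a row with too many values is absorbed into the word rather than rejected.

```python
import io
from typing import BinaryIO, Iterator, List, Tuple


def iter_vec_rows(file_: BinaryIO) -> Iterator[Tuple[str, List[float]]]:
    """Yield (word, values) pairs from a fastText .vec file opened in binary mode.

    Hypothetical helper: raises ValueError on a malformed header, on a row
    with fewer values than the header announces, or on a wrong row count.
    """
    header = file_.readline().split()
    if len(header) != 2:
        raise ValueError(f"expected '<rows> <dims>' header, got {header!r}")
    nr_row, nr_dim = int(header[0]), int(header[1])
    count = 0
    for lineno, raw in enumerate(file_, start=2):
        line = raw.rstrip().decode("utf8")
        if not line:
            continue  # tolerate blank lines, e.g. at the end of the file
        # Split from the right so everything before the last nr_dim fields is the word.
        pieces = line.rsplit(" ", nr_dim)
        if len(pieces) != nr_dim + 1:
            raise ValueError(f"line {lineno}: expected {nr_dim} values, got {len(pieces) - 1}")
        count += 1
        yield pieces[0], [float(v) for v in pieces[1:]]
    if count != nr_row:
        raise ValueError(f"header announced {nr_row} rows, found {count}")


# Usage with an in-memory file in the same format.
data = io.BytesIO(b"2 3\nhond 0.1 0.2 0.3\nkat 0.4 0.5 0.6\n")
for word, values in iter_vec_rows(data):
    print(word, values)
```

Each `(word, values)` pair could then be passed to `nlp.vocab.set_vector(word, numpy.asarray(values, dtype='f'))`, as in the answer's loop.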
answered 2020-12-07T14:56:59.823