python - Error loading FastText's French pre-trained model with gensim

Question

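I am trying to use FastText's French pre-trained binary model (downloaded from the official FastText GitHub page). I need the .bin model rather than the .vec word vectors so that I can approximate vectors for misspelled and out-of-vocabulary words.

However, when I try to load the model with: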

from gensim.models import FastText
model = FastText.load_fasttext_format('french_bin_model_path')

I get the following error:

NotImplementedError: Supervised fastText models are not supported

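Surprisingly, loading the English binary model the same way works fine.

I am running Python 3.6 and gensim 3.5.0.

Any ideas about why it doesn't work with the French vectors are welcome!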

score 5 · Accepted Answer

I ran into the same problem and ended up using Facebook's Python wrapper for FastText instead of gensim's implementation.

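# Official Python wrapper; newer releases are imported as `fasttext` (lowercase)
import fastText
# load_model reads the .bin file, including the subword (n-gram) vectors
model = fastText.load_model(path_to_french_bin)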

You can then get word vectors for out-of-vocabulary words like this:

oov_vector = model.get_word_vector(oov_word)
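
For context on why the .bin file is needed here: FastText builds a vector for an unseen word from its character n-grams, and those n-gram vectors are stored only in the binary model. The following is a minimal pure-Python sketch of that idea, not the library's actual code. The helper names (`char_ngrams`, `compose_oov_vector`) and the toy `ngram_vectors` table are hypothetical, the n-gram lengths 3 to 6 are assumed to match FastText's defaults, and the real implementation hashes n-grams into buckets rather than looking them up by string.

```python
from typing import Dict, List


def char_ngrams(word: str, minn: int = 3, maxn: int = 6) -> List[str]:
    """Return the character n-grams of `word`, wrapped in '<' and '>' markers."""
    wrapped = f"<{word}>"
    grams = []
    for n in range(minn, maxn + 1):
        for i in range(len(wrapped) - n + 1):
            grams.append(wrapped[i:i + n])
    return grams


def compose_oov_vector(
    word: str, ngram_vectors: Dict[str, List[float]], dim: int
) -> List[float]:
    """Average the vectors of the word's known n-grams (zeros if none are known)."""
    known = [ngram_vectors[g] for g in char_ngrams(word) if g in ngram_vectors]
    if not known:
        return [0.0] * dim
    # zip(*known) iterates over the vectors column by column
    return [sum(column) / len(known) for column in zip(*known)]


# Toy table standing in for the n-gram vectors stored in a .bin model (assumption)
toy_vectors = {"<ab": [1.0, 3.0], "ab>": [3.0, 5.0]}
vector = compose_oov_vector("ab", toy_vectors, dim=2)
```

A misspelled word shares most of its n-grams with the correct spelling, which is why the two end up with similar vectors. A .vec file contains only whole-word vectors, so this composition is not possible from it.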

As for why gensim's load_fasttext_format works for the English model but not the French one, I don't know!

score 0 · Answer

I have never used FastText, but the problem may be the file's encoding. If you are on macOS, try changing it to UTF-8; if you are on Windows, try changing it to Latin-1.
