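I am working on a word embedding project, and I am using Amazon SageMaker for it. The BlazingText algorithm in Amazon SageMaker produces results faster than the other options. But I don't see any tool for getting the prediction model or the weights. The output consists only of a vectors file, from which I cannot generate a model. Is there any way to get the model along with the vectors file? I need it to predict new words. Thanks in advance.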
3 Answers
You can reproduce similar results, such as most_similar, by loading the vectors.txt or vectors.bin file with gensim's KeyedVectors API.
Here is an example:
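from gensim.models import KeyedVectors
# Load the text-format vectors file
word_vectors = KeyedVectors.load_word2vec_format('vectors.txt', binary=False)
# Or load the binary-format file instead (use one or the other)
word_vectors = KeyedVectors.load_word2vec_format('vectors.bin', binary=True)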
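To make it clearer what a most_similar lookup does with such a file, here is a minimal pure-Python sketch. It assumes the vectors file follows the word2vec text layout (a header line with the word count and dimension, then one word per line followed by its components). The sample data and the helper names parse_word2vec_text, cosine and most_similar are made up for illustration; this is not gensim's implementation, just the same idea of ranking by cosine similarity.

```python
import math

# Toy data in the word2vec text layout: a header "<count> <dim>", then one
# word per line followed by its vector. The numbers are invented.
SAMPLE = """4 3
king 0.9 0.1 0.0
queen 0.8 0.2 0.1
apple 0.0 0.9 0.4
pear 0.1 0.8 0.5
"""


def parse_word2vec_text(text):
    """Parse word2vec text-format content into a {word: [floats]} dict."""
    lines = text.strip().splitlines()
    count, dim = (int(x) for x in lines[0].split())
    vectors = {}
    for line in lines[1:]:
        parts = line.strip().split(" ")
        values = [float(v) for v in parts[1:]]
        if len(values) != dim:
            raise ValueError("unexpected vector length for " + parts[0])
        vectors[parts[0]] = values
    if len(vectors) != count:
        raise ValueError("header count does not match number of rows")
    return vectors


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def most_similar(vectors, word, topn=3):
    """Rank every other word by cosine similarity to `word`."""
    target = vectors[word]
    scored = [(other, cosine(target, vec))
              for other, vec in vectors.items() if other != word]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:topn]


vectors = parse_word2vec_text(SAMPLE)
neighbors = most_similar(vectors, "king", topn=2)
print(neighbors)
```

Note that a lookup like this only works for words that are already in the file; it cannot produce a vector for a word that was never seen in training.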
BlazingText models can generate vectors for new words if you enable subword embedding learning by setting the "subwords" hyperparameter to True during training. Once the training job is complete, you will need to create a SageMaker endpoint and deploy the model. You can then send POST requests to this endpoint to retrieve the word vectors, as demonstrated in the "Hosting / Inference" section of this notebook:
import json
# bt_model is the trained BlazingText estimator from the training step (not defined here)
bt_endpoint = bt_model.deploy(initial_instance_count=1, instance_type='ml.m4.xlarge')
words = ["awesome", "awweeesome"]
payload = {"instances": words}
response = bt_endpoint.predict(json.dumps(payload))
vecs = json.loads(response)
For more details about this feature, you can also refer to this blog post.
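For some intuition on why subword training helps with unseen words: in fastText-style subword models, a word is broken into character n-grams, and a vector for a new word can be composed from the n-grams it shares with words seen in training. The sketch below only shows the n-gram extraction step. The function name char_ngrams, the "<" and ">" boundary markers, and the 3 to 6 length range are assumptions for illustration, not BlazingText's actual code.

```python
def char_ngrams(word, min_n=3, max_n=6):
    """Return the character n-grams of `word`, wrapped in boundary markers.

    The markers and the default length range are illustrative assumptions.
    """
    wrapped = "<" + word + ">"
    grams = []
    for n in range(min_n, max_n + 1):
        for i in range(len(wrapped) - n + 1):
            grams.append(wrapped[i:i + n])
    return grams


seen = "awesome"        # a word assumed to be in the training data
unseen = "awweeesome"   # a misspelling assumed not to be in the training data

# N-grams shared by both words; in a subword model these shared pieces are
# what lets the unseen word get a vector close to the seen one.
overlap = set(char_ngrams(seen)) & set(char_ngrams(unseen))
print(sorted(overlap))
```

The two words share pieces such as the "<aw" prefix and the "some" and "ome>" endings, which is why the endpoint can return a sensible vector for "awweeesome" even though it never appeared in training.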