amazon-web-services - Amazon SageMaker BlazingText

Question

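I am working on a word embedding project, and I am using Amazon SageMaker for it. The BlazingText algorithm in Amazon SageMaker produces results faster than the other options. However, I do not see any tool for getting the prediction model or its weights. The output consists only of a vectors file, from which I cannot build a model. Is there any way to get the model along with the vectors file? I need it to predict new words. Thanks in advance.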

score 0 · Accepted Answer

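I think what you are looking for (if I understand correctly) is how to create an endpoint that returns vectors for new words. Look up the BlazingText examples. At the bottom, they show how to create such an endpoint.

If you want to predict new words that the model does not know, use subwords.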

score 0 · Accepted Answer

You can reproduce similar results, such as most_similar, by loading the vectors.txt or vectors.bin file with the gensim KeyedVectors API.

Here is an example:

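from gensim.models import KeyedVectors

# Load whichever file you have; only one of the two calls is needed.
# Text format:
word_vectors = KeyedVectors.load_word2vec_format('vectors.txt', binary=False)
# Binary format:
word_vectors = KeyedVectors.load_word2vec_format('vectors.bin', binary=True)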
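
To make clearer what a most_similar query amounts to, here is a minimal sketch in plain Python, without gensim. It assumes vectors.txt is in the word2vec text format (a header line with the vocabulary size and dimension, then one word followed by its values per line), which is what load_word2vec_format with binary=False reads. The helper names load_text_vectors, cosine and most_similar below are made up for illustration and are not part of any library.

```python
import math

def load_text_vectors(path):
    """Read a word2vec-style text file into a {word: [floats]} dict."""
    vectors = {}
    with open(path, encoding="utf-8") as f:
        header = f.readline().split()  # "<vocab_size> <dim>"
        dim = int(header[1])
        for line in f:
            parts = line.rstrip().split(" ")
            if len(parts) != dim + 1:
                continue  # skip blank or malformed lines
            vectors[parts[0]] = [float(x) for x in parts[1:]]
    return vectors

def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def most_similar(vectors, word, topn=5):
    """Rank every other word by cosine similarity to `word`, best first."""
    query = vectors[word]
    scored = [(other, cosine(query, vec))
              for other, vec in vectors.items() if other != word]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:topn]
```

This only works for words that appear in the file. For a large vocabulary the gensim version is the practical choice, since it does the same ranking with vectorized operations.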

score -1 · Accepted Answer

BlazingText models can generate vectors for new words if you enable subword embedding learning by setting the "subwords" parameter to True during training. Once the training job is complete, you need to create a SageMaker endpoint and deploy the model to it. You can then send POST requests to this endpoint to retrieve word vectors, as demonstrated in the "Hosting / Inference" section of this notebook:

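import json

# bt_model is the estimator from the completed training job
bt_endpoint = bt_model.deploy(initial_instance_count=1, instance_type='ml.m4.xlarge')
# the second word is a deliberate misspelling that is not in the vocabulary
words = ["awesome", "awweeesome"]
payload = {"instances": words}
response = bt_endpoint.predict(json.dumps(payload))
vecs = json.loads(response)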

For more details about this feature, you can also refer to this blog post.
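
A misspelling like "awweeesome" can still get a vector because, with subwords enabled, a word is represented through its character n-grams and not only as a whole token. The sketch below illustrates the fastText-style n-gram extraction that this kind of model builds on. It is an illustration of the idea, not BlazingText's actual code; the function name char_ngrams and the 3 to 6 length range are assumptions chosen for the example.

```python
def char_ngrams(word, min_n=3, max_n=6):
    """Return the character n-grams of a word wrapped in boundary markers."""
    wrapped = "<" + word + ">"  # "<" and ">" mark the start and end of the word
    grams = []
    for n in range(min_n, max_n + 1):
        for i in range(len(wrapped) - n + 1):
            grams.append(wrapped[i:i + n])
    return grams

# n-grams that the unseen misspelling shares with the known word
known = set(char_ngrams("awesome"))
unseen = set(char_ngrams("awweeesome"))
shared = sorted(known & unseen)
print(shared)
```

The vector for an unseen word is then assembled from the vectors of its n-grams, so a word that shares many n-grams with a training word tends to land near it.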
