decode - Keras: embedding vector back to one-hot

Question

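I am using Keras for an NLP problem. When I try to predict the next word from the previous words, I run into a question about word embeddings. I have already converted the one-hot words to word vectors with the Keras Embedding layer, like this: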

word_vector = Embedding(input_dim=2000,output_dim=100)(word_one_hot)

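I then do some processing with this word_vector, and the model finally outputs another word_vector. But I need to see what the predicted word actually is. How can I convert the word_vector back into word_one_hot?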

score 3 · Accepted Answer

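This question is old, but it seems to be tied to a common point of confusion about what embeddings are and what purpose they serve.

First, never convert to one-hot if you are going to embed afterwards. The Keras Embedding layer takes integer indices as input, so the one-hot conversion is just a wasted step.

Starting from your raw data, you need to tokenize it. This is simply the process of assigning a unique integer to each element of your vocabulary (the set of all possible words or characters [your choice] in your data). Keras has convenience functions for this: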

from keras.preprocessing.sequence import pad_sequences
from keras.preprocessing.text import Tokenizer
max_words = 100  # arbitrary example value: the number of most frequently
# occurring words in your data set that you want to use in your model
tokenizer = Tokenizer(num_words=max_words)
# This builds the word index; df is assumed to be a pandas DataFrame
# whose 'column' holds the raw text strings
tokenizer.fit_on_texts(df['column'])

# This turns strings into lists of integer indices.
train_sequences = tokenizer.texts_to_sequences(df['column'])

# This is how you can recover the word index that was computed
print(tokenizer.word_index)

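Embeddings produce representations. Later layers in the model use earlier representations to produce more abstract representations. The final representation is used to produce a probability distribution over the possible classes (assuming classification).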
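As a rough illustration of that flow, here is a toy NumPy stand-in for the layers, not a real Keras model. The weights are random, the names `embedding_table`, `output_weights`, and `predict_next_word_probs` are made up for this sketch, and the sizes 2000 and 100 are simply taken from the question. In Keras, the last step would typically be a Dense layer with as many units as the vocabulary and a softmax activation.

```python
import numpy as np

rng = np.random.default_rng(0)

vocab_size = 2000  # same as input_dim in the question
embed_dim = 100    # same as output_dim in the question

# Embedding layer: a lookup table with one row per word integer.
embedding_table = rng.normal(size=(vocab_size, embed_dim))

# Output layer: projects a hidden vector back onto the vocabulary.
output_weights = rng.normal(size=(embed_dim, vocab_size)) * 0.01
output_bias = np.zeros(vocab_size)


def softmax(logits):
    """Turn raw scores into a probability distribution."""
    shifted = logits - logits.max()  # subtract the max for numerical stability
    exp = np.exp(shifted)
    return exp / exp.sum()


def predict_next_word_probs(word_id):
    """Return one probability per vocabulary integer for the next word."""
    hidden = embedding_table[word_id]  # shape (embed_dim,)
    # A real model would apply recurrent or dense layers to `hidden` here.
    logits = hidden @ output_weights + output_bias  # shape (vocab_size,)
    return softmax(logits)


probs = predict_next_word_probs(666)
print(probs.shape)  # one entry per word integer, not an embedding vector
```

The point of the sketch is the shape of the output: the model ends in a vector of length `vocab_size`, so there is no embedding vector left to "decode".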

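When your model makes a prediction, it gives a probability estimate for each integer in your word_index. So if "cat" is the most likely next word, and your word_index contains something like {cat: 666}, then ideally the model assigns a high probability to 666 (not to "cat"). Does that make sense? The model never predicts an embedding vector; embedding vectors are intermediate representations of the input data that are (hopefully) useful for predicting the integer associated with a word, character, or class.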
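Getting from that probability vector back to a word is then just an argmax plus a reverse lookup. The sketch below uses plain Python with a made-up `word_index` and made-up probabilities in place of a fitted tokenizer and a real model output:

```python
# Hypothetical word index, shaped like tokenizer.word_index
# (integer 0 is left unused, as Keras reserves it for padding).
word_index = {"the": 1, "cat": 2, "sat": 3, "mat": 4}

# Invert the mapping: integer -> word.
index_word = {idx: word for word, idx in word_index.items()}

# Hypothetical model output: one probability per integer 0..4.
probs = [0.01, 0.10, 0.70, 0.15, 0.04]

# Argmax: the integer with the highest probability.
predicted_id = max(range(len(probs)), key=lambda i: probs[i])
predicted_word = index_word.get(predicted_id, "<unknown>")

# If a one-hot vector is really needed, it is a 1 at predicted_id.
one_hot = [1 if i == predicted_id else 0 for i in range(len(probs))]

print(predicted_id, predicted_word)  # 2 cat
print(one_hot)                       # [0, 0, 1, 0, 0]
```

With a NumPy array of probabilities, `np.argmax(probs)` gives the same integer. Recent versions of the Keras Tokenizer also appear to keep the inverse mapping as `tokenizer.index_word`, which would make the manual inversion unnecessary.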
