I'm trying to understand how ELMo is designed and how it works, and I have a couple of questions:

  1. Is the ELMo architecture (visualized in the figure below) used for training the model, or for generating the context-dependent embeddings from the pre-trained model? Or is it the same for both?

     [Figure: ELMo architecture]

Source

  2. Before the input is passed to the Bi-LSTM layers, it goes through a convolutional neural network (CNN) that converts the words into raw, character-based word vectors. How does the CNN do this? Are there any helpful references?
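
To make the second question concrete, here is a minimal sketch of how I currently picture the character-CNN step: embed each character, slide filters of a few widths over the character sequence, and max-pool each filter over positions so that every word ends up with a vector of the same size. The function name `char_cnn_word_vector`, the embedding size, the filter widths, and the random weights are all made up for illustration and are not ELMo's actual configuration. As far as I understand, the real model also applies highway layers and a projection afterwards, which this sketch leaves out.

```python
import random


def char_cnn_word_vector(word, char_emb, filters):
    """Map one word to a fixed-size vector built from its characters.

    char_emb: dict mapping a character to its embedding (list of d floats)
    filters:  list of filters; each filter is a list of `width` weight rows,
              and each row has d weights (hypothetical, randomly initialised)
    """
    chars = [char_emb[c] for c in word]
    d = len(next(iter(char_emb.values())))
    vector = []
    for filt in filters:
        width = len(filt)
        # Pad short words with zero vectors so at least one window exists.
        padded = chars + [[0.0] * d] * max(0, width - len(chars))
        activations = []
        # Slide the filter over every window of `width` consecutive characters.
        for start in range(len(padded) - width + 1):
            window = padded[start:start + width]
            act = sum(
                w * x
                for w_row, x_row in zip(filt, window)
                for w, x in zip(w_row, x_row)
            )
            activations.append(act)
        # Max-over-time pooling: one number per filter, whatever the word length.
        vector.append(max(activations))
    return vector


# Toy setup (assumed sizes): 4-dim character embeddings, 2 filters each of width 1, 2, 3.
rng = random.Random(0)
D = 4
char_emb = {c: [rng.uniform(-1, 1) for _ in range(D)] for c in "abcdefghijklmnopqrstuvwxyz"}
filters = [
    [[rng.uniform(-1, 1) for _ in range(D)] for _ in range(width)]
    for width in (1, 2, 3)
    for _ in range(2)
]

short_vec = char_cnn_word_vector("cat", char_emb, filters)
long_vec = char_cnn_word_vector("embeddings", char_emb, filters)
print(len(short_vec), len(long_vec))  # both equal the number of filters
```

If this picture is right, the output size depends only on the number of filters, which would explain how words of different lengths all get same-sized vectors before the Bi-LSTM. Is that roughly what happens?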

Thank you.
