I have trained (fit and transformed) an SVD model on 400 documents as part of building an LSA (latent semantic analysis) model. Here is my code:
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import TruncatedSVD
from sklearn.pipeline import Pipeline

tfidf_vectorizer = TfidfVectorizer(stop_words='english', use_idf=True, smooth_idf=True)
svd_model = TruncatedSVD(n_components=100, n_iter=10)
lsa_pipeline = Pipeline([('tfidf', tfidf_vectorizer), ('svd', svd_model)])
lsa_model = lsa_pipeline.fit_transform(all_docs)
Now I want to measure the similarity of two sentences (either from the same document collection or entirely new ones), and to do that I need to turn each sentence into a vector. I want to do this transformation my own way, so I need the vector of each individual word in the sentence.
How can I find the vector of a word using the lsa_model I have already trained?
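To illustrate what I mean, here is a sketch of how I imagine the lookup would work, assuming the columns of svd_model.components_ correspond to the vocabulary terms learned by the vectorizer (the tiny corpus and the word_vector helper are just for illustration, not my real data):

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import TruncatedSVD
from sklearn.pipeline import Pipeline

# Toy corpus standing in for my 400 documents
all_docs = [
    "the cat sat on the mat",
    "the dog chased the cat",
    "dogs and cats are pets",
]

tfidf_vectorizer = TfidfVectorizer(stop_words='english', use_idf=True, smooth_idf=True)
svd_model = TruncatedSVD(n_components=2, n_iter=10)
lsa_pipeline = Pipeline([('tfidf', tfidf_vectorizer), ('svd', svd_model)])
lsa_model = lsa_pipeline.fit_transform(all_docs)

# components_ has shape (n_components, n_terms); column j would be the
# latent-space direction for vocabulary term j.
def word_vector(word):
    idx = tfidf_vectorizer.vocabulary_[word]
    return svd_model.components_[:, idx]

vec = word_vector('cat')
print(vec.shape)  # one value per SVD component
```

Is indexing components_ like this the right way to get a per-word vector, or is there a supported API for it?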
And, more broadly, does it make sense to build an LSA model from a collection of documents and then use that same model to measure the similarity of sentences drawn from the same collection?
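For reference, the straightforward alternative I'm comparing against is to transform whole sentences with the fitted pipeline and take their cosine similarity, something like this (again with a toy corpus, and the sentence strings are made up):

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import TruncatedSVD
from sklearn.pipeline import Pipeline
from sklearn.metrics.pairwise import cosine_similarity

# Toy corpus standing in for my 400 documents
all_docs = [
    "the cat sat on the mat",
    "the dog chased the cat",
    "dogs and cats are pets",
]

tfidf_vectorizer = TfidfVectorizer(stop_words='english', use_idf=True, smooth_idf=True)
svd_model = TruncatedSVD(n_components=2, n_iter=10)
lsa_pipeline = Pipeline([('tfidf', tfidf_vectorizer), ('svd', svd_model)])
lsa_pipeline.fit(all_docs)

# New sentences are projected into the same latent space by the
# already-fitted pipeline, then compared with cosine similarity.
sentences = ["a cat sat down", "the dog and the cat played"]
sent_vecs = lsa_pipeline.transform(sentences)
sim = cosine_similarity(sent_vecs[0:1], sent_vecs[1:2])[0, 0]
print(sim)
```

My question is whether doing this per word and combining the word vectors myself is a reasonable substitute for transforming the sentence as a whole.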