I am using the Doc2Vec class from the gensim library to compute a vector representation of each document in a corpus.

The corpus contains very short sentences; some consist of a single word. I observed that for many sentences, especially the short ones, Doc2Vec does not produce any representation. Could someone explain why?

1 Answer

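I had the same problem and solved it by setting the parameter min_count=1. By default, gensim ignores every word that occurs fewer than min_count=5 times in the corpus, so a very short sentence made up of rare words can end up with no words left in the vocabulary.

from gensim.models import doc2vec
model = doc2vec.Doc2Vec(size=100)

became

model = doc2vec.Doc2Vec(size=100, min_count=1)

That made the problem go away. (In gensim 4.0 and later, the size parameter is called vector_size.)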

I found my answer in the comments of the doc2vec tutorial http://radimrehurek.com/2014/12/doc2vec-tutorial/
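
To see why short sentences are hit hardest, here is a rough pure-Python sketch of frequency-based pruning. It is not gensim's actual code: the helper prune_by_min_count and the toy corpus are made up for illustration, and the only thing assumed about gensim is its default of min_count=5.

```python
from collections import Counter


def prune_by_min_count(docs, min_count):
    """Hypothetical helper: drop tokens whose corpus-wide count is below min_count."""
    counts = Counter(token for doc in docs for token in doc)
    return [[t for t in doc if counts[t] >= min_count] for doc in docs]


# Toy corpus of very short "sentences"; only "good" occurs five times.
corpus = [
    ["hello"],
    ["good", "morning"],
    ["good", "day"],
    ["good", "night"],
    ["good", "good", "luck"],
]

# With a threshold of 5, the one-word sentence loses its only token.
pruned_default = prune_by_min_count(corpus, min_count=5)
print(pruned_default[0])   # []

# With a threshold of 1, every token survives.
pruned_all = prune_by_min_count(corpus, min_count=1)
print(pruned_all[0])       # ['hello']
```

Keeping every token also keeps typos and other noise, so min_count=1 is a trade-off that mostly makes sense when the corpus is small or the sentences are very short.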

Answered 2015-04-20T19:28:58.983