
I am building a Maven project in IntelliJ IDEA. The code is as follows:

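import java.io.File;
import java.io.IOException;

import org.deeplearning4j.text.sentenceiterator.LineSentenceIterator;
import org.deeplearning4j.text.sentenceiterator.SentenceIterator;
import org.deeplearning4j.text.sentenceiterator.SentencePreProcessor;
import org.deeplearning4j.text.tokenization.tokenizerfactory.DefaultTokenizerFactory;
import org.deeplearning4j.text.tokenization.tokenizerfactory.TokenizerFactory;

public class Test1 {
    public static void main(String[] args) throws IOException {

        System.out.println("Load data....");
        SentenceIterator iter = new LineSentenceIterator(new File("/home/zs/programs/deeplearning4j-master/dl4j-test-resources/src/main/resources/raw_sentences.txt"));
        iter.setPreProcessor(new SentencePreProcessor() {
            @Override
            public String preProcess(String sentence) {
                // Lower-case every sentence before it is tokenized
                return sentence.toLowerCase();
            }
        });

        // The original snippet used `tokenizer` without declaring it;
        // DefaultTokenizerFactory is an assumption here.
        TokenizerFactory tokenizer = new DefaultTokenizerFactory();

        System.out.println("Build model....");
        int batchSize = 1000;
        int iterations = 30;
        int layerSize = 300;
        com.sari.Word2Vec vec = new com.sari.Word2Vec.Builder()
                .batchSize(batchSize)      // # words per minibatch
                .sampling(1e-5)            // subsampling threshold: randomly drops frequent words
                .minWordFrequency(5)       // ignore words seen fewer than 5 times
                .useAdaGrad(false)
                .layerSize(layerSize)      // word feature vector size
                .iterations(iterations)    // # iterations to train
                .learningRate(0.025)
                .minLearningRate(1e-2)     // learning rate decays with # words processed; this is its floor
                .negativeSample(10)        // 10 negative samples per word
                .iterate(iter)
                .tokenizerFactory(tokenizer)
                .build();
        vec.fit();

        System.out.println("Evaluate model....");

        double cosSim = vec.similarity("day", "night");
        System.out.println("Similarity between day and night: " + cosSim);
    }
}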

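This code uses word2vec from deeplearning4j, but the results are unstable: every run gives very different numbers. For example, the cosine similarity between "day" and "night" is sometimes as high as 0.98 and sometimes as low as 0.5.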

Here are the results of two runs:

Evaluate model....
Similarity between day and night: 0.8252374529838562
Evaluate model....
Similarity between day and night: 0.5550910234451294
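For reference, the number printed above is a cosine similarity: the dot product of the two word vectors divided by the product of their lengths, so it always lies between -1 and 1. The plain-Java sketch below computes that formula for two hypothetical toy vectors; it is not deeplearning4j's own implementation, and it returns NaN if either vector is all zeros.

```java
/**
 * Minimal sketch of cosine similarity for dense vectors of equal length.
 * Illustration of the formula only, not deeplearning4j's code.
 */
final class CosineSimilarity {

    /** Returns dot(a, b) / (|a| * |b|), a value in [-1, 1]. */
    static double cosine(double[] a, double[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("vectors must have the same length");
        }
        double dot = 0.0, normA = 0.0, normB = 0.0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    public static void main(String[] args) {
        // Hypothetical toy vectors, not taken from a trained model
        double[] day = {0.9, 0.1, 0.3};
        double[] night = {0.8, 0.2, 0.4};
        System.out.println("cosine(day, night) = " + cosine(day, night));
    }
}
```

Because only the angle between the vectors matters, small changes in where training ends up can move this value a lot when the vectors are not well trained.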

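Why do the results vary like this? I have only just started learning word2vec and there is a lot I don't understand yet. I hope someone more experienced can help me.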


Are you working with the 0.4 examples?

If not, please check out the Word2Vec example in this repository:

github.com/deeplearning4j/dl4j-0.4-examples

By default, each run starts from different randomly initialized weights, and different random initializations can lead to different results.

You can make the model start from the same random initial weights on every run by adding one more parameter to the builder: .seed(x), where x is a number such as 42.
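A minimal plain-Java sketch of what a fixed seed changes, assuming a word2vec-style initializer loosely modeled on the original word2vec C tool (each component drawn uniformly from [-0.5, 0.5) and divided by the layer size). The `initVector` helper is hypothetical and is not part of deeplearning4j's API. Two `Random` instances built with the same seed produce identical starting vectors, while unseeded instances give a different starting point each time. Note that a seed only pins down the random draws; if training runs on several threads, the order of updates may still vary between runs.

```java
import java.util.Arrays;
import java.util.Random;

/**
 * Hypothetical sketch: seeded vs. unseeded random initialization of a
 * word vector. Illustration only, not deeplearning4j's initializer.
 */
final class SeededInit {

    /** Each component is uniform in [-0.5, 0.5), scaled by 1 / layerSize. */
    static double[] initVector(Random rng, int layerSize) {
        double[] v = new double[layerSize];
        for (int i = 0; i < layerSize; i++) {
            v[i] = (rng.nextDouble() - 0.5) / layerSize;
        }
        return v;
    }

    public static void main(String[] args) {
        int layerSize = 300;

        // Same seed (42): identical starting weights on every run
        double[] runA = initVector(new Random(42), layerSize);
        double[] runB = initVector(new Random(42), layerSize);
        System.out.println("seeded runs identical: " + Arrays.equals(runA, runB));

        // No seed: a different starting point each time
        double[] runC = initVector(new Random(), layerSize);
        double[] runD = initVector(new Random(), layerSize);
        System.out.println("unseeded runs identical: " + Arrays.equals(runC, runD));
    }
}
```

The first line always prints `seeded runs identical: true`; the second will almost always print `false`.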

answered 2015-09-17T02:31:34.190