I'm trying to understand LSTMs in Deeplearning4j. I'm reading the source code of the example, but I can't make sense of this part.

    //Allocate space:
    //Note the order here:
    // dimension 0 = number of examples in minibatch
    // dimension 1 = size of each vector (i.e., number of characters)
    // dimension 2 = length of each time series/example
    INDArray input = Nd4j.zeros(currMinibatchSize,validCharacters.length,exampleLength);
    INDArray labels = Nd4j.zeros(currMinibatchSize,validCharacters.length,exampleLength);

Why do we store a 3D array, and what does each dimension mean?

1 Answer

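Good question. But this has nothing to do with how an LSTM works; it comes from the task itself. The task is to predict the next character. Next-character prediction has two aspects: classification and approximation. If we were only doing approximation, a one-dimensional array would be enough. But because we are doing both approximation and classification, we can't simply feed the network a normalized ASCII representation of each character. We need to convert each character into an array of its own, with one slot per valid character.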

For example, a (lowercase) is represented like this:

1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0

b (lowercase) is represented as:

0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0

c is represented as:

0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0

Z (uppercase!):

0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1
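The encoding above can be sketched in plain Java. This is only an illustration, not code from the Deeplearning4j example: the class `OneHotSketch`, the method `oneHot`, and the 52-letter `ALPHABET` are assumptions made here, and the example's real `validCharacters` array has its own contents and ordering.

```java
import java.util.Arrays;

// Hypothetical helper for illustration; not part of Deeplearning4j.
class OneHotSketch {
    // Assumed alphabet: a-z followed by A-Z (52 symbols). The example's real
    // validCharacters array is different and defines its own ordering.
    static final String ALPHABET =
            "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ";

    // Returns a vector with a single 1 at the character's index and 0 elsewhere.
    static float[] oneHot(char c) {
        int index = ALPHABET.indexOf(c);
        if (index < 0) {
            throw new IllegalArgumentException("Character not in alphabet: " + c);
        }
        float[] vector = new float[ALPHABET.length()];
        vector[index] = 1.0f;
        return vector;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(oneHot('a')));
        System.out.println(Arrays.toString(oneHot('Z')));
    }
}
```

The length of each vector equals the number of valid characters, which is why that count shows up as one of the array dimensions.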

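So each character gives us a vector, a sequence of characters gives us a two-dimensional array, and a minibatch of such sequences gives us the three-dimensional array. How are these dimensions laid out? The code comments explain it: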

    // dimension 0 = number of examples in minibatch
    // dimension 1 = size of each vector (i.e., number of characters)
    // dimension 2 = length of each time series/example
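To make that layout concrete, here is a minimal sketch that fills an array in the same dimension order, using plain `float[][][]` arrays as a stand-in for `INDArray`. The class `CharBatchSketch`, the method `encode`, and the toy alphabet `"abc"` are hypothetical names chosen for this illustration. It also assumes the labels are the input shifted by one character, which matches the next-character task described above.

```java
// Hypothetical helper for illustration; plain arrays stand in for INDArray.
class CharBatchSketch {
    // Builds a [minibatch][alphabet size][time steps] array in which
    // result[i][k][t] == 1 when character (t + offset) of sequence i
    // is alphabet.charAt(k), and 0 otherwise.
    static float[][][] encode(String[] sequences, String alphabet,
                              int offset, int timeSteps) {
        float[][][] result =
                new float[sequences.length][alphabet.length()][timeSteps];
        for (int i = 0; i < sequences.length; i++) {
            if (sequences[i].length() < offset + timeSteps) {
                throw new IllegalArgumentException("Sequence too short: " + sequences[i]);
            }
            for (int t = 0; t < timeSteps; t++) {
                char c = sequences[i].charAt(t + offset);
                int k = alphabet.indexOf(c);
                if (k < 0) {
                    throw new IllegalArgumentException("Character not in alphabet: " + c);
                }
                result[i][k][t] = 1.0f;
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String[] sequences = {"abca", "cabb"};
        // Input covers characters 0..2; labels cover characters 1..3.
        float[][][] input = encode(sequences, "abc", 0, 3);
        float[][][] labels = encode(sequences, "abc", 1, 3);
        System.out.println(input.length + " x " + input[0].length
                + " x " + input[0][0].length);
        System.out.println(labels.length + " x " + labels[0].length
                + " x " + labels[0][0].length);
    }
}
```

Reading an index as `[example][character][time step]` shows why three dimensions are needed: one to pick the example in the minibatch, one for the slot in the character's vector, and one for the position in the sequence.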

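I sincerely commend your effort to understand how LSTMs work, but the code you pointed to is an example that applies to all kinds of NNs: it explains how to handle text data in a neural network, not how an LSTM works. You need to look at a different part of the source code.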

answered 2016-05-16T14:34:11.803