theano - Model selection with Keras and Theano takes a very long time

Question

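I am using Keras and Theano to run nested cross-validation for model selection and performance estimation over a set of recurrent neural networks with different architectures and parameters. The networks are set up to run on an AWS P2 instance, which has a Tesla K80 GPU with CUDA and cuDNN installed and enabled.

To perform model selection, I compare 30 models sampled from the parameter space using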

from sklearn.model_selection import ParameterSampler

param_grid = {
    'nb_hidden_layers': [1, 2, 3],
    'dropout_frac': [0.15, 0.20],
    'output_activation': ['sigmoid', 'softmax'],
    'optimization': ['Adedelta', 'RMSprop', 'Adam'],
    'learning_rate': [0.001, 0.005, 0.010],
    'batch_size': [64, 100, 150, 200],
    'nb_epoch': [10, 15, 20],
    'perform_batchnormalization': [True, False]
}
# draw 30 random parameter combinations from the grid
params_list = list(ParameterSampler(param_grid, n_iter=30))

I then build an RNN model using the function NeuralNetworkClassifier() defined below:

import numpy as np
from keras.models import Sequential
from keras.layers import LSTM, Dense, Dropout, Activation
from keras.layers.normalization import BatchNormalization
from keras.optimizers import SGD, RMSprop, Adam, Adadelta

# Written against the Keras 1.x API (output_dim, init, nb_epoch).
def NeuralNetworkClassifier(params, units_in_hidden_layer=[50, 75, 100, 125, 150]):
    # sample a distinct number of units for each hidden layer
    nb_units_in_hidden_layers = np.random.choice(units_in_hidden_layer, size=params['nb_hidden_layers'], replace=False)

    layers = [8]    # number of features in every week
    layers.extend(nb_units_in_hidden_layers)
    layers.extend([1])  # node identifying quit/stay

    model = Sequential()

    # constructing all LSTM layers before the last one; they need return_sequences = True
    layer_idx = -1  # keeps the indexing below valid when nb_hidden_layers = 1 (the loop then never runs)
    for layer_idx in range(len(layers) - 3):
        model.add(LSTM(input_dim=layers[layer_idx], output_dim=layers[layer_idx + 1], init='he_uniform', return_sequences=True))
        if params['perform_batchnormalization']:
            model.add(BatchNormalization())
            model.add(Activation('relu'))
        model.add(Dropout(params['dropout_frac']))
    # constructing the last LSTM layer, which needs return_sequences = False
    model.add(LSTM(input_dim=layers[layer_idx + 1], output_dim=layers[layer_idx + 2], init='he_uniform', return_sequences=False))
    if params['perform_batchnormalization']:
        model.add(BatchNormalization())
        model.add(Activation('relu'))
    model.add(Dropout(params['dropout_frac']))
    # constructing the final (output) layer
    model.add(Dense(output_dim=layers[-1], init='he_normal'))
    model.add(Activation(params['output_activation']))

    if params['optimization'] == 'SGD':         # not in param_grid above
        optim = SGD()
        optim.lr.set_value(params['learning_rate'])
    elif params['optimization'] == 'RMSprop':
        optim = RMSprop()
        optim.lr.set_value(params['learning_rate'])
    elif params['optimization'] == 'Adam':      # note: learning_rate is not applied here
        optim = Adam()
    elif params['optimization'] == 'Adedelta':  # spelled as in param_grid; learning_rate is not applied here
        optim = Adadelta()
    else:
        raise ValueError('unknown optimizer: %s' % params['optimization'])

    model.compile(loss='binary_crossentropy', optimizer=optim, metrics=['precision'])

    return model

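It constructs an RNN whose number of hidden layers is given by the parameter 'nb_hidden_layers' in param_grid, and the number of hidden units in each layer is sampled at random from the list [50, 75, 100, 125, 150]. At the end, the function compiles the model and returns it.

During the nested cross-validation (CV), the inner loop (which runs IN times) compares the performance of the 30 randomly selected models. After this step, I pick the best-performing model in the outer loop and estimate its performance on a hold-out dataset; this scheme is repeated OUT times. As a result, I am compiling an RNN model OUT x IN x 30 times, which takes an extremely long time; for example, with OUT=4 and IN=3, my method takes between 6 and 7 hours to finish.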
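The size of this workload can be made concrete with a small sketch. The function name count_model_builds is hypothetical, and the count covers only the OUT x IN x 30 builds described above; any refit of the winning model on each outer fold would come on top of it.

```python
def count_model_builds(n_outer, n_inner, n_candidates):
    """Walk the nested-CV loops and count how many models get built."""
    builds = 0
    for _outer_fold in range(n_outer):            # performance estimation
        for _inner_fold in range(n_inner):        # model selection
            for _candidate in range(n_candidates):
                builds += 1                       # one build + compile + fit here
    return builds


total_builds = count_model_builds(n_outer=4, n_inner=3, n_candidates=30)

# Average wall-clock budget per build for the reported 6 to 7 hour runs.
seconds_per_build_low = 6 * 3600 / total_builds
seconds_per_build_high = 7 * 3600 / total_builds

print(total_builds, seconds_per_build_low, seconds_per_build_high)
```

With OUT=4 and IN=3 this gives 360 builds, so a 6 to 7 hour run averages roughly 60 to 70 seconds per build, compile, and fit cycle.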

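I can see that the GPU is used occasionally (though GPU utilization never goes above 40%); most of the time, however, it is the CPU that is busy. My (uneducated) guess is that compile runs many times on the CPU and accounts for most of the computing time, whereas model fitting and prediction run on the GPU and take little time.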

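My questions:

Is there a way to remedy this situation?
Is compile actually done on the CPU?
How do people do nested CV to select the best RNN architecture?
Is it reasonable for me to run this scheme on the production server? Would you suggest I do one big nested CV, which might take 24 hours, to select the best-performing model and then just use that model on the production server?

Thank you all.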

score 2 · Accepted Answer

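I cannot answer all of your questions, but I hope this still helps.

Compilation is done on the CPU, because it consists mostly of symbolic graph manipulation and code generation. To make things worse, the Theano graph optimizer is written in pure Python, which can add overhead compared with a C/C++ implementation.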

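To improve Theano compilation time (at the cost of runtime performance):

Use less aggressive optimization

In /home/ec2-user/.theanorc, add the line (Theano reads top-level flags like this one from the [global] section):

optimizer = fast_compile

Or disable optimization completely:

optimizer = None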
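The same setting can also be applied per process through the THEANO_FLAGS environment variable, which is handy when only the model-selection script should trade runtime speed for compile speed. A minimal sketch follows; the helper name set_theano_optimizer is hypothetical, and it has to run before Theano (or Keras with the Theano backend) is first imported.

```python
import os


def set_theano_optimizer(mode="fast_compile"):
    """Put optimizer=<mode> into THEANO_FLAGS, keeping other flags already set."""
    flags = [
        flag
        for flag in os.environ.get("THEANO_FLAGS", "").split(",")
        if flag and not flag.startswith("optimizer=")
    ]
    flags.append("optimizer=" + mode)
    os.environ["THEANO_FLAGS"] = ",".join(flags)
    return os.environ["THEANO_FLAGS"]


# Theano reads THEANO_FLAGS when it is imported, so call this first.
set_theano_optimizer("fast_compile")
# import theano  # or: import keras (with the Theano backend)
```

Passing "None" instead of "fast_compile" corresponds to the second .theanorc line above.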

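Precompile some blocks

If your models share common blocks, you can precompile them with theano.OpFromGraph.

You cannot do that in Keras alone, though.

Switch framework

Keras does support a TensorFlow backend. Compared with Theano, TensorFlow works more like a virtual machine than a compiler. In general, TF runs slower than Theano but compiles much faster.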
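One way to try the other backend without touching the model code is the KERAS_BACKEND environment variable; the "backend" field in ~/.keras/keras.json does the same thing persistently. A minimal sketch, assuming TensorFlow is installed on the instance:

```python
import os

# Keras picks its backend once, at import time, so this must run
# before the first `import keras` anywhere in the process.
os.environ["KERAS_BACKEND"] = "tensorflow"

# import keras  # the model-building code itself stays unchanged
```

Timing a handful of the 30 candidates under each backend would show whether the faster compilation outweighs any slower training for these particular models.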
