
I am trying to train a classifier to distinguish song genres from the raw audio spectrum. For this I use a deep convolutional network in tflearn. However, the network does not converge: instead of decreasing, the loss increases during training. I would be grateful for any idea of why this might be.

The data I'm using are 128x128 grayscale spectrogram images in two classes, Classical music (500 examples) and Hard rock (500 examples), with one-hot encoded labels.
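To make that layout concrete, here is a minimal sketch of the array shapes the model code expects. It assumes NumPy and uses all-zero placeholder arrays in place of the real spectrograms; the names `X`, `labels` and `y` and the class ordering (0 = Classical, 1 = Hard rock) are hypothetical and only for illustration.

```python
import numpy as np

n_per_class = 500  # 500 Classical + 500 Hard rock examples

# Placeholder for the real spectrograms: 128x128 grayscale, one channel.
X = np.zeros((2 * n_per_class, 128, 128, 1), dtype=np.float32)

# Integer class labels; the ordering 0 = Classical, 1 = Hard rock is an assumption.
labels = np.repeat([0, 1], n_per_class)

# One-hot encode: row i of the 2x2 identity matrix is the encoding of class i.
y = np.eye(2, dtype=np.float32)[labels]

print(X.shape)  # matches input_data(shape=[None, 128, 128, 1])
print(y.shape)  # matches the 2-unit softmax output
```

A quick check like this before training rules out a shape or label-encoding mismatch as the cause of a bad loss curve.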

Here's what the samples look like:

Classical extract

I can tell the two classes apart by eye (I cannot show the other class here because of Stack Overflow's limit), so I doubt that a deep CNN is simply incapable of classifying them.

Here's what my loss looks like:

Loss plot in tflearn

The code I used in tflearn for the model is the following:

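import tflearn
from tflearn.layers.core import input_data, dropout, fully_connected
from tflearn.layers.conv import conv_2d, max_pool_2d
from tflearn.layers.estimator import regression

convnet = input_data(shape=[None, 128, 128, 1], name='input')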

convnet = conv_2d(convnet, 64, 2, activation='elu', weights_init="Xavier")
convnet = max_pool_2d(convnet, 2)

convnet = conv_2d(convnet, 32, 2, activation='elu', weights_init="Xavier")
convnet = max_pool_2d(convnet, 2)

convnet = conv_2d(convnet, 128, 2, activation='elu', weights_init="Xavier")
convnet = max_pool_2d(convnet, 2)

convnet = conv_2d(convnet, 64, 2, activation='elu', weights_init="Xavier")
convnet = max_pool_2d(convnet, 2)

convnet = fully_connected(convnet, 1024, activation='elu')
convnet = dropout(convnet, 0.5)

convnet = fully_connected(convnet, 2, activation='softmax')
convnet = regression(convnet, optimizer='rmsprop', learning_rate=0.01, loss='categorical_crossentropy', name='targets')

model = tflearn.DNN(convnet)

model.fit({'input': train_X}, {'targets': train_y},
          n_epoch=100, batch_size=64, shuffle=True,
          validation_set=({'input': test_X}, {'targets': test_y}),
          snapshot_step=100, show_metric=True)

Thank you very much for your help!


1 Answer


A few things I would usually try:

  • A lower learning rate

  • Trying another activation

  • Temporarily removing dropout
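As a rough illustration of the first point, here is a toy sketch of how a step size that is too large makes the loss grow instead of shrink. It is not tflearn code and it uses plain gradient descent rather than RMSProp; the quadratic loss, the `gradient_descent` helper and the two learning rates are made up for the example.

```python
def gradient_descent(learning_rate, steps=20, w0=1.0, curvature=10.0):
    """Minimise the toy loss L(w) = 0.5 * curvature * w**2.

    Returns the loss recorded before each update step.
    """
    w = w0
    losses = []
    for _ in range(steps):
        losses.append(0.5 * curvature * w ** 2)
        grad = curvature * w        # dL/dw
        w -= learning_rate * grad   # plain gradient descent update
    return losses


# Each update multiplies w by (1 - learning_rate * curvature).
high = gradient_descent(learning_rate=0.25)  # factor -1.5: |w| grows, loss increases
low = gradient_descent(learning_rate=0.05)   # factor 0.5: |w| shrinks, loss decreases

print("lr=0.25 first/last loss:", high[0], high[-1])
print("lr=0.05 first/last loss:", low[0], low[-1])
```

In the model from the question, the analogous change would be to pass a smaller `learning_rate` to `regression(...)`, for example 0.001 instead of 0.01; which value works is something to find by experiment.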

HTH

Answered 2016-11-11T16:19:07.060