I am trying to train a classifier to distinguish song genres from raw audio spectrograms. For this I use a deep convolutional network in tflearn. However, the network does not learn: the loss keeps increasing instead of converging. I would be grateful if someone had an idea of why this might be.
The data I'm using consists of 128x128 grayscale spectrogram images, split between two classes, classical music (500 examples) and hard rock (500 examples), with one-hot encoded labels.
Here's what the samples look like:
I can tell the two classes apart by eye (I cannot show the images because of Stack Overflow's limit), so I doubt that a deep CNN is simply incapable of classifying them.
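In case it helps, here is a simplified sketch of how the data is prepared (the folder layout and the load_class helper are placeholders, not my exact code; the point is just the shapes and the one-hot labels):

import glob
import os
import numpy as np
from PIL import Image

def load_class(folder, label, n_classes=2):
    """Load all spectrogram PNGs in `folder` and attach a one-hot label."""
    X, y = [], []
    for path in glob.glob(os.path.join(folder, "*.png")):
        img = Image.open(path).convert("L").resize((128, 128))  # grayscale, 128x128
        X.append(np.asarray(img, dtype=np.float32) / 255.0)     # scale pixels to [0, 1]
        one_hot = np.zeros(n_classes, dtype=np.float32)
        one_hot[label] = 1.0
        y.append(one_hot)
    return X, y

X_cla, y_cla = load_class("spectrograms/classical", 0)   # ~500 examples
X_hr,  y_hr  = load_class("spectrograms/hardrock", 1)    # ~500 examples

# Stack, add the channel dimension expected by input_data, then shuffle and split
X = np.array(X_cla + X_hr).reshape(-1, 128, 128, 1)
y = np.array(y_cla + y_hr)
idx = np.random.permutation(len(X))
X, y = X[idx], y[idx]
split = int(0.9 * len(X))
train_X, train_y = X[:split], y[:split]
test_X, test_y = X[split:], y[split:]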
Here's what my loss looks like:
The code I used in tflearn for the model is the following:
import tflearn
from tflearn.layers.core import input_data, dropout, fully_connected
from tflearn.layers.conv import conv_2d, max_pool_2d
from tflearn.layers.estimator import regression

# Input: 128x128 grayscale spectrograms (single channel)
convnet = input_data(shape=[None, 128, 128, 1], name='input')

# Four conv/max-pool blocks with 2x2 kernels and ELU activations
convnet = conv_2d(convnet, 64, 2, activation='elu', weights_init="Xavier")
convnet = max_pool_2d(convnet, 2)
convnet = conv_2d(convnet, 32, 2, activation='elu', weights_init="Xavier")
convnet = max_pool_2d(convnet, 2)
convnet = conv_2d(convnet, 128, 2, activation='elu', weights_init="Xavier")
convnet = max_pool_2d(convnet, 2)
convnet = conv_2d(convnet, 64, 2, activation='elu', weights_init="Xavier")
convnet = max_pool_2d(convnet, 2)

# Fully connected head with dropout, softmax over the two genres
convnet = fully_connected(convnet, 1024, activation='elu')
convnet = dropout(convnet, 0.5)
convnet = fully_connected(convnet, 2, activation='softmax')
convnet = regression(convnet, optimizer='rmsprop', learning_rate=0.01,
                     loss='categorical_crossentropy', name='targets')

model = tflearn.DNN(convnet)
model.fit({'input': train_X}, {'targets': train_y}, n_epoch=100, batch_size=64,
          shuffle=True, validation_set=({'input': test_X}, {'targets': test_y}),
          snapshot_step=100, show_metric=True)
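For completeness, this is roughly how I look at the trained model afterwards (a minimal sketch; the exact calls may differ slightly from my script):

# Overall accuracy on the held-out set, using tflearn's built-in evaluation
print(model.evaluate(test_X, test_y))

# Class probabilities for a few test spectrograms
probs = model.predict(test_X[:5])
print(probs)  # each row: [p_classical, p_hardrock]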
Thank you very much for your help!