Probably better suited to DSP StackExchange.
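Suppose you take the FFT of a single 110 Hz note (played on a real instrument, not a pure sine) to get its spectrum; you'll see evenly spaced peaks at 110, 220, 330, etc. Hz -- the harmonics. 110 Hz is the fundamental.
Suppose you have 3 notes sounding at once. Already it's going to look quite messy in the frequency domain, especially if you have a chord containing e.g. A110 and A220, because every harmonic of A220 lands exactly on top of a harmonic of A110.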
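To make the overlap concrete, here is a minimal sketch using NumPy. The sample rate, the 1/k harmonic amplitudes, the number of harmonics, and the helper names `harmonic_tone` and `peak_frequencies` are all assumptions for illustration; a real instrument will have a different harmonic profile.

```python
import numpy as np

def harmonic_tone(f0, n_harmonics, fs, duration):
    """Synthetic note: fundamental f0 plus harmonics with 1/k amplitudes (assumed profile)."""
    t = np.arange(int(fs * duration)) / fs
    return sum(np.sin(2 * np.pi * f0 * k * t) / k for k in range(1, n_harmonics + 1))

def peak_frequencies(signal, fs, threshold=0.1):
    """Frequencies (rounded to the nearest Hz) whose normalized magnitude exceeds threshold."""
    spectrum = np.abs(np.fft.rfft(signal))
    freqs = np.fft.rfftfreq(len(signal), d=1 / fs)
    spectrum = spectrum / spectrum.max()
    return [int(round(f)) for f in freqs[spectrum > threshold]]

fs = 8000          # assumed sample rate; 1 s of audio gives exactly 1 Hz per bin
a110 = harmonic_tone(110, 4, fs, 1.0)
a220 = harmonic_tone(220, 4, fs, 1.0)

peaks_a110 = peak_frequencies(a110, fs)
peaks_a220 = peak_frequencies(a220, fs)
peaks_chord = peak_frequencies(a110 + a220, fs)

# Harmonics that both notes contribute to, so they cannot be told apart by position alone
shared = sorted(set(peaks_a110) & set(peaks_a220))
print(peaks_a110, peaks_a220, peaks_chord, shared)
```

Under these assumptions the two notes share the peaks at 220 and 440 Hz, so the chord's spectrum has fewer distinct peaks than the two notes have harmonics in total.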
On account of this, I think a neural network is a good approach.
Feed it the FFT output as input.
It would be a good idea to use a neural network that accepts complex-valued inputs, as the FFT outputs a complex number for each frequency bin.
http://www.eagle.tamut.edu/faculty/igor/PRESENTATIONS/IJCNN-0813_Tutorial.pdf
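As a rough illustration of what "complex-valued inputs" could mean in practice, here is a sketch of a single complex dense layer in NumPy. The layer sizes, the random weights, the random stand-in audio frame, the split tanh activation, and the name `complex_dense` are all assumptions for illustration; this is not taken from the linked tutorial, and it is an untrained forward pass only.

```python
import numpy as np

def complex_dense(x, weights, bias):
    """One complex-valued dense layer with a split activation:
    tanh applied separately to the real and imaginary parts."""
    z = weights @ x + bias
    return np.tanh(z.real) + 1j * np.tanh(z.imag)

rng = np.random.default_rng(0)
n_bins, n_hidden = 512, 16                 # assumed sizes

frame = rng.standard_normal(2 * n_bins)    # stand-in for 1024 audio samples
bins = np.fft.rfft(frame)[:n_bins]         # keep the first 512 complex bins

# Random complex weights, scaled so the pre-activations stay moderate
weights = (rng.standard_normal((n_hidden, n_bins))
           + 1j * rng.standard_normal((n_hidden, n_bins))) / np.sqrt(n_bins)
bias = np.zeros(n_hidden, dtype=complex)

hidden = complex_dense(bins, weights, bias)
print(hidden.shape, hidden.dtype)
```

The point is only that the magnitude and phase of each bin both reach the network, rather than the magnitude alone.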
It may seem computationally wasteful to extract so many frequencies with FFT, but FFT algorithms are extremely efficient nowadays. You should probably use an FFT size of 2^10, so 2^10 = 1024 real input samples -> 2^9 = 512 complex bins (513 if you count the Nyquist bin).
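A minimal sketch of that frame size with NumPy follows. The 44.1 kHz sample rate, the Hann window, the pure-sine test frame, and the helper name `frame_spectrum` are assumptions for illustration.

```python
import numpy as np

def frame_spectrum(frame):
    """Window one frame of real samples and return its complex FFT bins."""
    windowed = frame * np.hanning(len(frame))
    return np.fft.rfft(windowed)

fs = 44100                         # assumed sample rate in Hz
n_fft = 2 ** 10                    # 1024 real input samples
t = np.arange(n_fft) / fs
frame = np.sin(2 * np.pi * 110 * t)   # stand-in test signal

bins = frame_spectrum(frame)
bin_width = fs / n_fft             # frequency resolution in Hz per bin

print(bins.shape, bin_width)
```

`np.fft.rfft` returns N/2 + 1 bins for N real samples, so 513 here, with the DC and Nyquist bins at either end. At the assumed sample rate each bin is roughly 43 Hz wide, which is worth keeping in mind when the notes you want to separate are close together.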