I've programmed a fully connected recurrent network (based on Williams and Zipser) in Octave, and I successfully trained it using BPTT to compute an XOR as a toy example. The learning process was relatively uneventful:
[Plot: XOR training error over epochs]
So, I thought I'd try training the network to compute the XOR of the first two inputs and the OR of the last two inputs. However, this failed to converge the first time I ran it; instead, the error just oscillated continually. I tried decreasing the learning rate and turning off momentum entirely, but neither helped. When I ran it again this morning, it did end up converging, but not without more oscillations along the way:
[Plot: XOR/OR training error over epochs]
So, my question: could this indicate a problem with my gradient computation, or is this just something that happens when training recurrent networks?
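For what it's worth, one way to rule out a gradient bug is a finite-difference check: compare the BPTT gradient against a central-difference estimate of the loss. Below is a minimal sketch in Python (not my Octave code) for a hypothetical one-unit recurrent net, h_t = tanh(w*h_{t-1} + x_t), with squared-error loss on the final state; the same idea applies weight-by-weight to a full network.

```python
import math

def loss(w, xs, y):
    # Forward pass: h_t = tanh(w * h_{t-1} + x_t), starting from h_0 = 0
    h = 0.0
    for x in xs:
        h = math.tanh(w * h + x)
    return 0.5 * (h - y) ** 2

def bptt_grad(w, xs, y):
    # Forward pass, storing all hidden states for the backward sweep
    hs = [0.0]
    for x in xs:
        hs.append(math.tanh(w * hs[-1] + x))
    # Backward pass (BPTT): start from dL/dh_T and chain back through time
    dh = hs[-1] - y
    grad = 0.0
    for t in range(len(xs), 0, -1):
        da = dh * (1.0 - hs[t] ** 2)  # backprop through tanh
        grad += da * hs[t - 1]        # accumulate dL/dw at this time step
        dh = da * w                   # propagate error to h_{t-1}
    return grad

def numeric_grad(w, xs, y, eps=1e-6):
    # Central finite difference: (L(w+eps) - L(w-eps)) / (2*eps)
    return (loss(w + eps, xs, y) - loss(w - eps, xs, y)) / (2 * eps)
```

If the two gradients agree to several decimal places for a few random weights and inputs, the BPTT code is probably correct and the oscillation is a learning-rate/landscape issue rather than a gradient bug.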