So I am stumped by something that (should) be simple:
I have written an SOM (self-organizing map) for a simple two-dimensional 'play' data set. Here is the data:
You can make out three clusters by eye.
Now, there are two things that confuse me. The first is that the tutorial I am following normalizes the data before the SOM works on it: each data vector is scaled to have length 1 (Euclidean norm). If I do that, the data looks like this:
(This is because all the data points have been projected onto the unit circle.)
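To make the projection concrete, here is a minimal NumPy sketch of per-vector unit-length normalization. The helper name `normalize_rows` and the three sample points are hypothetical; two of the points lie on the same ray from the origin, so they become indistinguishable after normalization.

```python
import numpy as np

def normalize_rows(X):
    """Scale each row (data vector) of X to unit Euclidean length."""
    X = np.asarray(X, dtype=float)
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    # Guard against division by zero for an all-zero vector.
    norms[norms == 0] = 1.0
    return X / norms

# Hypothetical points: (1, 1) and (5, 5) lie on the same ray from the origin.
X = np.array([[1.0, 1.0],
              [5.0, 5.0],
              [-2.0, 3.0]])
Xn = normalize_rows(X)
print(Xn)                           # rows 0 and 1 are (numerically) identical
print(np.linalg.norm(Xn, axis=1))   # every row now has length 1
```

Only the direction of each vector survives; its distance from the origin is discarded, which is why clusters that differ mainly in magnitude merge on the unit circle.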
So, my questions are as follows:
1) Is this correct? Projecting the data onto the unit circle seems bad, because you can no longer make out the three clusters. Is this a fact of life for SOMs (i.e., that they only work on the unit circle)?
2) The second, related question: not only are the data normalized to length 1, but so are the weight vectors of each output unit after every iteration. I understand that this is done so that the weight vectors don't 'blow up', but it seems wrong to me, since the whole point of the weight vectors is to retain distance information. If you normalize them, you lose the ability to cluster properly. For example, how can the SOM possibly distinguish the cluster on the lower left from the cluster on the upper right, since they project onto the unit circle the same way?
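The difference between the two setups can be sketched with a minimal online SOM in NumPy. This is not the tutorial's code: the function names, the 3x3 grid, the Gaussian neighbourhood, the linear decay schedules and the toy clusters are all hypothetical choices. The `normalize_weights` flag adds the tutorial's step of rescaling every weight vector to unit length after each update.

```python
import numpy as np

def unit_rows(X):
    """Scale each row of X to unit Euclidean length (zero rows left as-is)."""
    X = np.asarray(X, dtype=float)
    n = np.linalg.norm(X, axis=-1, keepdims=True)
    return X / np.where(n == 0, 1.0, n)

def best_matching_unit(W, x):
    """Index of the weight vector closest to x in Euclidean distance."""
    return int(np.argmin(np.linalg.norm(W - x, axis=1)))

def train_som(X, grid_shape=(3, 3), n_iter=500, lr0=0.5, sigma0=1.0,
              normalize_weights=False, seed=0):
    """Minimal online SOM; every hyperparameter here is a hypothetical choice."""
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    rows, cols = grid_shape
    # Grid position of each output unit, used by the neighbourhood function.
    coords = np.array([(r, c) for r in range(rows) for c in range(cols)], dtype=float)
    # Initialise the weights with randomly chosen data vectors.
    W = X[rng.integers(0, len(X), size=rows * cols)].copy()
    for t in range(n_iter):
        frac = t / n_iter
        lr = lr0 * (1.0 - frac)                # linearly decaying learning rate
        sigma = sigma0 * (1.0 - frac) + 1e-3   # shrinking neighbourhood radius
        x = X[rng.integers(len(X))]
        bmu = best_matching_unit(W, x)
        # Gaussian neighbourhood measured on the output grid, not in data space.
        d2 = np.sum((coords - coords[bmu]) ** 2, axis=1)
        h = np.exp(-d2 / (2.0 * sigma ** 2))
        # Pull every weight vector toward x, weighted by its neighbourhood value.
        W += lr * h[:, None] * (x - W)
        if normalize_weights:
            # The tutorial's extra step: project the weights back onto the unit circle.
            W = unit_rows(W)
    return W

# Hypothetical toy data: clusters around (1, 1) and (5, 5) lie on the same ray.
rng = np.random.default_rng(1)
centres = np.array([[1.0, 1.0], [5.0, 5.0], [-2.0, 3.0]])
X = np.vstack([c + 0.2 * rng.standard_normal((30, 2)) for c in centres])

W_raw = train_som(X)                                      # raw data, raw weights
W_unit = train_som(unit_rows(X), normalize_weights=True)  # tutorial's variant

print(np.linalg.norm(W_raw, axis=1))   # lengths vary with the data
print(np.linalg.norm(W_unit, axis=1))  # all 1: weights live on the unit circle
# After normalisation, (1, 1) and (5, 5) are the same input, so they share a unit.
print(best_matching_unit(W_unit, unit_rows([1.0, 1.0])),
      best_matching_unit(W_unit, unit_rows([5.0, 5.0])))
```

In the normalized variant the inputs (1, 1) and (5, 5) become the same vector, so no choice of unit-length weights can send them to different units; the raw variant keeps their magnitudes and can separate them.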
I am very confused by this. Should data be normalized to unit length in SOMs? Should the weight vectors be normalized as well?
Thanks!
EDIT
Here is the data, saved as a .mat file for MATLAB. It is a simple two-dimensional data set.