
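I am using the Matlab clusterdata function to split my data (noise and non-noise) into two groups: a noise group and a non-noise group. The function works well, except that sometimes it labels all the noise data as group 1 and all the non-noise data as group 2, and at other times it labels all the noise data as group 2 and all the non-noise data as group 1.

How can I control this? That is, how can I make sure all the noise data is always labeled as group 1?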


1 Answer


Controlling which label an unsupervised learning algorithm assigns to which cluster is generally a problem, because the label numbers themselves carry no meaning. I suggest evaluating some feature of the data after clustering to check whether the labels came out the way you want them.

If all your data is in an N x d matrix X, with an N x 1 label vector Y taking the values -1 and 1, you could evaluate the variance of each cluster. I suspect the noise data would exhibit higher variance, which could be used to decide whether the labels should be switched. (clusterdata returns the labels 1 and 2; if T is its output, Y = 2*T - 3 maps them to -1 and 1.)

In the code below, 1 should be the non-noise, and -1 should be noise (this choice of labels (groups) makes it easier to flip the labels around).

%#Variance summed over all dimensions (dim = 1 keeps this correct for a one-row cluster)
varL1 = sum(var(X(Y==1,:), 0, 1));
varL2 = sum(var(X(Y==-1,:), 0, 1));

%#Flip labels if the variance of label 1 is higher than that of label -1
if varL1 > varL2
    Y = Y * (-1);
end

If this works, you could afterwards relabel the noise cluster as group 1 and the non-noise cluster as group 2 with

Y(Y==1) = 2;  %#NB: The order of these two statements matters; reversing it would turn every label into 2.
Y(Y==-1) = 1;
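
The same idea can also be applied directly to the 1/2 labels that clusterdata returns, without going through -1 and 1. The following is only a minimal sketch under some assumptions: the synthetic X is made up purely for illustration, the variable names T and v are my own, and it relies on the noise cluster really having the larger summed variance, which you would need to verify on your data. It also needs the Statistics Toolbox for clusterdata.

```matlab
% Illustrative synthetic data (an assumption, not the asker's data):
% 50 tight points followed by 50 widely scattered points, in 2 dimensions.
rng(0);
X = [0.1 * randn(50, 2); 5 * randn(50, 2)];

% Cluster into two groups; the labels are 1 and 2 in an arbitrary order.
T = clusterdata(X, 'maxclust', 2);

% Summed per-dimension variance of each cluster.
v = zeros(1, 2);
for k = 1:2
    v(k) = sum(var(X(T == k, :), 0, 1));
end

% Make group 1 the higher-variance (assumed noise) cluster.
if v(2) > v(1)
    T = 3 - T;   % swaps the labels: 1 -> 2 and 2 -> 1
end
```

Swapping with 3 - T avoids the ordering pitfall of two successive assignments, since both labels are changed in a single step.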
Answered 2011-12-08T08:45:25.497