
I am clustering the rows of an NxM matrix using kmeans:

clustIdx = kmeans(data, N_CLUST, 'EmptyAction', 'drop');

Then I rearrange the rows of the matrix so that adjacent rows are in the same cluster:

[~, order] = sort(clustIdx);      %# row order that groups equal cluster labels
dataClustered = data(order,:);

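However, each time I run the clustering I get more or less the same clusters, but with different identities. So the structure in dataClustered looks the same after each run, but the groups appear in a different order.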

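I would like to reorder my cluster identities so that lower cluster identities represent dense clusters and higher numbers represent sparse clusters.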

Is there a simple and/or intuitive way to do this?

I.e., convert

clustIdx = [1 2 3 2 3 2 4 4 4 4];

to

clustIdx = [4 2 3 2 3 2 1 1 1 1];

The identities themselves are arbitrary; the information is contained in the grouping.


2 Answers


If I understand correctly, you want to assign cluster label 1 to the cluster with most points, cluster label 2 to the cluster with the second most points, etc.

Assume you have a cluster label array called idx:

>> idx = [1 1 2 2 2 2 3 3 3]';

Now you can relabel idx like this:

%# count the number of occurrences
cts = hist(idx,1:max(idx));

%# sort the counts - now we know that 1 should be last
[~,sortIdx] = sort(cts,'descend')
sortIdx =
     2     3     1

%# create a mapping vector (thanks @angainor)
map(sortIdx) = 1:length(sortIdx);
map =
     3     1     2

%# and remap indices
map(idx)
ans =
     3     3     1     1     1     1     2     2     2
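
As a check, the same steps can be applied to the example vector from the question. This is only a sketch: it assumes "dense" means "contains the most points", and it counts with accumarray instead of hist (a substitution made here, not part of the answer above). The names newIdx and order are illustrative.

```matlab
%# Sketch: relabel so that label 1 is the most populated cluster.
clustIdx = [1 2 3 2 3 2 4 4 4 4];        %# example from the question

cts = accumarray(clustIdx(:), 1)';       %# points per old label: [1 3 2 4]
[~, sortIdx] = sort(cts, 'descend');     %# old labels, largest first: [4 2 3 1]

map = zeros(1, numel(cts));
map(sortIdx) = 1:numel(sortIdx);         %# map(oldLabel) = newLabel: [4 2 3 1]

newIdx = map(clustIdx);                  %# [4 2 3 2 3 2 1 1 1 1]

%# Rows could then be grouped by the new labels, with data being the
%# N-by-M matrix from the question:
%# [~, order] = sort(newIdx); dataClustered = data(order, :);
```

The result matches the relabeling requested in the question.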
Answered 2012-12-10T15:43:39.643

It may not be the most efficient approach, but an easy way is to first determine how dense each cluster is.

Then build an n-by-2 matrix whose columns hold the density and the cluster index.

Sorting that matrix by density then gives you the cluster indices in the right order.
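
The three steps above might look roughly like the following. The answer does not say how density should be measured, so this sketch assumes it is simply the number of points in each cluster. The variable names (density, tbl, newLabel, relabeled) are illustrative.

```matlab
%# Sketch only: "density" is assumed to be the point count per cluster.
clustIdx = [1 2 3 2 3 2 4 4 4 4]';       %# example labels from the question
nClust   = max(clustIdx);

%# n-by-2 matrix: column 1 = density, column 2 = cluster index
density = accumarray(clustIdx, 1, [nClust 1]);
tbl     = [density, (1:nClust)'];

%# sort rows by the density column, densest cluster first
tbl = sortrows(tbl, -1);

%# tbl(:,2) now lists the old labels from densest to sparsest;
%# invert that ordering to get a lookup from old label to new label
newLabel = zeros(nClust, 1);
newLabel(tbl(:,2)) = 1:nClust;

relabeled = newLabel(clustIdx);          %# column vector of new labels
```

A different density measure, such as points per unit of cluster spread, could be substituted by changing only the first column of tbl.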

Answered 2012-12-10T15:44:14.043