您所描述的问题通常称为聚类或聚类分析。根据您的数据集和分析约束,这可能会非常困难。但是,在您的情况下,这很简单,因为您有一个硬阈值(5 像素)可供使用。
Aardvarkk 已经提出了一个很好的解决方案,但它并没有真正展示集群的过程。这是一种非常简单的方法,您可以对数据进行聚类并获得或多或少相同的结果。
- 计算点之间的成对距离。对于 N 个点,这会产生一个 NxN 矩阵
- 阈值矩阵
- 遍历矩阵的行
每次迭代将如下所示:
- 如果
i
已经集群,继续
- 如果
i
不在集群中,则创建一个新集群并分配i
给它
- 找到所有其他接近的点
i
(行i
中等于 1 的列)
- 检查这些点是否已经在集群中
- 如果是,则设置
i
并且所有点都接近i
最小集群 ID
- 如果没有集合并且所有点都
i
靠近集群i
i's
下面是我创建的一个简单示例:
%Generate random points
nPts = 300;
clustId = zeros(nPts,1);
clusterCount = 0;
x = randi(3, [1, nPts])*10+ randn(1, nPts);
y = randi(3, [1, nPts])*10 + randn(1, nPts);
%Compute the distance matrix from http://goo.gl/8mAoC
dist = distance([x;y], [x;y]);
maxDist = 5;
binDist = dist <= maxDist;
for i = 1:nPts
% if this point is already clustered skip it
if clustId(i) ~= 0
continue;
end
%if the current point isn't in a cluster, create a new cluster and add
%the point to that cluster
if clustId(i) == 0
clusterCount = clusterCount+1;
clustId(i) = clusterCount;
fprintf('New point, creating cluster:%d\n', clusterCount);
end
% get the indices of the points that collide with the i
idx = binDist(:,i);
% check to see if idx contains points that already belong to another clustered
% if it doesn't collisionIdx will be equal to i
collisionIdx = idx & clustId>0;
%get the smallest cluster from collisionIdx
mergeClustId = min(clustId(collisionIdx));
%assing all the original points to that cluster
clustId(idx) = mergeClustId;
end
只需遍历集群 ID 即可计算质心:
cx = [];
cy = [];
for i = 1:clusterCount
idx = clustId == i;
cx(i) = mean(x(idx));
cy(i) = mean(y(idx));
end
然后绘制结果:
figure;
plot(cx,cy,'c.', 'markersize', 50); hold on;
plot(x,y,'.');