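I am clustering data with k-means, but instead of the standard algorithm I use an approximate nearest neighbor (ANN) algorithm to speed up the sample-to-center comparisons. This is easy to do: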

[clusterCenters, trainAssignments] = vl_kmeans(trainDescriptors, clusterCount, 'Algorithm', 'ANN', 'MaxNumComparisons', ceil(clusterCount / 50));

When I run this code, the "trainDescriptors" variable is clustered and each descriptor is assigned to one of the "clusterCenters" using ANN.

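I also have another variable, "testDescriptors", that I want to assign to the cluster centers as well. This assignment must be done with the same method used for "trainDescriptors", but AFAIK the vl_kmeans function does not return the tree it builds for fast assignment.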

So my question is: is it possible to assign "testDescriptors" to "clusterCenters" in the same way that vl_kmeans assigns "trainDescriptors" to "clusterCenters", and if so, how can I do it?

1 Answer

Well, I have figured it out. It can be done like this:

clusterCount = 1024;
datasetTrain = single(rand(128, 100000)); 

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% 1 - cluster train data and get train assignments
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

[clusterCenters, trainAssignments_actual] = vl_kmeans(datasetTrain, clusterCount, ...
    'Algorithm', 'ANN', ...
    'Distance', 'l2', ...
    'NumRepetitions', 1, ...
    'NumTrees', 3, ...
    'MaxNumComparisons', ceil(clusterCount / 50) ...
);

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% 2 - assign train data to cluster centers
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

forest = vl_kdtreebuild(clusterCenters, ...
    'Distance', 'l2', ...
    'NumTrees', 3 ...
);

trainAssignments_expected = vl_kdtreequery(forest, clusterCenters, datasetTrain);

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% 3 - validate second assignment
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

validation = isequal(trainAssignments_actual, trainAssignments_expected);

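In step 2, I build a new tree from the cluster centers and then assign the data to the centers again. It gives a valid result. The same forest can then be queried with "testDescriptors" in place of datasetTrain.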
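Applied to the original question, the same idea can be sketched as follows. This is only a minimal example under some assumptions: VLFeat is already on the MATLAB path, the random matrices are placeholders for the real descriptors, and the size of "testDescriptors" is made up for illustration. Passing 'MaxNumComparisons' to vl_kdtreequery is my own choice to bound the search the way the vl_kmeans call does; without it the query should fall back to an exact search. Since the forest is rebuilt here rather than reused from vl_kmeans, a bounded query is not guaranteed to reproduce the assignments vl_kmeans returned.

```matlab
% Assumes VLFeat is on the MATLAB path (vl_setup has been run).
clusterCount = 1024;
maxComparisons = ceil(clusterCount / 50);

% Placeholder data; substitute the real descriptors (one per column).
trainDescriptors = single(rand(128, 100000));
testDescriptors  = single(rand(128, 20000));   % hypothetical test set

% Cluster the training data with the ANN variant, as in the question.
clusterCenters = vl_kmeans(trainDescriptors, clusterCount, ...
    'Algorithm', 'ANN', 'Distance', 'l2', 'NumTrees', 3, ...
    'MaxNumComparisons', maxComparisons);

% Build a kd-tree forest over the final centers.
forest = vl_kdtreebuild(clusterCenters, 'Distance', 'l2', 'NumTrees', 3);

% Assign each test descriptor to its (approximately) nearest center.
testAssignments = vl_kdtreequery(forest, clusterCenters, testDescriptors, ...
    'MaxNumComparisons', maxComparisons);
```

testAssignments then holds one center index per column of "testDescriptors", in the same format as the assignments returned by vl_kmeans.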

answered 2014-06-08T13:14:47.377