4

I am performing mean shift clustering on a dataset. The estimate_bandwidth function estimates an appropriate bandwidth for mean shift clustering.

Syntax:

sklearn.cluster.estimate_bandwidth(X, quantile=0.3, n_samples=None, random_state=0)

I found that the estimated bandwidth increases as quantile increases, which results in fewer clusters. Similarly, decreasing quantile decreases the bandwidth and therefore gives more clusters.

So it seems the number of clusters depends on the quantile value chosen.

How do I choose the optimum quantile?
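The effect described above can be reproduced with a small sweep. This is only a sketch on synthetic data: the make_blobs dataset, the quantile values, and the variable names are assumptions chosen for illustration, not part of the original problem.

```python
import numpy as np
from sklearn.cluster import MeanShift, estimate_bandwidth
from sklearn.datasets import make_blobs

# Synthetic stand-in for the real dataset (an assumption for this sketch).
X, _ = make_blobs(n_samples=300, centers=3, cluster_std=0.6, random_state=0)

results = []  # (quantile, bandwidth, number of clusters)
for quantile in (0.1, 0.3, 0.5, 0.9):
    bandwidth = estimate_bandwidth(X, quantile=quantile, random_state=0)
    labels = MeanShift(bandwidth=bandwidth).fit(X).labels_
    n_clusters = len(np.unique(labels))
    results.append((quantile, bandwidth, n_clusters))
    print(f"quantile={quantile:.1f}  bandwidth={bandwidth:.3f}  clusters={n_clusters}")
```

The bandwidth column never decreases as quantile grows. The cluster count usually shrinks with it, but that depends on the data.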


1 Answer

0

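The quantile is used by the KNN (nearest-neighbors) search inside the estimate_bandwidth function to determine the bandwidth.
In particular:

n = number of neighbors in KNN = number of samples * quantile (rounded down, at least 1)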

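The bandwidth is then computed from the distances that KNN returns: for each sample, take the distance to the farthest of its n nearest neighbors, then average that distance over all samples. So you can use this to figure out how to set the quantile. A kernel with the bandwidth returned by this function covers about n samples on average, which strongly influences the number of clusters that Mean Shift returns.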
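As a rough check of this description, the estimate can be rebuilt by hand with NearestNeighbors. The helper knn_bandwidth below is a hypothetical name, and the computation reflects my reading of the scikit-learn implementation (with n_samples=None, so no subsampling), so treat it as a sketch rather than the library's exact code.

```python
import numpy as np
from sklearn.cluster import estimate_bandwidth
from sklearn.datasets import make_blobs
from sklearn.neighbors import NearestNeighbors


def knn_bandwidth(X, quantile=0.3):
    """Hypothetical helper: mean distance to the n-th nearest neighbor."""
    # n = number of samples * quantile, rounded down, at least 1.
    n_neighbors = max(1, int(X.shape[0] * quantile))
    nbrs = NearestNeighbors(n_neighbors=n_neighbors).fit(X)
    # Querying with X itself counts each point as its own first neighbor.
    distances, _ = nbrs.kneighbors(X)
    # Farthest of the n neighbors per sample, averaged over all samples.
    return distances.max(axis=1).mean()


# Synthetic data, assumed only for the comparison.
X, _ = make_blobs(n_samples=300, centers=3, cluster_std=0.6, random_state=0)

manual = knn_bandwidth(X, quantile=0.3)
reference = estimate_bandwidth(X, quantile=0.3)
print(manual, reference)
```

If a domain-specific neighborhood size is known (for example, "each cluster should contain at least about 20% of the points"), quantile can be set to that fraction directly.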

answered 2019-04-04T21:59:11.027