The format of my dataset: [x-coordinate, y-coordinate, hour] with hour an integer value from 0 to 23.

My question is: how can I cluster this data when I need a Euclidean distance metric for the coordinates but a different one for the hours (since d(23, 0) is 23 under the Euclidean metric, even though the two hours are only one hour apart)? Is it possible to cluster data with a different distance metric for each feature in scipy? How?

Thank you


1 Answer


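You need to define your own metric that handles "time" in an appropriate way. According to the documentation for scipy.spatial.distance.pdist, you can supply your own function: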

Y = pdist(X, f)

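Computes the distance between all pairs of vectors in X using the user-supplied 2-arity function f. [...] For example, the Euclidean distance between the vectors could be computed as follows: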

dm = pdist(X, lambda u, v: np.sqrt(((u-v)**2).sum()))
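For the dataset in the question, a custom metric might look like the sketch below. The question does not say how the spatial and temporal parts should be combined, so the weighted sum and the `hour_weight` parameter are assumptions made for illustration; the name `mixed_distance` is hypothetical as well. The hour term wraps around at 24, so d(23, 0) comes out as 1 rather than 23.

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform

def mixed_distance(u, v, hour_weight=1.0):
    """Euclidean distance on (x, y) plus a circular distance on the hour.

    Combining the two parts as a weighted sum is an assumption; pick
    hour_weight to match the scale of your coordinates.
    """
    # Euclidean distance on the first two features (x, y)
    spatial = np.sqrt(((u[:2] - v[:2]) ** 2).sum())
    # Circular distance on the hour: 23 and 0 are one hour apart
    diff = abs(u[2] - v[2]) % 24
    temporal = min(diff, 24 - diff)
    return spatial + hour_weight * temporal

# Rows are [x-coordinate, y-coordinate, hour]
X = np.array([
    [0.0, 0.0, 23],
    [0.0, 0.0, 0],
    [3.0, 4.0, 12],
])

dm = pdist(X, mixed_distance)   # condensed distances: (0,1), (0,2), (1,2)
square = squareform(dm)         # full symmetric matrix, if needed
```

If the coordinates and the hours live on very different scales, the weight matters a lot, so it is worth trying a few values before trusting the resulting clusters.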

This metric can be passed through the metric keyword to the scipy clustering functions that accept one. For example, using linkage:

scipy.cluster.hierarchy.linkage(y, method='single', metric='euclidean')
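
Putting this together for the [x, y, hour] data might look roughly like the following. It is only a sketch: the `mixed_distance` function, the plain sum of the two parts, the sample points, and the choice of two clusters are all assumptions for illustration, not something given in the question.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

def mixed_distance(u, v):
    # Hypothetical metric: Euclidean on (x, y) plus wrap-around hour distance
    spatial = np.sqrt(((u[:2] - v[:2]) ** 2).sum())
    diff = abs(u[2] - v[2]) % 24
    return spatial + min(diff, 24 - diff)

# Made-up sample data: rows are [x-coordinate, y-coordinate, hour]
X = np.array([
    [0.0, 0.0, 23],
    [0.1, 0.0, 0],
    [0.0, 0.1, 1],
    [5.0, 5.0, 12],
    [5.1, 5.0, 13],
])

# linkage accepts a callable as metric when given raw observation vectors
Z = linkage(X, method='single', metric=mixed_distance)

# Cut the hierarchy into at most two flat clusters
labels = fcluster(Z, t=2, criterion='maxclust')
```

Here the three late-night points near the origin should end up in one cluster even though their hours are 23, 0, and 1, because the hour distance wraps around midnight.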
answered 2013-09-11T16:07:39.473