I am working with GPS data (latitude, longitude). For density-based clustering I have used DBSCAN in R.

Advantages of DBSCAN in my case:

  1. I don't have to predefine the number of clusters
  2. I can calculate a distance matrix (using Haversine Distance Formula) and use that as input in dbscan

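    library(fossil)  # provides earth.dist()
    library(fpc)     # provides dbscan()

    # df is the dataset containing the lat/long values
    dist <- earth.dist(df, dist = TRUE)
    # method = "dist" tells dbscan() that the input is a distance matrix
    dens <- dbscan(dist, MinPts = 25, eps = 0.43, method = "dist")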
    

Now, when I look at the clusters, they are not meaningful. Some clusters contain points that are more than 1 km apart. I want dense clusters, but not ones that large.

I have tried different values of MinPts and eps, and I have also used a k-nearest-neighbor distance plot to pick a good value of eps for MinPts=25.

What dbscan does is visit every point in my dataset: if a point p has at least MinPts points in its eps-neighborhood, it starts a cluster. But it also merges clusters that are density-reachable from one another, which I guess is what causes my problem.

It really is a big question, essentially "how do I reduce the size of a cluster without losing too much of its information", but I will break it down into the following points:

  1. How do I remove border points from a cluster? I know which points are in which cluster from dens$cluster, but how would I know whether a particular point is a core point or a border point?
  2. Is cluster 0 always noise?
  3. I was under the impression that the size of a cluster would be comparable to eps. That is not the case, because density-reachable clusters are merged together.
  4. Is there any other clustering method that has the advantages of dbscan but gives me more meaningful clusters?

OPTICS is another alternative, but will it solve my issue?

Note: By meaningful I mean that nearby points should be in the same cluster, but points that are 1 km or more apart should not be.

1 Answer

DBSCAN does not claim that the radius is the maximum cluster size.

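Have you read the paper? DBSCAN looks for arbitrarily shaped clusters. eps is just the core size of a point, roughly the radius used for density estimation: any point within the eps radius of a core point becomes part of that point's cluster.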

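This makes eps essentially the maximum step size for connecting dense points. But those points may still form a chain of density-connected points of arbitrary shape or size.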
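To make the chaining effect concrete, here is a minimal base-R sketch. It does not call fpc's dbscan; it assumes every point is dense enough to be a core point, in which case the eps-connected groups are the same as the single-linkage clusters cut at height eps. The points are synthetic (a straight line with 0.3 km gaps, planar coordinates in km), and the names `pts`, `nn_gap`, `components` and `diameter_km` are made up for the illustration.

```r
# Synthetic data: 21 points on a line, 0.3 km apart (planar coordinates in km).
pts <- cbind(x = seq(0, 6, by = 0.3), y = 0)
d   <- dist(pts)          # stand-in for the great-circle distance matrix
eps <- 0.43               # same eps as in the question

# Gap from each point to its nearest neighbour (diagonal masked with Inf).
dm     <- as.matrix(d)
nn_gap <- apply(dm + diag(Inf, nrow(dm)), 1, min)

# Assumption: all points are core points. The eps-connected groups are then
# the single-linkage clusters obtained by cutting the tree at height eps.
components <- cutree(hclust(d, method = "single"), h = eps)

n_clusters  <- length(unique(components))
diameter_km <- max(d)     # largest pairwise distance in the data

print(n_clusters)
print(diameter_km)
```

Every step along the chain is shorter than eps, so all points end up connected in a single group even though its two ends are far more than eps apart.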

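I don't know what cluster 0 is in your R implementation. I have experimented with the R implementation, but it was much slower than all the others. I don't recommend using R; there are much better tools for cluster analysis available, such as ELKI. Try running DBSCAN with your settings in ELKI, using LatLngDistanceFunction and a sort-tile-recursive bulk-loaded R-tree index. You will be surprised how fast it is compared to R.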

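OPTICS looks for the same kind of density-connected clusters. Are you sure that this arbitrarily shaped kind of cluster is what you are looking for?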

IMHO, you are using the wrong method for your goals (and you haven't really explained what you are trying to achieve).

If you want a hard limit on the cluster diameter, use complete-linkage hierarchical clustering.
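A rough sketch of that idea in base R, using `hclust` and `cutree` from the stats package. The data are synthetic (points on a line, planar coordinates in km) and the 1 km threshold is taken from the question; with real GPS data one would presumably pass the `earth.dist(df, dist = TRUE)` result in place of `d`, which is an assumption about the asker's setup. The names `max_diameter_km` and `diameters` are made up for the illustration.

```r
# Synthetic data: 21 points on a line, 0.3 km apart (planar coordinates in km).
pts <- cbind(x = seq(0, 6, by = 0.3), y = 0)
d   <- dist(pts)          # stand-in for the great-circle distance matrix

# Complete linkage merges two clusters at a height equal to the largest
# pairwise distance between their members, so cutting the tree at height h
# leaves clusters whose diameter is at most h.
hc <- hclust(d, method = "complete")
max_diameter_km <- 1
cl <- cutree(hc, h = max_diameter_km)

# Check: largest pairwise distance inside each resulting cluster.
dm <- as.matrix(d)
diameters <- sapply(split(seq_along(cl), cl),
                    function(idx) max(dm[idx, idx]))

print(table(cl))
print(diameters)
```

Unlike DBSCAN this assigns every point to some cluster, so sparse regions show up as very small clusters rather than as noise; filtering those out by size afterwards is one possible follow-up.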

answered 2013-12-31T18:01:45.180