machine-learning - In DBSCAN, how to determine border points?

Question

In DBSCAN, a core point is defined as a point having more than MinPts points within Eps.

So if MinPts = 4, a point with a total of 5 points within Eps is definitely a core point. What about a point with 4 points (including itself) within Eps? Is it a core point or a border point?

score 2 · Accepted Answer

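Border points are points that are (in DBSCAN) part of a cluster but are not dense themselves (that is, every cluster member that is not a core point).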
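The definition above can be illustrated with a minimal sketch. It assumes Euclidean distance and a neighborhood that includes the point itself; `classify_points` and the sample coordinates are hypothetical names and data chosen for illustration, not part of any library.

```python
from math import dist

def classify_points(points, eps, min_pts):
    """Label each point 'core', 'border', or 'noise'.

    Assumption: the Eps-neighborhood of a point includes the point itself.
    """
    # Index lists of all points within eps of each point (i is always in its own list).
    neighbors = [
        [j for j, q in enumerate(points) if dist(p, q) <= eps]
        for p in points
    ]
    is_core = [len(n) >= min_pts for n in neighbors]

    labels = []
    for i in range(len(points)):
        if is_core[i]:
            labels.append("core")
        elif any(is_core[j] for j in neighbors[i]):
            # Not dense itself, but within eps of a core point: a cluster member.
            labels.append("border")
        else:
            labels.append("noise")
    return labels

# Four points in a tight square, one point on its edge, one far away.
points = [(0.0, 0.0), (0.5, 0.0), (0.0, 0.5), (0.5, 0.5), (1.3, 0.5), (5.0, 5.0)]
labels = classify_points(points, eps=1.0, min_pts=4)
print(labels)
```

Here the point at (1.3, 0.5) has only three points in its neighborhood, so it is not a core point, but it lies within Eps of two core points and is therefore labeled a border point.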

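In the follow-up algorithm HDBSCAN, the concept of border points was discarded.

Campello, R. J. G. B.; Moulavi, D.; Sander, J. (2013).
Density-Based Clustering Based on Hierarchical Density Estimates.
Proceedings of the 17th Pacific-Asia Conference on Knowledge Discovery in Databases, PAKDD 2013. Lecture Notes in Computer Science 7819. p. 160. doi:10.1007/978-3-642-37456-2_14

which states:

Our new definitions are more consistent with a statistical interpretation of clusters as connected components of a level set of a density [...] border objects do not technically belong to the level set (their estimated density is below the threshold).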

score 1 · Accepted Answer

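Actually, I just re-read the original paper, and Definition 1 makes it look like a core point belongs to its own eps-neighborhood. So if minPts is 4, a point needs at least 3 other points in its eps-neighborhood.

Note that in Definition 1 they say NEps(p) = {q ∈ D | dist(p,q) ≤ Eps}. If the point were excluded from its own eps-neighborhood, it would say NEps(p) = {q ∈ D | dist(p,q) ≤ Eps and p != q}, where != means "not equal to".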
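To make the difference between the two set definitions concrete, here is a small sketch that counts the neighborhood both ways. The helper `neighborhood_size` and its `include_self` flag are hypothetical, introduced only for this illustration; Euclidean distance is assumed.

```python
from math import dist

def neighborhood_size(i, data, eps, include_self=True):
    """Size of N_Eps(data[i]).

    include_self=True  follows Definition 1: {q in D | dist(p, q) <= Eps}.
    include_self=False applies the hypothetical extra condition p != q.
    """
    p = data[i]
    return sum(
        1
        for j, q in enumerate(data)
        if dist(p, q) <= eps and (include_self or j != i)
    )

# The situation from the question: 4 points in total within Eps, MinPts = 4.
data = [(0.0,), (0.1,), (0.2,), (0.3,)]
min_pts = 4

inclusive = neighborhood_size(0, data, eps=0.5)
exclusive = neighborhood_size(0, data, eps=0.5, include_self=False)

is_core_inclusive = inclusive >= min_pts
is_core_exclusive = exclusive >= min_pts
print(inclusive, is_core_inclusive, exclusive, is_core_exclusive)
```

Under the inclusive reading the point counts itself plus 3 others and meets MinPts = 4; under the exclusive reading it sees only 3 and does not.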

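The authors of DBSCAN also emphasize this in Figure 4 of their OPTICS paper: http://fogo.dbs.ifi.lmu.de/Publikationen/Papers/OPTICS.pdf

So I think SciKit's interpretation is correct, and the illustration on Wikipedia at http://en.wikipedia.org/wiki/DBSCAN is misleading.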

score 0 · Accepted Answer

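This depends heavily on the implementation. The best approach is to experiment with the implementation yourself.

In the original DBSCAN paper ¹, the core point condition is N_Eps >= MinPts, where N_Eps is the Epsilon-neighborhood of a given data point, and the point itself is excluded from its own N_Eps.

Following your example, if MinPts = 4 and N_Eps = 3 (or 4 including the point itself, as you say), then according to the original paper the points would not form a cluster. The scikit-learn ² implementation of DBSCAN, on the other hand, works the other way: it counts the point itself when forming a group. So for MinPts = 4, a total of four points is needed to form a cluster.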
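One way to experiment with the scikit-learn behavior described above is the short sketch below. The four one-dimensional sample points and the `eps` value are made up for illustration; `min_samples` is scikit-learn's name for MinPts, and its documentation states that the count includes the point itself.

```python
import numpy as np
from sklearn.cluster import DBSCAN

# Four points, all within eps of each other (illustrative data).
X = np.array([[0.0], [0.1], [0.2], [0.3]])

# min_samples=4: each point sees 4 points in its neighborhood, itself included.
with_four = DBSCAN(eps=0.5, min_samples=4).fit(X)

# min_samples=5: no point reaches the threshold.
with_five = DBSCAN(eps=0.5, min_samples=5).fit(X)

labels_four = with_four.labels_.tolist()              # cluster ids; -1 means noise
core_four = with_four.core_sample_indices_.tolist()   # indices of core points
labels_five = with_five.labels_.tolist()

print(labels_four, core_four, labels_five)
```

With `min_samples=4` all four points come out as core points of a single cluster, while with `min_samples=5` every point is labeled as noise (-1).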

[1] Ester, Martin; Kriegel, Hans-Peter; Sander, Jörg; Xu, Xiaowei (1996). "A density-based algorithm for discovering clusters in large spatial databases with noise."

[2] http://scikit-learn.org
