java - Clustering using Java-ML package

Question

I have a dataset in which each instance has a single attribute value, and I need to cluster it. Java-ML (Java Machine Learning Library) seemed suitable for this task. However, its "Dataset" class is structured as a set of instances, each of which is a set of attributes plus a class label. My problem is that I have a single attribute per instance and no class label.

Here is the sample code I tried. Unexpectedly, execution never finishes once clustering starts.

    int k = 3;
    double[] val = {5, 6, 15, 20, 40, 50, 55, 73};

    // Build the dataset: one attribute per instance, no class label set.
    Dataset dataset = new DefaultDataset();
    for (int i = 0; i < val.length; i++) {
        Instance instance = new SparseInstance(1);
        instance.put(1, val[i]);
        dataset.add(instance);
    }

    Clusterer km = new KMeans(k);
    System.out.println(dataset);
    // Execution never gets past this call.
    Dataset[] clusters = km.cluster(dataset);
    System.out.println(dataset);
    for (int i = 0; i < k; i++) {
        System.out.println(clusters[i] + "\n\n\n\n");
    }

I cannot work out the reason for this behavior. Is there another library that would suit this task better than Java-ML?

Thanks in advance.

score 2 · Accepted Answer

First of all, since your data is one-dimensional, don't use clustering in the first place.

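One-dimensional data can be sorted, which allows much faster algorithms than the general case. You may want to look into classic statistics, natural breaks, kernel density estimation, and so on. In fact, I would start with kernel density estimation and split the data at the lowest minima between two local maxima.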
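The kernel density idea above can be sketched in plain Java with no external library. This is only a minimal illustration under stated assumptions: the class and method names (`KdeSplit`, `density`, `cutPoints`, `split`) are hypothetical, the kernel is Gaussian, the density is evaluated on an evenly spaced grid, and the bandwidth and grid size are arbitrary values that would need tuning for real data. As a simplification, it cuts at every local minimum found on the grid rather than selecting only the lowest ones.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class KdeSplit {
    // Gaussian kernel density estimate at x with bandwidth h (h is an assumed tuning value).
    static double density(double[] data, double x, double h) {
        double sum = 0.0;
        for (double v : data) {
            double u = (x - v) / h;
            sum += Math.exp(-0.5 * u * u);
        }
        return sum / (data.length * h * Math.sqrt(2 * Math.PI));
    }

    // Evaluates the density on an even grid over [min, max] and returns the grid
    // positions that are strict local minima. Assumes gridSize >= 3 and max > min.
    static List<Double> cutPoints(double[] data, double h, int gridSize) {
        double[] sorted = data.clone();
        Arrays.sort(sorted);
        double lo = sorted[0];
        double hi = sorted[sorted.length - 1];
        double step = (hi - lo) / (gridSize - 1);
        double[] d = new double[gridSize];
        for (int i = 0; i < gridSize; i++) {
            d[i] = density(sorted, lo + i * step, h);
        }
        List<Double> cuts = new ArrayList<>();
        for (int i = 1; i < gridSize - 1; i++) {
            if (d[i] < d[i - 1] && d[i] < d[i + 1]) {
                cuts.add(lo + i * step);
            }
        }
        return cuts;
    }

    // Splits the sorted values into groups at the given ascending cut points.
    static List<List<Double>> split(double[] data, List<Double> cuts) {
        double[] sorted = data.clone();
        Arrays.sort(sorted);
        List<List<Double>> groups = new ArrayList<>();
        List<Double> current = new ArrayList<>();
        int c = 0;
        for (double v : sorted) {
            // Close the current group each time a value passes the next cut point.
            while (c < cuts.size() && v > cuts.get(c)) {
                if (!current.isEmpty()) {
                    groups.add(current);
                    current = new ArrayList<>();
                }
                c++;
            }
            current.add(v);
        }
        groups.add(current);
        return groups;
    }

    public static void main(String[] args) {
        double[] val = {5, 6, 15, 20, 40, 50, 55, 73}; // values from the question
        double bandwidth = 4.0; // assumption: chosen by hand, not derived from the data
        List<Double> cuts = cutPoints(val, bandwidth, 200);
        System.out.println("cut points: " + cuts);
        System.out.println("groups: " + split(val, cuts));
    }
}
```

The number of groups is not fixed in advance here. It follows from the bandwidth: a larger bandwidth smooths the density and yields fewer cuts, a smaller one yields more.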

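As for Java-ML, what you describe suggests that it is really a classification package. Requiring a class label is typical of classification-driven applications, where you essentially need a class label for learning and validation.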

I mostly use ELKI, which has a large selection of clustering algorithms and does not expect the data to be labeled.

score 1 · Accepted Answer

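If you have only a single feature value, there is little reason to use any clustering algorithm. A histogram or KDE plot is enough to find the information you are looking for.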
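A text histogram is enough to see where the values bunch up. The sketch below is one possible way to do it in plain Java. The class name `Histogram1D`, the `counts` helper, and the choice of 7 equal-width bins are assumptions made for illustration.

```java
import java.util.Arrays;

class Histogram1D {
    // Counts how many values fall into each of `bins` equal-width bins over [min, max].
    static int[] counts(double[] data, int bins) {
        double min = Arrays.stream(data).min().getAsDouble();
        double max = Arrays.stream(data).max().getAsDouble();
        double width = (max - min) / bins;
        int[] counts = new int[bins];
        for (double v : data) {
            // If all values are equal, width is 0 and everything goes into bin 0.
            int idx = (width == 0) ? 0 : (int) ((v - min) / width);
            if (idx >= bins) {
                idx = bins - 1; // the maximum value belongs to the last bin
            }
            counts[idx]++;
        }
        return counts;
    }

    public static void main(String[] args) {
        double[] val = {5, 6, 15, 20, 40, 50, 55, 73}; // values from the question
        int bins = 7; // assumption: picked by hand
        int[] c = counts(val, bins);
        double min = Arrays.stream(val).min().getAsDouble();
        double max = Arrays.stream(val).max().getAsDouble();
        double width = (max - min) / bins;
        for (int i = 0; i < bins; i++) {
            // Draw one '#' per value in the bin.
            StringBuilder bar = new StringBuilder();
            for (int j = 0; j < c[i]; j++) {
                bar.append('#');
            }
            System.out.println(String.format("%6.1f .. %6.1f | %s",
                    min + i * width, min + (i + 1) * width, bar));
        }
    }
}
```

Empty bins between runs of non-empty ones mark natural places to separate the groups.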
