
I have a dataset in which each instance has a single attribute value, and I need to cluster it. Java-ML (Java Machine Learning Library) seemed suitable for this task. However, its "Dataset" class is structured as a set of instances, each of which is a set of attributes plus a class label. My problem is that I have a single attribute per instance and no class label.

Here is the sample code I tried. Unexpectedly, execution never finishes once clustering starts.

    int k;
    Dataset dataset = new DefaultDataset();
    double[] val= {5,6,15,20,40,50,55,73};
    for(int i = 0; i < val.length; i++) {
        Instance instance= new SparseInstance(1);
        instance.put(1, val[i]);
        dataset.add(instance);
    }
    k = 3;
    Clusterer km = new KMeans(k);
    System.out.println(dataset);
    Dataset[] clusters = km.cluster(dataset);
    System.out.println(dataset);
    for(int i = 0; i < k; i++) {
        System.out.println(clusters[i]+"\n\n\n\n");
    }

I don't understand the reason for this behavior. Is there another library better suited to this task than Java-ML?

Thanks in advance.


2 Answers


First of all, since your data is one-dimensional, don't use clustering in the first place.

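One-dimensional data can be sorted, which allows much faster algorithms than the general case. You may want to look into classic statistics, natural breaks, kernel density estimation and the like. In fact, I would start with kernel density estimation and split the data at the lowest minimum between each pair of local maxima.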
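As a rough illustration of the kernel density idea, here is a minimal sketch in plain Java. It assumes a Gaussian kernel, a hand-picked bandwidth `h`, and a simple evaluation grid. The class name `KdeSplit`, its method names, and the parameter values are made up for this example and do not come from Java-ML or ELKI. It cuts at every strict local minimum of the estimated density, which is a simplification of "the lowest minimum between two maxima".

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper class for illustration; not part of any library.
class KdeSplit {
    /** Gaussian kernel density estimate at x with bandwidth h. */
    static double density(double[] data, double x, double h) {
        double sum = 0.0;
        for (double v : data) {
            double u = (x - v) / h;
            sum += Math.exp(-0.5 * u * u);
        }
        return sum / (data.length * h * Math.sqrt(2.0 * Math.PI));
    }

    /** Grid positions where the estimated density has a strict local minimum.
     *  Assumes data is non-empty and gridSize >= 3. */
    static List<Double> cutPoints(double[] data, double h, int gridSize) {
        double lo = Arrays.stream(data).min().getAsDouble();
        double hi = Arrays.stream(data).max().getAsDouble();
        double step = (hi - lo) / (gridSize - 1);
        double[] d = new double[gridSize];
        for (int i = 0; i < gridSize; i++) {
            d[i] = density(data, lo + i * step, h);
        }
        List<Double> cuts = new ArrayList<>();
        for (int i = 1; i < gridSize - 1; i++) {
            if (d[i] < d[i - 1] && d[i] < d[i + 1]) {
                cuts.add(lo + i * step);
            }
        }
        return cuts;
    }

    /** Splits the sorted data into consecutive groups at the ascending cut points. */
    static List<List<Double>> split(double[] data, List<Double> cuts) {
        double[] sorted = data.clone();
        Arrays.sort(sorted);
        List<List<Double>> groups = new ArrayList<>();
        List<Double> current = new ArrayList<>();
        int c = 0;
        for (double v : sorted) {
            // Advance past every cut point that lies below the current value.
            while (c < cuts.size() && v > cuts.get(c)) {
                if (!current.isEmpty()) {
                    groups.add(current);
                    current = new ArrayList<>();
                }
                c++;
            }
            current.add(v);
        }
        if (!current.isEmpty()) {
            groups.add(current);
        }
        return groups;
    }

    public static void main(String[] args) {
        double[] val = {5, 6, 15, 20, 40, 50, 55, 73}; // values from the question
        double h = 5.0;                                // assumed bandwidth, tune for your data
        List<Double> cuts = cutPoints(val, h, 500);
        System.out.println("cuts: " + cuts);
        System.out.println("groups: " + split(val, cuts));
    }
}
```

The number of groups depends heavily on the bandwidth: a small `h` produces many narrow modes, a large `h` merges everything into one. A bandwidth rule from classic statistics could replace the fixed value used here.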

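As for Java-ML, what you describe suggests that it really is a classification package. For classification-driven applications, the need for a class label is typical: there is essentially always a class label for learning and validation.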

I mainly use ELKI, which has a large selection of clustering algorithms and does not expect the data to be labeled.

answered 2013-06-30T09:36:52.640

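If you have only a single feature value, there is little reason to use any clustering algorithm. Plotting a histogram or a KDE is enough to find the information you are looking for.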
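A histogram needs no library at all. The sketch below assumes equal-width bins over the data range. The class name `Histogram1D` and the bin count are arbitrary choices for illustration. It prints a text bar per bin so that gaps between groups show up as empty bins.

```java
import java.util.Arrays;

// Hypothetical helper class for illustration; not part of any library.
class Histogram1D {
    /** Counts values per bin, using `bins` equal-width bins over [min, max].
     *  Assumes data is non-empty and bins >= 1. */
    static int[] counts(double[] data, int bins) {
        double min = Arrays.stream(data).min().getAsDouble();
        double max = Arrays.stream(data).max().getAsDouble();
        double width = (max - min) / bins;
        int[] c = new int[bins];
        for (double v : data) {
            int idx = (width == 0.0) ? 0 : (int) ((v - min) / width);
            idx = Math.min(idx, bins - 1); // the maximum value belongs to the last bin
            c[idx]++;
        }
        return c;
    }

    public static void main(String[] args) {
        double[] val = {5, 6, 15, 20, 40, 50, 55, 73}; // values from the question
        int[] c = counts(val, 7);                      // 7 bins is an arbitrary choice
        for (int i = 0; i < c.length; i++) {
            StringBuilder bar = new StringBuilder();
            for (int j = 0; j < c[i]; j++) {
                bar.append('#');
            }
            System.out.println("bin " + i + ": " + bar);
        }
    }
}
```

With the question's eight values and 7 bins, the third bin stays empty, which hints at a gap between 20 and 40. The picture changes with the bin count, so it is worth trying a few.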

answered 2013-06-30T00:16:23.910