
I am trying to run ELKI to perform k-medoids clustering (with k=3) on a data set stored as an ARFF file (using the ARFFParser in ELKI).

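The data set has 7 dimensions, but the clustering result I get only shows clusters in a single dimension at a time, and it does so for only 3 of the attributes while ignoring the rest.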

Can anyone help me get a clustering visualization that covers all dimensions?


1 Answer


ELKI is mostly used with numerical data.

Currently, ELKI does not have a "mixed" data type, unfortunately.

The ARFF parser will split your data set into multiple relations:

  1. a 1-dimensional numerical relation containing age
  2. a LabelList relation storing sex and region
  3. a 1-dimensional numerical relation containing salary
  4. a LabelList relation storing married
  5. a 1-dimensional numerical relation storing children
  6. a LabelList relation storing car

It appears to have mixed up the relation labels, though. Other than that, this approach works perfectly well for ARFF data sets that consist of numerical data plus a class label, which is the use case this parser was written for. The behaviour is well-defined and consistent, just not what you expected.

The algorithm then ran on the first relation it could work with, i.e. age only.
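To make the splitting concrete, here is a minimal Python sketch of the behaviour described above. It is not ELKI code: the column list, the `"numeric"`/`"label"` tags, and the function name `split_into_relations` are all assumptions made for illustration. It groups consecutive columns of the same kind into one relation and then picks the first numerical one, as the algorithm did.

```python
from itertools import groupby

# Hypothetical column list mirroring the attributes named above;
# the "numeric"/"label" tags are illustrative, not ELKI type names.
columns = [
    ("age", "numeric"),
    ("sex", "label"),
    ("region", "label"),
    ("salary", "numeric"),
    ("married", "label"),
    ("children", "numeric"),
    ("car", "label"),
]

def split_into_relations(cols):
    """Group consecutive columns of the same kind into one relation."""
    relations = []
    for kind, group in groupby(cols, key=lambda c: c[1]):
        relations.append((kind, [name for name, _ in group]))
    return relations

relations = split_into_relations(columns)
for kind, names in relations:
    print(kind, names)

# The clustering algorithm uses the first relation it can work with.
first_numeric = next(r for r in relations if r[0] == "numeric")
print("clustered on:", first_numeric[1])
```

Because the numerical columns are never adjacent, each one ends up in its own 1-dimensional relation, and only the first of them is clustered.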

So here is what you need to do:

  1. Implement an efficient data type for storing mixed type data.
  2. Modify the ARFF parser to produce a single relation of mixed type data.
  3. Implement a distance function for this type, because the lack of a mixed type data representation means we do not have a distance to go with it either.
  4. Choose this new distance function in k-Medoids.
  5. Share the code, so others do not have to do this again. ;-)
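As an illustration of what step 3 could look like, the following is a minimal Python sketch of a Gower-style distance for mixed records. Gower's distance is my choice for the example, not something the steps above prescribe, and the function name `mixed_distance`, the dict-based record layout, and the sample values are assumptions. Using it inside ELKI would still require a Java implementation against ELKI's own distance interfaces, which is not shown here.

```python
def mixed_distance(a, b, numeric_ranges):
    """Gower-style distance between two records given as dicts.

    numeric_ranges maps each numeric attribute to its (min, max) over
    the data set; every other attribute is treated as categorical.
    """
    total = 0.0
    for key in a:
        if key in numeric_ranges:
            lo, hi = numeric_ranges[key]
            span = hi - lo
            # Range-normalised absolute difference, in [0, 1].
            total += abs(a[key] - b[key]) / span if span > 0 else 0.0
        else:
            # Simple mismatch: 0 if equal, 1 otherwise.
            total += 0.0 if a[key] == b[key] else 1.0
    return total / len(a)

# Hypothetical sample records and attribute ranges.
ranges = {"age": (20, 60), "salary": (10000.0, 50000.0)}
p = {"age": 30, "sex": "F", "salary": 20000.0}
q = {"age": 50, "sex": "M", "salary": 30000.0}
print(mixed_distance(p, q, ranges))
```

Each attribute contributes a value between 0 and 1, so no single column dominates just because of its scale.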

Alternatively, you could write a script that encodes your data as a purely numerical data set; then it will work fine. In my opinion, though, the results of one-hot encoding and similar schemes are usually not very convincing.
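A minimal sketch of such an encoding script in Python, assuming the records have already been read into a list of dicts (reading and writing the actual file is left out). The function name `one_hot_encode`, the column names, and the sample values are hypothetical.

```python
def one_hot_encode(rows, categorical):
    """Expand each categorical column into one 0/1 column per value.

    rows: list of dicts with identical keys; categorical: column names
    to expand. All other columns are converted to float.
    """
    # Sorted distinct values per categorical column, for a stable layout.
    categories = {c: sorted({r[c] for r in rows}) for c in categorical}
    header = []
    for col in rows[0]:
        if col in categorical:
            header.extend(f"{col}={v}" for v in categories[col])
        else:
            header.append(col)
    encoded = []
    for r in rows:
        out = []
        for col in rows[0]:
            if col in categorical:
                out.extend(1.0 if r[col] == v else 0.0
                           for v in categories[col])
            else:
                out.append(float(r[col]))
        encoded.append(out)
    return header, encoded

# Hypothetical sample records.
rows = [
    {"age": 48, "sex": "FEMALE", "married": "NO"},
    {"age": 40, "sex": "MALE", "married": "YES"},
]
header, encoded = one_hot_encode(rows, {"sex", "married"})
print(header)
for row in encoded:
    print(row)
```

The output is purely numerical, so it would end up as a single vector relation. Note that unscaled columns such as age or salary will dominate a Euclidean distance over the 0/1 columns, which is one reason the results tend to be unconvincing.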

Answered 2015-04-29T08:01:40.643