I am trying to use precomputed distances with ELKI, but cannot get it working. I have read the instructions here: http://elki.dbs.ifi.lmu.de/wiki/HowTo/PrecomputedDistances and this question on SO: "ELKI - input distance matrix", but I am still stuck.

This is the command I am running in a bash shell:

java -jar  elki.jar -verbose  -dbc.filter FixedDBIDsFilter -dbc.startid 0 -dbc.in elki_dummy_ids -algorithm clustering.kmeans.KMeansLloyd -algorithm.distancefunction external.FileBasedDoubleDistanceFunction -distance.matrix elki_sample_dist_ut.txt -kmeans.k 3

And these are the contents of the files in the parameters:

$ cat elki_dummy_ids
0
1
2


$ cat elki_sample_dist_ut.txt
0 0 0.0000
0 1 0.8876
0 2 0.8571
1 1 0.0
1 2 0.9059
2 2 0.0

I tried with a lower-triangular distance matrix too:

$ cat elki_sample_dist_lt.txt
0 0 0.0000
1 0 0.8876
1 1 0.0
2 0 0.8571
2 1 0.9059
2 2 0.0

but no luck with that either. I keep getting this error (truncated; let me know if you need the full error message):

The following parameters were not processed: [external.FileBasedDoubleDistanceFunction, -distance.matrix, elki_sample_dist_ut.txt] Task is not completely configured:

Wrong value of parameter algorithm.distancefunction. Read: de.lmu.ifi.dbs.elki.distance.distancefunction.external.FileBasedDoubleDistanceFunction. Expected: Distance function to determine the distance between database objects. Implementing de.lmu.ifi.dbs.elki.distance.distancefunction.PrimitiveDistanceFunction Known classes (default package de.lmu.ifi.dbs.elki.distance.distancefunction):

I am using OpenJDK Runtime Environment (IcedTea 2.4.7) (7u55-2.4.7-1ubuntu1) and ELKI 0.6.0.

Can someone please point out what I am missing here? Thanks in advance!

1 Answer

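k-means cannot be used with precomputed distances, because it computes distances from points to centroids. The centroids are not known in advance (they are newly computed means, not objects from your data set), so those distances cannot be precomputed. That is what the error is telling you: KMeansLloyd expects a PrimitiveDistanceFunction, and FileBasedDoubleDistanceFunction is not one.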
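To make the centroid problem concrete, here is a minimal Python sketch (not ELKI code). The three 2-D points are made up for illustration. It shows that a precomputed matrix only covers pairs of existing objects, while the centroid k-means needs is a new point with no row in that matrix.

```python
# Hypothetical 2-D points; the coordinates are invented for this illustration.
points = [(0.0, 0.0), (2.0, 0.0), (0.0, 2.0)]


def mean(pts):
    """Component-wise mean of a list of equal-length tuples."""
    n = len(pts)
    return tuple(sum(p[d] for p in pts) / n for d in range(len(pts[0])))


def euclidean(a, b):
    """Euclidean distance between two coordinate tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


# A precomputed matrix is keyed by object IDs and only covers existing objects.
precomputed = {
    (i, j): euclidean(points[i], points[j])
    for i in range(len(points))
    for j in range(len(points))
}

# The centroid is a new point, so it has no ID and no row in the matrix.
centroid = mean(points)
centroid_is_an_input_object = centroid in points

# k-means needs these distances, and they can only be computed from coordinates.
dist_to_centroid = [euclidean(p, centroid) for p in points]
```

With only the distance file and no coordinates, the last step has nothing to work from.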

In addition, k-means should only be used on numerical data with squared Euclidean distance; otherwise it may fail to converge. The mean minimizes the sum of squared deviations; it does not minimize arbitrary distances.
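A small numeric check of that claim, as a sketch on made-up 1-D data: the mean gives the lowest sum of squared deviations, but for the sum of absolute deviations (one example of "another distance") the median does better than the mean.

```python
# Made-up 1-D sample, chosen so that mean and median differ clearly.
values = [0.0, 0.0, 10.0]

mean_value = sum(values) / len(values)
median_value = sorted(values)[len(values) // 2]  # middle element (odd length)


def sum_squared(center):
    """Sum of squared deviations from a candidate center."""
    return sum((v - center) ** 2 for v in values)


def sum_absolute(center):
    """Sum of absolute deviations from a candidate center."""
    return sum(abs(v - center) for v in values)


# The mean wins for squared deviations ...
mean_better_for_squares = sum_squared(mean_value) < sum_squared(median_value)
# ... but loses to the median for absolute deviations.
median_better_for_absolute = sum_absolute(median_value) < sum_absolute(mean_value)
```

So a "k-means" update step that recomputes the mean can increase the objective when the objective is built from a different distance, which is why convergence is no longer guaranteed.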

You might be looking for k-medoids (e.g. PAM), DBSCAN, OPTICS, HAC, or similar. These algorithms work with other distances and only need pairwise distances.
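To illustrate why medoid-based methods get by with pairwise distances only, here is a rough Python sketch of a simple alternating k-medoids loop. It is not PAM proper and not ELKI's implementation. The helper names `load_pairs`, `assign` and `k_medoids` are hypothetical, and the input reuses the upper-triangular sample from the question.

```python
def load_pairs(lines):
    """Parse 'i j distance' lines into a symmetric dict-of-dicts."""
    dist = {}
    for line in lines:
        i, j, d = line.split()
        i, j, d = int(i), int(j), float(d)
        dist.setdefault(i, {})[j] = d
        dist.setdefault(j, {})[i] = d
    return dist


def assign(dist, medoids):
    """Assign every object to its nearest medoid using matrix lookups only."""
    clusters = {m: [] for m in medoids}
    for i in sorted(dist):
        clusters[min(medoids, key=lambda m: dist[i][m])].append(i)
    return clusters


def k_medoids(dist, initial_medoids, max_iter=100):
    """Alternate between assignment and medoid update until nothing changes."""
    medoids = list(initial_medoids)
    for _ in range(max_iter):
        clusters = assign(dist, medoids)
        # New medoid: the member with the smallest total distance to its cluster.
        new_medoids = [
            min(members, key=lambda c: sum(dist[c][o] for o in members))
            if members else m
            for m, members in clusters.items()
        ]
        if new_medoids == medoids:
            break
        medoids = new_medoids
    return medoids, assign(dist, medoids)


# Sample distances copied from elki_sample_dist_ut.txt in the question.
sample = [
    "0 0 0.0000", "0 1 0.8876", "0 2 0.8571",
    "1 1 0.0", "1 2 0.9059", "2 2 0.0",
]
dist = load_pairs(sample)
medoids, clusters = k_medoids(dist, initial_medoids=[0, 1])
```

Every cluster center here is one of the input objects, so all distances the loop needs are already in the file. That is the property k-means lacks.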

Answered 2014-08-28T22:34:42.480