I have a dataset with 1000 dimensions and I am trying to cluster the data with DBSCAN in Python. I am having a hard time understanding which metric to choose and why.
Can someone explain this? And how should I decide what value to set eps to?
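For reference, here is roughly how I am calling it (a minimal sketch; as far as I understand, the distance function is chosen through the `metric` parameter, which defaults to Euclidean):

```python
from sklearn.cluster import DBSCAN

# Default metric is Euclidean; other built-in choices include
# e.g. 'manhattan' or 'cosine'.
db = DBSCAN(eps=0.07, min_samples=2, metric='euclidean')
```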
I am interested in the finer structure of the data, so min_samples is set to 2. Right now I use the default metric (Euclidean) for DBSCAN in sklearn, but for small eps values, such as eps < 0.07, I get a few clusters and miss many points, and for larger values I get several smaller clusters and one huge one. I do understand that everything depends on the data at hand, but I am interested in tips on how to choose eps values in a coherent and structured way, and on which metrics to choose!
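This is roughly the sweep I have tried (X stands in for my real data; the random array below is just a placeholder so the snippet runs):

```python
import numpy as np
from sklearn.cluster import DBSCAN

# Placeholder for my real data: shape (n_samples, 1000)
X = np.random.rand(500, 1000)

for eps in [0.05, 0.07, 0.1, 0.5, 1.0]:
    # Default metric (Euclidean), min_samples=2 for fine structure
    labels = DBSCAN(eps=eps, min_samples=2).fit_predict(X)
    n_clusters = len(set(labels)) - (1 if -1 in labels else 0)  # -1 marks noise
    n_noise = int(np.sum(labels == -1))
    print(f"eps={eps}: {n_clusters} clusters, {n_noise} noise points")
```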
I have read this question, but the answers there deal with 10 dimensions and I have 1000 :). I also do not know how to evaluate my metric, so a more elaborate explanation than "evaluate your metric!" would be appreciated.
Edit: Or tips on other clustering algorithms that work well on high-dimensional data and have an existing Python implementation.