I'm actually working on high dimensional data (~50.000-100.000 features) and nearest neighbors search must be performed on it. I know that KD-Trees has poor performance as dimensions grows, and also I've read that in general, all space-partitioning data structures tends to perform exhaustive search with high dimensional data.
Additionally, there are two important facts to be considered (ordered by relevance):
- Precision: The nearest neighbors must be found (not approximations).
- Speed: The search must be as fast as possible. (The time to create the data structure isn't really important).
So, I need some advice about:
- The data structure to perform k-NN.
- If it will be better to use an aNN (approximate nearest neighbor) approach, setting it as accurate as possible?.