From https://stackoverflow.com/a/35684975/4533188 I understand that K-Nearest Neighbour Imputation works like this:
- For the current observation, compute the distance to all other observations.
- For each missing value in the current observation, consider the k nearest observations that have no missing value for the feature in question.
- From the values of that feature in those k observations, calculate the mean (or a similar statistic); this is the value used for the imputation (see the sketch below).
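To make sure I have understood the procedure correctly, here is a small sketch of those three steps in plain numpy. This is only my own reading of the algorithm, not the knnimpute code; in particular, the distance over the features observed in both rows is an assumption on my part (which is exactly what my question is about):

```python
import numpy as np

def knn_impute_row(X, row_idx, k=3):
    """Sketch of the steps above for one row of X (missing values are NaN).

    Only my own reading of the procedure, not the knnimpute code; the
    distance over shared observed features is an assumption on my part.
    """
    target = X[row_idx]

    # Step 1: distance from the current observation to every other one.
    distances = np.full(X.shape[0], np.inf)
    for i, other in enumerate(X):
        if i == row_idx:
            continue
        shared = ~np.isnan(target) & ~np.isnan(other)  # features observed in both rows
        if shared.any():
            diff = target[shared] - other[shared]
            distances[i] = np.sqrt(np.mean(diff ** 2))

    imputed = target.copy()
    for j in np.where(np.isnan(target))[0]:
        # Step 2: among the nearest observations, keep the k that actually
        # have feature j observed ...
        order = np.argsort(distances)
        donors = [i for i in order if not np.isnan(X[i, j])][:k]
        # Step 3: ... and impute with the mean of their values for feature j.
        if donors:
            imputed[j] = X[donors, j].mean()
    return imputed
```

So for each missing entry, the value is filled with the mean of that column among the k nearest rows that observe it. Is that roughly right?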
The key step is step 1: how do we calculate the distance if not all values are available? The post above points towards the Heterogeneous Euclidean-Overlap Metric. However, I am interested in the implementation of knn-imputation in fancyimpute. I traced it back to https://github.com/hammerlab/knnimpute, more specifically https://github.com/hammerlab/knnimpute/blob/master/knnimpute/few_observed_entries.py, and I looked at the code, but I am not able to figure out how it works.
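For reference, this is what I understand the Heterogeneous Euclidean-Overlap Metric to do for numeric features (again my own sketch, not taken from any library; `feature_ranges` is an input I assume is precomputed per column):

```python
import numpy as np

def heom_distance(x, y, feature_ranges):
    """My reading of the Heterogeneous Euclidean-Overlap Metric (numeric features only).

    A feature where either value is missing contributes 1; otherwise the
    absolute difference normalized by that feature's range. The per-feature
    contributions are combined Euclidean-style. Categorical features
    (0/1 overlap) are left out of this sketch.
    """
    contributions = np.ones_like(x, dtype=float)  # missing in either row -> contributes 1
    observed = ~np.isnan(x) & ~np.isnan(y)
    contributions[observed] = np.abs(x[observed] - y[observed]) / feature_ranges[observed]
    return np.sqrt(np.sum(contributions ** 2))

# feature_ranges would be something like np.nanmax(X, axis=0) - np.nanmin(X, axis=0)
```

I cannot tell whether knnimpute does something like this or something entirely different.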
Can someone please explain how knnimpute works there? In particular, how does the distance calculation work?