
From https://stackoverflow.com/a/35684975/4533188 I got that K-Nearest Neighbour Imputation works like this:

  1. For the current observation, get the distance to all other observations.
  2. For each missing value in the current observation, consider only the k nearest observations that have no missing value in the feature in question.
  3. Calculate the mean (or a similar statistic) of that feature's values across those k observations; this is the value used for the imputation.
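The three steps above can be sketched in plain NumPy. This is only an illustration of the idea, not fancyimpute's actual implementation; in particular, the distance rescaling by the fraction of shared features is one common convention (the nan-Euclidean style) and is an assumption here:

```python
import numpy as np

def knn_impute(X, k=2):
    """Illustrative sketch of the three steps above (not fancyimpute's code).

    Step 1 computes distances only over features observed in *both* rows,
    rescaled by the fraction of usable features so rows with few shared
    features remain comparable (nan-Euclidean-style convention, assumed).
    """
    X = np.asarray(X, dtype=float)
    X_out = X.copy()
    n, d = X.shape
    for i in range(n):
        missing = np.isnan(X[i])
        if not missing.any():
            continue
        # Step 1: distance from row i to every other row, over shared features.
        dists = np.full(n, np.inf)
        for j in range(n):
            if j == i:
                continue
            shared = ~np.isnan(X[i]) & ~np.isnan(X[j])
            if shared.sum() == 0:
                continue  # no feature observed in both rows
            diff = X[i, shared] - X[j, shared]
            dists[j] = np.sqrt(d / shared.sum() * np.sum(diff ** 2))
        for f in np.where(missing)[0]:
            # Step 2: among rows that observe feature f, take the k nearest.
            candidates = np.where(~np.isnan(X[:, f]) & np.isfinite(dists))[0]
            nearest = candidates[np.argsort(dists[candidates])][:k]
            # Step 3: impute with the mean of their values for feature f.
            if len(nearest) > 0:
                X_out[i, f] = X[nearest, f].mean()
    return X_out
```

For example, imputing `[[1, 2, nan], [3, 4, 3], [5, 6, 5], [1, 2, 3]]` with `k=2` fills the missing entry from the two rows nearest to the first one.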

The key step is step 1: how do we calculate the distance if not all values are available? The post above points towards the Heterogeneous Euclidean-Overlap Metric. However, I am interested in the implementation of the KNN imputation in fancyimpute. I tracked it back to https://github.com/hammerlab/knnimpute, more specifically https://github.com/hammerlab/knnimpute/blob/master/knnimpute/few_observed_entries.py, and looked at the code. However, I am not able to figure out how it works.

Can someone please explain to me how knnimpute works there? How does the distance calculation work here?


1 Answer


The following is specific to the KNNImputer class in the scikit-learn Python library. Documentation: https://scikit-learn.org/stable/modules/generated/sklearn.impute.KNNImputer.html

The parameter `metric` has `nan_euclidean` as its default value. Its documentation can be found here: https://scikit-learn.org/stable/modules/generated/sklearn.metrics.pairwise.nan_euclidean_distances.html

Intuitively, the nan-Euclidean distance computes the standard Euclidean distance where possible (skipping any coordinate that is missing in either of the two observations) and linearly scales the result to compensate for the missing entries.
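A short sketch of how this looks with scikit-learn's public API (the example data is made up; `nan_euclidean_distances` and `KNNImputer` are the documented functions linked above):

```python
import numpy as np
from sklearn.impute import KNNImputer
from sklearn.metrics.pairwise import nan_euclidean_distances

X = np.array([[1.0, 2.0, np.nan],
              [3.0, 4.0, 3.0],
              [np.nan, 6.0, 5.0],
              [8.0, 8.0, 7.0]])

# Pairwise nan-Euclidean distances: coordinates missing in either row are
# skipped, and the squared distance is scaled by n_features / n_present.
D = nan_euclidean_distances(X)
# e.g. D[0, 1] = sqrt(3/2 * ((1-3)**2 + (2-4)**2)) = sqrt(12)

# KNNImputer uses this metric by default to find donor rows, then averages
# the donors' values for each missing feature.
imputer = KNNImputer(n_neighbors=2, metric="nan_euclidean")
X_filled = imputer.fit_transform(X)
```

With `n_neighbors=2` and the default uniform weights, the missing entry in the first row is filled with the mean of that feature in the two nearest rows that observe it.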

answered 2021-12-16T05:18:43.533