
I am using the ELKI data mining software for outlier detection. It has many outlier detection techniques, but all of them give the same results (the same outliers with every technique; the only difference is the size of the circle around the points, as shown in the figures below). I use the mouse head dataset provided on the ELKI website. In the dataset, every point is labeled with its respective cluster name: ear_left, ear_right, head, or noise. If I change the label of a noise point to ear_right, that outlier point is then shown as ear_right. I have changed 5 out of 10 noise labels to ear_right.

Here is the result of using the KNN and LDOF outlier detection techniques with the modified dataset in ELKI:

[Figure: KNN and LDOF outlier detection results in ELKI]

Is it a problem with the software, or am I doing something wrong? Has anyone tried using it for outlier detection? Is there any good software that can perform outlier detection using different algorithms like LOF, LDOF, and KNN, or where could I find source code for these techniques?


1 Answer


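This is a very simple dataset.

So it is not surprising that these methods all work more or less well. It is a toy dataset, not real data... on real data, outlier detection is much harder.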

Note that the implementations in ELKI assign numeric scores. They do not produce a yes/no outlier decision; deriving one from the scores is trivial.

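If you want a binary result, you can set the visualization scaling parameters to visualize only the top k results. In other cases, you may need to read the actual papers. For example, the authors of LOCI suggest treating objects with a score greater than 3 as outliers. (Unfortunately, most methods do not have a particularly simple interpretation.)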
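To make the score-versus-decision distinction concrete, here is a minimal sketch in plain Python. It is not ELKI code: the function names, the tiny synthetic dataset, and the choice of "distance to the k-th nearest neighbor" as the score are all assumptions made for illustration. It computes a numeric score per point and then derives a binary result by keeping only the top k scores.

```python
import math

def knn_outlier_scores(points, k):
    """Score each point by the distance to its k-th nearest neighbor.

    Illustrative only; this is one simple kNN-style score, not ELKI's code.
    """
    scores = []
    for i, p in enumerate(points):
        # Distances from p to every other point, smallest first.
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        scores.append(dists[k - 1])
    return scores

def top_k_outliers(scores, k):
    """Turn numeric scores into a yes/no result: indices of the k largest scores."""
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return ranked[:k]

# Hypothetical toy data: a dense 5x5 grid plus two far-away points.
cluster = [(x * 0.1, y * 0.1) for x in range(5) for y in range(5)]
data = cluster + [(3.0, 3.0), (-3.0, 2.5)]

scores = knn_outlier_scores(data, k=3)
flagged = top_k_outliers(scores, k=2)
print(sorted(flagged))
```

On data this easy, almost any scoring method will rank the two far-away points first, so different methods can produce the same top-k set while still assigning different score magnitudes.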

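Do not think in classification boxes. Outlier detection is an exploratory technique, not classification.

ELKI can also evaluate the quality of outlier methods using a number of measures, such as ROC AUC, ROC curves, Precision@k, AveP, and Maximum-F1.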
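As a rough illustration of two of these measures, the sketch below computes ROC AUC and Precision@k from a list of outlier scores and ground-truth labels. It is written in plain Python and does not reproduce ELKI's evaluation code; the function names, scores, and labels are made up for the example.

```python
def roc_auc(scores, is_outlier):
    """ROC AUC as the chance that a random outlier outscores a random inlier.

    Ties count as half a win. Assumes at least one outlier and one inlier.
    """
    pos = [s for s, y in zip(scores, is_outlier) if y]
    neg = [s for s, y in zip(scores, is_outlier) if not y]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def precision_at_k(scores, is_outlier, k):
    """Fraction of the k highest-scoring points that are labeled outliers."""
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sum(1 for i in ranked[:k] if is_outlier[i]) / k

# Hypothetical scores and labels (True = labeled as noise/outlier).
scores = [0.1, 0.2, 0.15, 0.9, 0.3, 0.8]
labels = [False, False, False, True, True, False]

auc = roc_auc(scores, labels)
p_at_2 = precision_at_k(scores, labels, 2)
print(auc, p_at_2)
```

Both measures depend only on how the scores rank the labeled points, so changing the labels changes the evaluation even though the scores themselves stay the same.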

Answered 2014-11-12T18:59:03.477