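I just used ggplot2's stat_ellipse to identify outliers, with a confidence level of 0.999.
The function below extracts the points that fall outside the ellipse. It takes a ggplot object and the index of the layer that draws the ellipse.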
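# Function for identifying points outside the ellipse
library(ggplot2)
library(sp)  # provides point.in.polygon()
outside_ellipse <- function(p, ellipse_layer_number) {
  # Extract the computed data for every layer of the plot
  build <- ggplot_build(p)$data
  points <- build[[1]]  # assumes the points are the first layer
  ell <- build[[ellipse_layer_number]]
  # Flag the points inside the ellipse and add the flag to the data
  # (point.in.polygon() returns 0 for points strictly outside the polygon)
  df <- data.frame(points[c("x", "y")],
                   in.ell = point.in.polygon(points$x, points$y, ell$x, ell$y) > 0)
  # Plot the result; print() is needed because plots are not auto-printed inside a function.
  # geom_path() draws the ellipse that was actually tested; a bare stat_ellipse()
  # would redraw it at the default level of 0.95.
  print(ggplot(df, aes(x, y)) +
          geom_point(aes(col = in.ell)) +
          geom_path(data = ell[c("x", "y")]))
  # Return the row indices of the outliers
  which(!df$in.ell)
}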
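The function relies on point.in.polygon() from the sp package, so here is a minimal sketch of what that call returns on a toy polygon. The square and the two test points are made up for illustration; they are not part of my data.

```r
library(sp)

# Hypothetical polygon: a square with corners (3,3), (5,3), (5,5), (3,5)
sq_x <- c(3, 5, 5, 3)
sq_y <- c(3, 3, 5, 5)

# One point strictly inside the square and one strictly outside it
px <- c(4, 10)
py <- c(4, 10)

# Return codes per the sp documentation: 0 = outside, 1 = inside,
# 2 = on an edge, 3 = on a vertex
codes <- point.in.polygon(px, py, sq_x, sq_y)

# Any non-zero code counts as "inside or on the boundary"
inside <- codes > 0
```

Here codes is 1 for the first point and 0 for the second, so inside is TRUE, FALSE. Treating every non-zero code as inside means points lying exactly on the ellipse boundary are not reported as outliers.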
Here I plot my data with the ellipse, extract the points that fall outside it, and add that information to the data frame.
# Saving plot with confidence ellipse
plotData <- ggplot(pc_df, aes(PC1, PC2)) + geom_point() + stat_ellipse(level = 0.999)
# Identifying points outside the ellipse
outside <- outside_ellipse(plotData, 2)
# Every point starts as inside; the returned indices mark the ones outside
pc_df$in_ellipsoid <- rep(TRUE, nrow(pc_df))
pc_df$in_ellipsoid[outside] <- FALSE
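Since pc_df is not defined in the snippet, here is a self-contained sketch of the same steps that can be run as-is. It assumes a stand-in pc_df built from the first two principal components of the built-in iris data, and it uses ggplot2's layer_data() rather than ggplot_build()$data to pull out each layer; both choices are mine for illustration, not part of my original analysis.

```r
library(ggplot2)
library(sp)

# Hypothetical stand-in for pc_df: first two principal components of iris
pca <- prcomp(iris[, 1:4], scale. = TRUE)
pc_df <- data.frame(PC1 = pca$x[, 1], PC2 = pca$x[, 2])

p <- ggplot(pc_df, aes(PC1, PC2)) +
  geom_point() +
  stat_ellipse(level = 0.999)

# layer_data() returns the computed data for a single layer of the plot
pts <- layer_data(p, 1)  # the points
ell <- layer_data(p, 2)  # the polygon that approximates the ellipse

# TRUE for points inside (or on the boundary of) the ellipse
pc_df$in_ellipsoid <- point.in.polygon(pts$x, pts$y, ell$x, ell$y) > 0

# Row indices of the outliers
outliers <- which(!pc_df$in_ellipsoid)
```

The row order of the layer data matches pc_df here because there is a single panel and no missing values. With grouped or faceted data, the ellipse layer would hold one polygon per group, and each would need to be tested separately.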