r - Visualizing clusters extracted from Mclust with ggplot2

Question

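I am using mclust to analyze the distribution of my data (a follow-up to "Clustering with Mclust results in an empty cluster").
My data can be downloaded here: https://www.file-upload.net/download-14320392/example.csv.html

First, I evaluate the clusters present in my data: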

library(reshape2)
library(mclust)
library(ggplot2)

data <- read.csv(file.choose(), header=TRUE,  check.names = FALSE)
data_melt <- melt(data, value.name = "value", na.rm=TRUE)

fit <- Mclust(data$value, modelNames="E", G = 1:7)
summary(fit, parameters = TRUE)

---------------------------------------------------- 
Gaussian finite mixture model fitted by EM algorithm 
---------------------------------------------------- 

Mclust E (univariate, equal variance) model with 4 components: 

 log-likelihood    n df       BIC       ICL
      -20504.71 3258  8 -41074.13 -44326.69

Clustering table:
   1    2    3    4
   0 2271  896   91

Mixing probabilities:
        1         2         3         4
0.2807685 0.4342499 0.2544305 0.0305511

Means:
       1        2        3        4
1381.391 1381.715 1574.335 1851.667

Variances:
       1        2        3        4
7466.189 7466.189 7466.189 7466.189

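Now that the components are identified, I would like to overlay the total distribution with the distributions of the individual components. To do so, I tried to extract the cluster assignment of each value using: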

df <- as.data.frame(data)
df$classification <- as.factor(df$value[fit$classification])

ggplot(df, aes(value, fill= classification)) + 
  geom_density(aes(col=classification, fill = NULL), size = 1)

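As a result, I get the following:

It seems to work; however, I wonder
a) where the labels of the individual classifications (1602, 1639 and 1823) come from,
b) how I can scale the individual densities as a fraction of the total (e.g. 1823 contributes only 91 of the 3258 observations; see above),
c) whether it is possible to use, as an alternative, predicted normal distributions based on the obtained means + SD?

Any help or suggestions are highly appreciated!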

score 2 · Accepted Answer

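I think you can get what you want with the code below. The labels in your plot (1602, 1639 and 1823) are data values rather than cluster numbers: df$value[fit$classification] uses the cluster numbers 2, 3 and 4 as row indices into value. Using fit$classification directly as the factor avoids this, and mapping y to the count scales each density by the size of its cluster.

library(dplyr)    # provides mutate() and re-exports %>%
library(ggplot2)
data_melt <- data_melt %>% mutate(class = as.factor(fit$classification))
# after_stat(count) (ggplot2 >= 3.3.0, formerly ..count..) is density * group size
ggplot(data_melt, aes(x = value, colour = class, fill = class)) +
    geom_density(aes(y = after_stat(count)), alpha = .25)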
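
For part c) of the question, one possible approach is to draw the fitted normal components directly, each weighted by its mixing probability. The sketch below is only an illustration: the parameter values are typed in from the summary output in the question (with a fitted object they would normally be read from fit$parameters, i.e. pro, mean and variance$sigmasq), the grid range is an arbitrary assumption, and the names components and mixture are made up for this example. The plotting step runs only if ggplot2 is installed.

```r
# Parameters copied from the summary(fit, parameters = TRUE) output in the question.
# With a fitted object they would normally come from fit$parameters
# (pro, mean, variance$sigmasq) instead of being typed in by hand.
n     <- 3258
pro   <- c(0.2807685, 0.4342499, 0.2544305, 0.0305511)
means <- c(1381.391, 1381.715, 1574.335, 1851.667)
sigma <- sqrt(7466.189)  # model "E": one variance shared by all components

# Evaluation grid (assumption: 4 SD beyond the outermost means is wide enough)
x <- seq(min(means) - 4 * sigma, max(means) + 4 * sigma, length.out = 500)

# One row per (x, component): mixing probability times the normal density
components <- do.call(rbind, lapply(seq_along(means), function(k) {
  data.frame(x = x,
             component = factor(k),
             density = pro[k] * dnorm(x, mean = means[k], sd = sigma))
}))

# Summing the weighted components at each x gives the mixture density
mixture <- aggregate(density ~ x, data = components, FUN = sum)

# Multiplying by n puts the curves on the same scale as after_stat(count)
components$count <- n * components$density

if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(components, aes(x, density, colour = component)) +
    geom_line() +
    geom_line(data = mixture, aes(x, density), colour = "black",
              linetype = "dashed", inherit.aes = FALSE)
  print(p)
}
```

Note that the area under each curve follows the mixing probabilities, not the hard clustering table: component 1 has a weight of about 0.28 even though no observation is assigned to it, because its mean almost coincides with that of component 2.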
