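I am using the mclust::Mclust() function to cluster a small dataset. However, I am struggling to extract the cluster classification for each observation so that I can add it to the dataset.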

Here is the data:

df <- structure(list(latitud = c(-43.8189010620117, -34.2731018066406, 
-47.0666999816895, -35.7543983459473, -47.1413993835449, -36.6260986328125, 
-37.2118988037109, -33.3086013793945, -37.2792015075684, -35.4524993896484, 
-36.5856018066406, -44.6591987609863, -28.6996994018555, -48.1591987609863, 
-45.4000015258789, -29.94580078125, -30.4386005401611, -31.6646995544434, 
-51.2000007629395, -51.3328018188477, -51.25, -45.551700592041, 
-39.0144004821777, -38.6081008911133, -34.9844017028809, -32.8403015136719, 
-29.9953002929688, -18.3999996185303, -35.6169013977051, -35.9085998535156, 
-35.4068984985352, -32.7571983337402, -32.8502998352051, -33.5938987731934, 
-38.4303016662598, -38.6866989135742, -45.4057998657227, -37.5503005981445, 
-37.8997001647949, -38.0368995666504, -37.7047004699707, -37.7963981628418, 
-37.7092018127441, -31.5835990905762, -30.9242000579834, -38.2008018493652, 
-31.6881008148193, -31.8117008209229, -27.9747009277344, -30.7047004699707, 
-36.6500015258789, -34.4921989440918, -34.6581001281738, -47.3499984741211, 
-47.5, -33.7219009399414, -33.6613998413086, -35.5574989318848
), longitud = c(-72.38330078125, -71.371696472168, -72.8000030517578, 
-71.0864028930664, -72.7257995605469, -72.4891967773438, -72.3242034912109, 
-70.3572006225586, -71.9847030639648, -71.7332992553711, -71.5255966186523, 
-71.8082962036133, -70.5500030517578, -73.0888977050781, -72.5999984741211, 
-70.5327987670898, -71.002197265625, -71.2546997070312, -72.9332962036133, 
-73.1091995239258, -72.5167007446289, -72.0680999755859, -73.0828018188477, 
-72.8478012084961, -72.0100021362305, -71.0255966186523, -70.5867004394531, 
-70.3000030517578, -71.7677993774414, -71.2981033325195, -72.2082977294922, 
-70.736701965332, -70.5093994140625, -70.3792037963867, -72.0105972290039, 
-72.502799987793, -72.6231002807617, -72.5903015136719, -71.6239013671875, 
-71.4781036376953, -71.7683029174805, -71.6988983154297, -71.823600769043, 
-71.4606018066406, -70.7731018066406, -71.2988967895508, -71.2658004760742, 
-70.9302978515625, -69.997802734375, -70.9244003295898, -72.4499969482422, 
-71.3731002807617, -71.3019027709961, -72.8499984741211, -72.9749984741211, 
-71.5550003051758, -71.3371963500977, -71.7067031860352)), row.names = c(NA, 
-58L), class = c("tbl_df", "tbl", "data.frame"))

Clustering:

library(mclust)
d_clust <- Mclust(df)

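Now, when I run plot(d_clust), it shows all the plots. But it does not tell me which cluster each row belongs to. I have looked at the documentation and other documents (1, 2, 3), as well as Stack Overflow questions related to Mclust() (1, 2), and none of them answers my question.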

I am looking for something like this:

| latitud | longitud | cluster_id |

By the way, class(d_clust) returns "Mclust". If running d_clust on its own does not give you a table or data frame to plot, how can it be plotted at all?

1 Answer

When you run Mclust, it tries different models and different values of G (the number of clusters). So take a look at the BIC plot:

[Image: BIC plot]

Mclust simply selects the best model according to BIC and stores it in d_clust$modelName and d_clust$G.
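The selected model can be inspected directly on the fitted object. The following is a minimal sketch, not the original poster's code: it uses R's built-in faithful data set as a stand-in (an assumption made so the snippet runs on its own), and the names demo and fit are illustrative. Swap in the question's df to see the values for that data.

```r
library(mclust)

# Stand-in data (assumption): the built-in faithful data set replaces the
# question's df so that this snippet is runnable by itself.
demo <- faithful

fit <- Mclust(demo)

fit$modelName   # covariance model selected by BIC
fit$G           # number of mixture components selected by BIC
summary(fit)    # model name, log-likelihood, BIC and cluster sizes

# Draw only the BIC curves instead of stepping through every plot
plot(fit, what = "BIC")
```

Each curve in the BIC plot corresponds to one covariance model, and the x-axis is the number of components; mclust reports BIC so that larger values are better.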

Once you know which model was used (I think it is EVE with G=4 in your case), the classification makes sense, and you can simply use:

d_clust$classification
# or
results = data.frame(df,cluster=d_clust$classification)
head(results)
   latitud longitud cluster
1 -43.8189 -72.3833       1
2 -34.2731 -71.3717       2
3 -47.0667 -72.8000       1
4 -35.7544 -71.0864       3
5 -47.1414 -72.7258       1
6 -36.6261 -72.4892       3
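On the side question of how plot() can work at all: an Mclust object is a list, and plot() is a generic that dispatches to a method for that class, which reads the data and the fit from the list's components. A rough sketch of what is stored, again using the built-in faithful data as a stand-in for df (an assumption), with illustrative names demo, fit and results:

```r
library(mclust)

# Stand-in data (assumption): faithful replaces the question's df.
demo <- faithful
fit  <- Mclust(demo)

class(fit)   # "Mclust"
names(fit)   # components stored in the fitted object

# Per-row information kept alongside a copy of the input data
head(fit$data)            # the data that was clustered
head(fit$classification)  # hard cluster label for each row
head(fit$uncertainty)     # 1 minus the largest membership probability per row
head(round(fit$z, 3))     # membership probabilities, one column per cluster

# Attach the label and its uncertainty to the original rows
results <- data.frame(demo,
                      cluster_id  = fit$classification,
                      uncertainty = fit$uncertainty)
head(results)
```

Keeping the uncertainty column next to cluster_id makes it easy to spot rows that sit between two clusters.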

You can also plot:

with(results,plot(latitud,longitud,col=factor(cluster)))

[Image: scatter plot of latitud against longitud, colored by cluster]

You should then consider whether the clustering makes sense, for example whether G=4 is really the right choice.
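One way to explore that choice is to fix G yourself and compare the result with the automatic fit. This is only a sketch under assumptions: it uses the built-in faithful data in place of df, the value G = 2 is arbitrary, and the names fit_auto and fit_g2 are illustrative.

```r
library(mclust)

# Stand-in data (assumption): faithful replaces the question's df.
demo <- faithful

fit_auto <- Mclust(demo)          # G and covariance model chosen by BIC
fit_g2   <- Mclust(demo, G = 2)   # G fixed at 2 (arbitrary, for illustration)

# BIC of each selected model; in mclust's convention larger is better
c(auto = fit_auto$bic, g2 = fit_g2$bic)

# Cross-tabulate the two labelings to see how rows move between clusters
table(auto = fit_auto$classification, g2 = fit_g2$classification)
```

BIC is only one criterion; for geographic data such as the coordinates in the question, it is also worth checking whether the clusters are meaningful on a map.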

answered 2019-11-19T00:38:35.917