I cluster my data with the following tsclust call:

SURFSKINTEMP_CLUST <- tsclust(SURFSKINTEMP, k = 10L:20L,
                       distance = "dtw_basic", centroid = "dba",
                       trace = TRUE, seed = 938,
                       norm = "L2", window.size = 2L,
                       args = tsclust_args(cent = list(trace = TRUE)))

SURFSKINTEMP is large:

str(SURFSKINTEMP)
List of 327239
 $ V1     : num [1:7] 0.13 0.631 -0.178 0.731 0.86 ...
 $ V2     : num [1:6] 0.117 -0.693 -0.911 -0.911 -0.781 ...
 $ V3     : num [1:7] 0.117 -0.693 -0.911 -0.911 -0.781 ...
 $ V4     : num [1:6] -0.693 -0.911 -0.911 -0.781 -0.604 ...

Then I want to use cvi to evaluate the optimal number of clusters k:

names(SURFSKINTEMP_CLUST) <- paste0("k_",10L:20L)
sapply(SURFSKINTEMP_CLUST, cvi, type = "internal")

However, this fails with an error:

> sapply(SURFSKINTEMP_CLUST, cvi, type = "internal")
Error: cannot allocate vector of size 797.8 Gb

How can I evaluate the optimal number of clusters k in my situation?

2 Answers

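Specifying type = "internal" will try to compute 7 indices: Silhouette, Dunn, COP, DB, DB*, CH, and SF. As noted in the documentation of cvi, the first 3 will try to compute the whole cross-distance matrix, which in your case would be a 327,239 x 327,239 matrix; you will be hard-pressed to find a computer that can allocate that, and the computation would take a very long time.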
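As a rough check on the size involved, the arithmetic below estimates the memory needed for a dense matrix of doubles with one row and one column per series. It assumes 8 bytes per entry and ignores any overhead, so treat it as a back-of-the-envelope sketch; the variable names are only illustrative.

```r
# Number of series in SURFSKINTEMP (from the str() output in the question)
n_series <- 327239

# A dense n x n matrix of doubles needs n^2 entries of 8 bytes each
bytes_needed <- n_series^2 * 8

# Convert to units of 2^30 bytes, the scale R uses when it reports "Gb"
gib_needed <- bytes_needed / 2^30
sprintf("%.1f", gib_needed)  # "797.8"
```

This agrees with the 797.8 Gb in the error message, which suggests the failed allocation is the full cross-distance matrix.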

Since you are using DBA for the centroids, you could check whether DB or DB* make sense for your application:

sapply(SURFSKINTEMP_CLUST, cvi, type = c("DB", "DBstar"))

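You could also look at the somewhat simpler elbow method, keeping in mind that you can compute the sum of squared errors (SSE) yourself (see the documentation of TSClusters-class):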

sapply(SURFSKINTEMP_CLUST, function(cl) { sum(cl@cldist ^ 2) })
answered 2017-11-29T20:26:25.273

The error message indicates that the analysis needs more memory than your machine has available. In cases like this, run the analysis on a smaller random sample of the series, and repeat it a number of times.

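library(dtwclust)

reps <- 1000
samp_size <- 10000
result <- vector("list", reps)
for (j in seq_len(reps)) {
    # SURFSKINTEMP is a list, so draw indices and subset with single brackets
    idx <- sample(seq_along(SURFSKINTEMP), samp_size)
    samp <- SURFSKINTEMP[idx]

    # Cluster the sample, not the full data set.
    # Vary the seed per repetition; a fixed seed could make every
    # repetition after the first draw the same sample.
    sample_clust <- tsclust(samp, k = 10L:20L,
                   distance = "dtw_basic", centroid = "dba",
                   trace = TRUE, seed = 938 + j,
                   norm = "L2", window.size = 2L,
                   args = tsclust_args(cent = list(trace = TRUE)))
    names(sample_clust) <- paste0("k_", 10L:20L)

    # sapply returns a matrix (one column per k), so store it in a list element
    result[[j]] <- sapply(sample_clust, cvi, type = "internal")
}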

This produces a list of results, one per repetition, that you can inspect.
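One possible way to inspect such results is to average each index over the repetitions and then look at which k each index prefers. The sketch below assumes each repetition's output is stored as a matrix with CVIs in rows and values of k in columns, kept in a list. The matrices here are random placeholders, not real results, and the names cvi_list, mean_cvi and best_k are made up for illustration. The split into indices to maximize and to minimize follows the cvi documentation.

```r
# Synthetic stand-in for the per-repetition CVI matrices
# (random numbers for illustration only, not real results)
set.seed(1)
indices <- c("Sil", "SF", "CH", "DB", "DBstar", "D", "COP")
ks <- paste0("k_", 10:20)
cvi_list <- replicate(
  5,
  matrix(runif(length(indices) * length(ks)),
         nrow = length(indices),
         dimnames = list(indices, ks)),
  simplify = FALSE
)

# Element-wise mean of the matrices over all repetitions
mean_cvi <- Reduce(`+`, cvi_list) / length(cvi_list)

# Per the cvi documentation, these are to be maximized ...
maximize <- c("Sil", "SF", "CH", "D")
# ... and these are to be minimized
minimize <- c("DB", "DBstar", "COP")

# For each index, report the k with the best average value
best_k <- c(
  apply(mean_cvi[maximize, , drop = FALSE], 1,
        function(x) names(x)[which.max(x)]),
  apply(mean_cvi[minimize, , drop = FALSE], 1,
        function(x) names(x)[which.min(x)])
)
best_k
```

The indices will not necessarily agree on one k, so a majority vote or a closer look at the spread across repetitions may be needed.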

answered 2017-11-29T14:08:36.947