
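I want to call the NbClust() function for several data frames. I do this by "sending" them through a for loop that contains the NbClust() call. The code looks like this: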

# all combinations of the columns of df
variations = unlist(lapply(seq_along(df), function(x) combn(df, x, simplify=FALSE)), recursive=FALSE)
for(i in 1:length(variations)){
  df = data.frame(variations[i]) 
  nc = NbClust(scale(df), distance="euclidean", min.nc=2, max.nc=10, method="complete")
}

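Unfortunately, it always produces the error below. Strangely, if I make the same function call without the loop (i.e. on a single data frame), it works perfectly. So what is the problem?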

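I looked at the source code, and NbClust does contain the line quoted in the error message, but I could not work out how to change my code accordingly. Do you know what the problem might be?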

if ((res[ncP - min_nc + 1, 15] <= resCritical[ncP - min_nc + : missing value where TRUE/FALSE needed

In addition, it produces the following warnings:

In addition: Warning messages:
1: In max(DiffLev[, 5], na.rm = TRUE) :
  no non-missing arguments to max; returning -Inf
2: In matrix(c(results), nrow = 2, ncol = 26) :
  data length [51] is not a sub-multiple or multiple of the number of rows [2]
3: In matrix(c(results), nrow = 2, ncol = 26, dimnames = list(c("Number_clusters",  :
  data length [51] is not a sub-multiple or multiple of the number of rows [2]

The data is as follows:

df = structure(list(GDP = c(18.2, 8.5, 54.1, 1.4, 2.1, 83.6, 17, 4.9, 
7.9, 2, 14.2, 48.2, 17.1, 10.4, 37.5, 1.6, 49.5, 10.8, 6.2, 7.1, 
7.8, 3, 3.7, 4.2, 8.7, 2), Population = c(1.22, 0.06, 0, 0.54, 
2.34, 0.74, 1.03, 1.405095932, 0.791124402, 2.746318326, 0.026149254, 
11.1252, 0.05183432, 2.992952671, 0.705447655, 0, 0.900246028, 
1.15476828, 0, 1.150673397, 1.441975309, 0, 0.713777778, 1.205504587, 
1.449230769, 0.820985507), Birth.rate = c(11.56, 146.75, 167.23, 
7, 7, 7, 10.07, 47.42900998, 20.42464115, 7.520608751, 7, 7, 
15.97633136, 15.1531143, 20.41686405, 7, 22.60379293, 7, 7, 18.55225902, 
7, 7.7, 7, 7, 7, 7), Income = c(54L, 94L, 37L, 95L, 98L, 31L, 
78L, 74L, 81L, 95L, 16L, 44L, 63L, 95L, 20L, 95L, 83L, 98L, 98L, 
84L, 62L, 98L, 98L, 97L, 98L, 57L), Savings = c(56.73, 56.49, 
42.81, 70.98, 88.24, 35.16, 46.18, 35.043, 46.521, 58.024, 22.738, 
60.244, 77.807, 80.972, 13.08, 40.985, 46.608, 63.32, 51.45, 
74.803, 73.211, 50.692, 65.532, 83.898, 60.857, 40.745)), .Names = c("GDP", "Population", "Birth.rate", "Income", "Savings"), class = "data.frame", row.names = c(NA, -26L))

1 Answer


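Some clustering methods do not fit your dataset or data type directly. You can pick the most suitable method, or you can use all of them. When you use all of them, it is common to get an error message (this is not a bug). Catching the error so that it no longer stops the loop, the following could be an alternative: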

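library(NbClust)

# Names passed one at a time to the `index` argument of NbClust()
vc.method <- c("kl","ch", "hartigan","ccc", "scott","marriot","trcovw", "tracew","friedman", "rubin", "cindex", "db", "silhouette", "duda", "beale", "ratkowsky", "ball", "ptbiserial", "pseudot2", "gap", "frey", "mcclain",  "gamma", "gplus", "tau", "dunn", "hubert", "sdindex", "dindex", "sdbw", "alllong")

# `sum.sn` (the data) and `vc.K.max` (the largest number of clusters to try)
# are objects from my own session; substitute your own.
val.nb <- c()

for (method in seq_along(vc.method)) {
  tryCatch({
    en.nb <- NbClust(na.omit(sum.sn), distance = "euclidean", min.nc = 2,
                     max.nc = vc.K.max, method = "kmeans",
                     index = vc.method[method])
    # Keep the number of clusters this index recommends
    val.nb <- c(val.nb, as.numeric(en.nb$Best.nc[1]))
  }, error = function(e) {
    # Report the failure and move on to the next index
    cat("ERROR :", conditionMessage(e), "\n")
  })
}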
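
The same tryCatch() idea can also be wrapped around the loop over column combinations from the question, so that one failing combination does not stop the others. The following is only a minimal sketch: `try_each`, `fake_fit` and the small `demo` data frame are hypothetical names made up for illustration, and `fake_fit` is a stand-in that fails on an arbitrary condition so the snippet runs without the NbClust package.

```r
# Hypothetical helper (not part of NbClust): apply `f` to every element of
# `items`; where `f` signals an error, report it and store NULL instead.
try_each <- function(items, f) {
  lapply(items, function(item) {
    tryCatch(f(item), error = function(e) {
      message("skipped: ", conditionMessage(e))
      NULL
    })
  })
}

# Small made-up data so the sketch runs on its own.
demo <- data.frame(x = c(1, 2, 3, 4), y = c(2, 1, 4, 3), z = c(5, 6, 7, 8))

# All non-empty column combinations, built the same way as in the question.
variations <- unlist(
  lapply(seq_along(demo), function(k) combn(demo, k, simplify = FALSE)),
  recursive = FALSE
)

# Stand-in for the NbClust() call. The failure condition is arbitrary and
# only exists to exercise the error path.
fake_fit <- function(d) {
  if (ncol(d) < 2) stop("need at least two columns")
  ncol(d)
}

results <- try_each(variations, fake_fit)

# TRUE for every combination whose call failed.
failed <- vapply(results, is.null, logical(1))
```

To use this with the question's data, `fake_fit` would be replaced by something like `function(d) NbClust(scale(d), distance = "euclidean", min.nc = 2, max.nc = 10, method = "complete")`. Inspecting `variations[failed]` then shows which column combinations trigger the error, which may help narrow down the cause.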
    
answered 2020-11-22T10:17:40.943