I want to call the NbClust() function on several data frames. I do this by passing them through a for loop that contains the NbClust() call. The code looks like this:
#combos of just all columns from df
variations = unlist(lapply(seq_along(df), function(x) combn(df, x, simplify=FALSE)), recursive=FALSE)
for(i in 1:length(variations)){
df = data.frame(variations[i])
nc = NbClust(scale(df), distance="euclidean", min.nc=2, max.nc=10, method="complete")
}
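Unfortunately, this always produces the error shown below. Oddly, if I make the same function call without the loop (i.e. on just one data frame), it works perfectly. So what is the problem?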
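I looked at the source code, and NbClust does contain the line quoted in the error message, but I could not work out how to change my code accordingly. Do you know what the problem might be?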
if ((res[ncP - min_nc + 1, 15] <= resCritical[ncP - min_nc + : missing value where TRUE/FALSE needed
In addition, it produces the following warnings:
In addition: Warning messages:
1: In max(DiffLev[, 5], na.rm = TRUE) :
no non-missing arguments to max; returning -Inf
2: In matrix(c(results), nrow = 2, ncol = 26) :
data length [51] is not a sub-multiple or multiple of the number of rows [2]
3: In matrix(c(results), nrow = 2, ncol = 26, dimnames = list(c("Number_clusters", :
data length [51] is not a sub-multiple or multiple of the number of rows [2]
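To narrow this down, one thing worth checking is what the loop actually iterates over. The following is only a minimal sketch on a small stand-in data frame (`toy` and `n_cols` are illustrative names, not part of my real code). It builds the combinations the same way as above and counts the columns in each element, which shows that the first iterations receive single-column data frames. That may differ from the data frame used in the standalone call that works.

```r
# Stand-in data with three columns (hypothetical, for illustration only)
toy <- data.frame(a = c(1, 2, 3, 4),
                  b = c(2, 1, 4, 3),
                  c = c(5, 6, 7, 8))

# Same construction as in the question: every non-empty subset of columns
variations <- unlist(
  lapply(seq_along(toy), function(x) combn(toy, x, simplify = FALSE)),
  recursive = FALSE
)

length(variations)                   # 7 subsets for 3 columns (2^3 - 1)

# Number of columns in each element, in loop order
n_cols <- sapply(variations, ncol)
n_cols                               # 1 1 1 2 2 2 3
```

With the five columns of my real data the same construction gives 31 elements, the first five of which have a single column.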
The data are as follows:
df = structure(list(GDP = c(18.2, 8.5, 54.1, 1.4, 2.1, 83.6, 17, 4.9,
7.9, 2, 14.2, 48.2, 17.1, 10.4, 37.5, 1.6, 49.5, 10.8, 6.2, 7.1,
7.8, 3, 3.7, 4.2, 8.7, 2), Population = c(1.22, 0.06, 0, 0.54,
2.34, 0.74, 1.03, 1.405095932, 0.791124402, 2.746318326, 0.026149254,
11.1252, 0.05183432, 2.992952671, 0.705447655, 0, 0.900246028,
1.15476828, 0, 1.150673397, 1.441975309, 0, 0.713777778, 1.205504587,
1.449230769, 0.820985507), Birth.rate = c(11.56, 146.75, 167.23,
7, 7, 7, 10.07, 47.42900998, 20.42464115, 7.520608751, 7, 7,
15.97633136, 15.1531143, 20.41686405, 7, 22.60379293, 7, 7, 18.55225902,
7, 7.7, 7, 7, 7, 7), Income = c(54L, 94L, 37L, 95L, 98L, 31L,
78L, 74L, 81L, 95L, 16L, 44L, 63L, 95L, 20L, 95L, 83L, 98L, 98L,
84L, 62L, 98L, 98L, 97L, 98L, 57L), Savings = c(56.73, 56.49,
42.81, 70.98, 88.24, 35.16, 46.18, 35.043, 46.521, 58.024, 22.738,
60.244, 77.807, 80.972, 13.08, 40.985, 46.608, 63.32, 51.45,
74.803, 73.211, 50.692, 65.532, 83.898, 60.857, 40.745)), .Names = c("GDP", "Population", "Birth.rate", "Income", "Savings"), class = "data.frame", row.names = c(NA, -26L))
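To find out which iterations fail, rather than having the loop stop at the first error, the call could be wrapped in tryCatch(). The following is a rough sketch under some assumptions: it uses random stand-in data instead of the data above, `toy` and `status` are illustrative names, and it only records the error message per column subset without explaining or fixing the cause.

```r
# Random stand-in data (hypothetical); replace `toy` with the real data frame
set.seed(1)
toy <- data.frame(a = rnorm(30), b = rnorm(30), c = rnorm(30))

variations <- unlist(
  lapply(seq_along(toy), function(x) combn(toy, x, simplify = FALSE)),
  recursive = FALSE
)

# One status entry per subset: "ok" or the text of the error that was raised
status <- character(length(variations))
for (i in seq_along(variations)) {
  d <- variations[[i]]               # [[i]] extracts the data frame itself
  status[i] <- tryCatch({
    if (!requireNamespace("NbClust", quietly = TRUE)) {
      stop("NbClust is not installed")
    }
    NbClust::NbClust(scale(d), distance = "euclidean",
                     min.nc = 2, max.nc = 10, method = "complete")
    "ok"
  }, error = function(e) conditionMessage(e))
}

# Summary: which column subsets ran and which raised an error
data.frame(
  columns = sapply(variations, function(d) paste(names(d), collapse = "+")),
  status  = status
)
```

Comparing the `columns` and `status` entries should at least show whether the error is tied to particular subsets (for example the single-column ones) or occurs for all of them.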