I am using the GA package to minimize a function. Below are the stages of my implementation.
0. Libraries and dataset
library(clusterSim) ## for index.DB()
library(GA) ## for ga()
data("data_ratio")
dataset2 <- data_ratio
set.seed(555)
1. Binary encoding and generation of the initial population.
initial_population <- function(object) {
  ## generate a population in which each individual has between three and six 1's
  population <- t(replicate(object@popSize, {
    i <- sample(3:6, 1)
    sample(c(rep(1, i), rep(0, object@nBits - i)))
  }))
  return(population)
}
2. Fitness function that minimizes the Davies-Bouldin (DB) index.
DBI2 <- function(x) {
  ## the positions of the 1's select the initial centroids, so their count is the number of clusters
  cl <- kmeans(dataset2, dataset2[x == 1, ])
  dbi <- index.DB(dataset2, cl = cl$cluster, centrotypes = "centroids")
  ## ga() maximizes the fitness, so return the negative DB index
  score <- -dbi$DB
  return(score)
}
3. A user-defined crossover operator. This crossover method avoids the situation where no cluster is "switched on". The pseudocode can be found here.
pairwise_crossover <- function(object, parents) {
  fitness <- object@fitness[parents]
  parents <- object@population[parents, , drop = FALSE]
  n <- ncol(parents)
  children <- matrix(as.double(NA), nrow = 2, ncol = n)
  fitnessChildren <- rep(NA, 2)
  ## find the minimum number of 1's between the 2 parents
  m <- min(sum(parents[1, ] == 1), sum(parents[2, ] == 1))
  ## generate a random integer in the range 1..m
  random_int <- sample(1:m, 1)
  ## randomly select 'random_int' gene positions with 1's in parent[1, ]
  random_a <- sample(1:length(parents[1, ]), random_int)
  ## randomly select 'random_int' gene positions with 1's in parent[2, ]
  random_b <- sample(1:length(parents[2, ]), random_int)
  ## take their union
  all <- sort(union(random_a, random_b))
  ## store each parent's genes at the union positions
  temp_a <- parents[1, ][all]
  temp_b <- parents[2, ][all]
  ## crossover: swap the genes at those positions
  parents[1, ][all] <- temp_b
  children[1, ] <- parents[1, ]
  parents[2, ][all] <- temp_a
  children[2, ] <- parents[2, ]
  out <- list(children = children, fitness = fitnessChildren)
  return(out)
}
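The comments in pairwise_crossover() say the swapped positions should be drawn from the genes that are 1 in each parent, but sample(1:length(parents[1, ]), random_int) draws from all positions. A child therefore keeps its own 1's outside the swapped set and picks up the other parent's 1's inside it, so it can end up with more clusters than either parent (or with none at all). The sketch below is a hypothetical rewrite that restricts the draws to positions holding 1's, as the comments describe; pick(), swap_ones() and pairwise_crossover_fixed() are illustrative names, not part of GA, and each parent is assumed to have at least one 1. It also sidesteps a classic R pitfall: sample(x, k) treats a length-one numeric x as 1:x.

```r
# Pick `size` elements of `x` at random; unlike sample(x, size) this is safe
# when x has length 1 (sample(5, 1) would draw from 1:5 instead).
pick <- function(x, size) x[sample.int(length(x), size)]

# Core of the hypothetical corrected crossover on a 2-row 0/1 matrix.
# Assumes each parent has at least one 1.
swap_ones <- function(parents) {
  ones_a <- which(parents[1, ] == 1)
  ones_b <- which(parents[2, ] == 1)
  m <- min(length(ones_a), length(ones_b))
  k <- sample.int(m, 1)                    # random integer in 1..m
  # draw k positions among the 1's of each parent and take the union
  swap <- sort(union(pick(ones_a, k), pick(ones_b, k)))
  children <- parents
  children[1, swap] <- parents[2, swap]    # child 1 takes parent 2's genes there
  children[2, swap] <- parents[1, swap]    # child 2 takes parent 1's genes there
  children
}

# Wrapper with the (object, parents) signature used for a custom crossover.
pairwise_crossover_fixed <- function(object, parents) {
  parent_rows <- object@population[parents, , drop = FALSE]
  list(children = swap_ones(parent_rows), fitness = rep(NA, 2))
}

set.seed(1)
p <- rbind(c(1, 1, 1, 0, 0, 0, 0, 0),
           c(0, 0, 1, 0, 1, 1, 1, 0))
swap_ones(p)
```

Because each child receives at least one position that is 1 in the other parent, neither child can end up with no cluster switched on. Note that this only preserves the total number of 1's across the two children, not each child's own count, so an individual can still drift by more than one.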
4. Mutation.
k_min <- 2
k_max <- ceiling(sqrt(75))
my_mutation <- function(object, parent) {
  pop <- parent <- as.vector(object@population[parent, ])
  for (i in 1:length(pop)) {
    if ((sum(pop == 1) < k_max) && pop[i] == 0 | (sum(pop == 1) > k_min && pop[i] == 1)) {
      pop[i] <- abs(pop[i] - 1)
      return(pop)
    }
  }
}
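my_mutation() returns at the first position that satisfies its condition. Whenever the number of 1's lies strictly between k_min and k_max, position 1 always qualifies, so the operator flips gene 1 every time rather than a random gene. A possible alternative, sketched below with the same bounds, picks one eligible position uniformly at random; mutate_bits() and my_mutation_random() are hypothetical names, and the count is assumed to start inside [k_min, k_max].

```r
k_min <- 2
k_max <- ceiling(sqrt(75))   # same bounds as above (= 9)

# Flip one randomly chosen gene while keeping the number of 1's within
# [k_min, k_max], assuming it already lies in that range.
mutate_bits <- function(bits, k_min, k_max) {
  n_on <- sum(bits == 1)
  can_switch_on  <- if (n_on < k_max) which(bits == 0) else integer(0)
  can_switch_off <- if (n_on > k_min) which(bits == 1) else integer(0)
  candidates <- c(can_switch_on, can_switch_off)
  if (length(candidates) == 0) return(bits)   # nothing may be flipped
  i <- candidates[sample.int(length(candidates), 1)]
  bits[i] <- 1 - bits[i]
  bits
}

# Wrapper with the (object, parent) signature used for a custom mutation.
my_mutation_random <- function(object, parent) {
  mutate_bits(as.vector(object@population[parent, ]), k_min, k_max)
}
```

Unlike the loop, this always returns a vector and gives every eligible gene the same chance of being flipped.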
5. Putting the pieces together. I use roulette-wheel selection with crossover probability = 0.8 and mutation probability = 0.1.
g2 <- ga(type = "binary",
         population = initial_population,
         fitness = DBI2,
         selection = ga_rwSelection,
         crossover = pairwise_crossover,
         mutation = my_mutation,
         pcrossover = 0.8,
         pmutation = 0.1,
         popSize = 100,
         nBits = nrow(dataset2))
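I created the initial population so that every individual has between three and six 1's. The crossover and mutation operators are meant to ensure that a solution does not end up with too many clusters (1's) "switched on". Before integrating them, I tested my crossover and mutation functions separately, and they seemed to work correctly. Ideally, the number of 1's in the final solution would be within ±1 of the number in the initial population, i.e., if an individual has three 1's in its chromosome, it would end up with two, three or four 1's at random. But I got the solution below, which has 12 clusters (1's) "switched on", which means the crossover and mutation operators are not working as intended.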
> sum(g2@solution==1)
[1] 12
The problem can be reproduced by copying all of the code. Can anyone familiar with the GA package help me here?
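To find out at which generation individuals start carrying more 1's than intended, it may help to print the range of 1-counts in the population at every iteration. The sketch assumes ga()'s monitor argument, which is called with the current GA object, and that object's iter and population slots; ones_range() and count_monitor() are hypothetical names.

```r
# Smallest and largest number of 1's (open clusters) over all individuals.
ones_range <- function(population) range(rowSums(population == 1))

# Intended for ga(..., monitor = count_monitor): one line per generation.
count_monitor <- function(object) {
  r <- ones_range(object@population)
  cat(sprintf("iter %d: between %d and %d clusters open\n",
              as.integer(object@iter), as.integer(r[1]), as.integer(r[2])))
}
```

Adding monitor = count_monitor to the ga() call would show whether the upper bound jumps right after the first crossover step or creeps up over many generations.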
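[Edit]
Trying a different dataset, iris, gives me the following error (only the data was changed; all other settings are the same).
0. Libraries and dataset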
library(clusterSim) ## for index.DB()
library(GA) ## for ga()
## removed last column since it is a categorical data
dataset2 <- iris[-5]
set.seed(555)
> Error in kmeans(dataset2, centers = dataset2[x == 1, ]) :
initial centers are not distinct
I looked into the kmeans() code and found that this error is raised by the check if(any(duplicated(centers))). What could this mean?
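That check fires when two of the rows passed as centers are identical. After dropping Species, iris still contains duplicated observations, so whenever a chromosome switches on two copies of the same row, kmeans() refuses to start. Below is a minimal sketch of two possible workarounds, assuming it is acceptable either to run the GA on distinct observations only or to reject such chromosomes before calling kmeans(); valid_centers() is a hypothetical helper.

```r
dataset2 <- iris[-5]   # as in the question: drop the categorical column

# Rows that repeat an earlier row; choosing both copies as centers makes
# kmeans() stop with "initial centers are not distinct".
dup_rows <- which(duplicated(dataset2))
print(dup_rows)

# Workaround 1: keep only distinct observations.
dataset2_distinct <- unique(dataset2)

# Workaround 2: test a chromosome before calling kmeans(); the fitness
# function could return a poor score when this is FALSE. A DB index needs
# at least two clusters.
valid_centers <- function(data, x) {
  centers <- data[x == 1, , drop = FALSE]
  nrow(centers) >= 2 && anyDuplicated(centers) == 0
}
```

For the first workaround, assigning dataset2 <- unique(dataset2) before calling ga() also keeps nBits = nrow(dataset2) consistent with the deduplicated data.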