I am using the GA package to minimize a function. Below are the stages of my implementation.
0. Libraries and dataset
library(clusterSim) ## for index.DB()
library(GA) ## for ga()
dataset2 <- data_ratio
1. Binary encoding and generation of the initial population.
initial_population <- function(object) {
  ## generate a population in which each individual has a fixed number of 1's, between three and six
  population <- t(replicate(object@popSize, {i <- sample(3:6, 1); sample(c(rep(1, i), rep(0, object@nBits - i)))}))
  return(population)
}
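To check the generator outside of ga(), here is a minimal base-R sketch of the same sampling logic. The helper name make_population and its pop_size and n_bits arguments are made up for illustration; they stand in for object@popSize and object@nBits.

```r
## Hypothetical stand-alone version of the generator (not part of the GA package)
make_population <- function(pop_size, n_bits) {
  t(replicate(pop_size, {
    i <- sample(3:6, 1)                       # number of 1's for this individual
    sample(c(rep(1, i), rep(0, n_bits - i)))  # shuffle the 1's into random positions
  }))
}

set.seed(1)
pop <- make_population(pop_size = 100, n_bits = 150)
dim(pop)             # one row per individual, one column per bit
range(rowSums(pop))  # number of 1's per individual
```

Every row should contain between three and six 1's, which makes it easy to confirm that the starting population itself respects the intended range.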
2. Fitness function: minimize the Davies-Bouldin (DB) index.
DBI2 <- function(x) {
  ## the rows flagged with 1 are the initial centroids, so the number of 1's is the number of clusters
  cl <- kmeans(dataset2, dataset2[x == 1, ])
  dbi <- index.DB(dataset2, cl = cl$cluster, centrotypes = "centroids")
  ## ga() maximizes fitness, so return the negated index in order to minimize DB
  score <- -dbi$DB
  return(score)
}
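3. User-defined crossover operator. This crossover method avoids the situation in which no cluster is "turned on". The pseudocode can be found here.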
pairwise_crossover <- function(object, parents){
  fitness <- object@fitness[parents]
  parents <- object@population[parents, , drop = FALSE]
  n <- ncol(parents)
  children <- matrix(as.double(NA), nrow = 2, ncol = n)
  fitnessChildren <- rep(NA, 2)
  ## find the minimum number of 1's between the 2 parents
  m <- min(sum(parents[1, ] == 1), sum(parents[2, ] == 1))
  ## generate a random integer in the range 1..m
  random_int <- sample(1:m, 1)
  ## randomly select 'random_int' gene positions with 1's in parents[1, ]
  random_a <- sample(1:length(parents[1, ]), random_int)
  ## randomly select 'random_int' gene positions with 1's in parents[2, ]
  random_b <- sample(1:length(parents[2, ]), random_int)
  ## union them
  all <- sort(union(random_a, random_b))
  ## extract the genes at the union positions
  temp_a <- parents[1, ][all]
  temp_b <- parents[2, ][all]
  ## crossover: swap the genes at the union positions
  parents[1, ][all] <- temp_b
  children[1, ] <- parents[1, ]
  parents[2, ][all] <- temp_a
  children[2, ] <- parents[2, ]
  out <- list(children = children, fitness = fitnessChildren)
  return(out)
}
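The comments in the crossover describe picking positions "with 1's" in each parent, while the sample() calls draw from all gene positions. As a point of comparison, here is a minimal sketch of the swap restricted to positions that actually hold a 1. It works on two plain 0/1 vectors rather than on the ga object, and pick_ones and swap_on_ones are hypothetical names used only for this illustration; it is one reading of the comments, not necessarily what the referenced pseudocode prescribes.

```r
## Hypothetical helper: pick k positions at which a parent holds a 1
pick_ones <- function(bits, k) {
  ones <- which(bits == 1)
  ones[sample.int(length(ones), k)]  # index-based, so it is safe when length(ones) == 1
}

## Sketch of the swap described in the comments, on two plain 0/1 vectors
swap_on_ones <- function(p1, p2) {
  m <- min(sum(p1 == 1), sum(p2 == 1))  # assumes each parent has at least one 1
  k <- sample.int(m, 1)                 # random integer in 1..m
  idx <- sort(union(pick_ones(p1, k), pick_ones(p2, k)))
  c1 <- p1
  c2 <- p2
  c1[idx] <- p2[idx]                    # child 1 takes parent 2's genes at idx
  c2[idx] <- p1[idx]                    # child 2 takes parent 1's genes at idx
  rbind(c1, c2)
}

set.seed(1)
p1 <- c(1, 0, 0, 1, 0, 1, 0, 0)
p2 <- c(0, 1, 0, 0, 0, 0, 1, 0)
children <- swap_on_ones(p1, p2)
rowSums(children)  # number of 1's in each child
```

With this restriction each child receives at least one 1 from the other parent, and the total number of 1's across the two children equals the total across the two parents. The count in an individual child can still differ from its parent's count by more than one, so this alone does not bound the change to ±1.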
4. Mutation.
k_min <- 2
k_max <- ceiling(sqrt(75))
my_mutation <- function(object, parent){
  pop <- parent <- as.vector(object@population[parent, ])
  for(i in 1:length(pop)){
    ## flip a 0 while fewer than k_max bits are on, or a 1 while more than k_min bits are on
    if((sum(pop == 1) < k_max && pop[i] == 0) || (sum(pop == 1) > k_min && pop[i] == 1)) {
      pop[i] <- abs(pop[i] - 1)
    }
  }
  return(pop)
}
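Note that the loop in my_mutation re-evaluates the condition for every gene, so a single call can flip many bits. For comparison, here is a minimal sketch of a variant that flips exactly one randomly chosen eligible bit, so the number of 1's changes by exactly one per call while staying within [k_min, k_max]. flip_one_bit is a hypothetical helper that works on a plain 0/1 vector; it is not part of the GA package.

```r
k_min <- 2
k_max <- ceiling(sqrt(75))

## Hypothetical single-bit mutation on a plain 0/1 vector
flip_one_bit <- function(bits, k_min, k_max) {
  n_on <- sum(bits == 1)
  candidates <- c(
    if (n_on < k_max) which(bits == 0),  # a 0 may be switched on
    if (n_on > k_min) which(bits == 1)   # a 1 may be switched off
  )
  if (length(candidates) == 0) return(bits)
  i <- candidates[sample.int(length(candidates), 1)]
  bits[i] <- 1 - bits[i]
  bits
}

set.seed(1)
bits <- c(rep(1, 3), rep(0, 17))
mutated <- flip_one_bit(bits, k_min, k_max)
sum(bits != mutated)  # number of flipped positions
sum(mutated)          # number of 1's after mutation
```

To use something like this inside ga(), it would still need the (object, parent) signature and would have to read the chromosome from object@population[parent, ], as my_mutation does.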
5. Putting the pieces together. Roulette-wheel selection is used, with crossover probability 0.8 and mutation probability 0.1.
g2 <- ga(type = "binary",
         population = initial_population,
         fitness = DBI2,
         selection = ga_rwSelection,
         crossover = pairwise_crossover,
         mutation = my_mutation,
         pcrossover = 0.8,
         pmutation = 0.1,
         popSize = 100,
         nBits = nrow(dataset2))
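I expected the number of 1's in each individual to stay within ±1 of its value in the initial population; that is, if an individual has three 1's in its chromosome, it should end up with 2, 3, or 4 1's at random. However, the solution I got has 12 clusters (1's) "turned on", which suggests that the crossover and mutation operators are not working as intended.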
> sum(g2@solution==1)
[1] 12
The problem can be reproduced by copying all of the code here. Can anyone familiar with the GA package help me?
0. Libraries and dataset
library(clusterSim) ## for index.DB()
library(GA) ## for ga()
## drop the last column because it is categorical
dataset2 <- iris[-5]
> Error in kmeans(dataset2, centers = dataset2[x == 1, ]) :
initial centers are not distinct
What could this mean?
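For reference, stats::kmeans() raises this message when the rows supplied as centers are not all unique, which can happen here if two of the rows flagged with 1 have identical values. A minimal base-R sketch with a made-up toy matrix (not the iris data) reproduces the message and shows one possible workaround, dropping duplicated rows before calling kmeans():

```r
## Toy data, made up for illustration: rows 1 and 2 are identical
toy <- rbind(c(1, 2), c(1, 2), c(5, 6), c(5, 7), c(9, 9), c(9, 8))

## Selecting both duplicated rows as starting centers triggers the error
msg <- tryCatch(
  {
    kmeans(toy, centers = toy[c(1, 2), ])
    "no error"
  },
  error = function(e) conditionMessage(e)
)
msg

## Dropping duplicated rows first lets kmeans() run
centers <- unique(toy[c(1, 2, 5), , drop = FALSE])
cl <- kmeans(toy, centers = centers)
nrow(cl$centers)  # number of clusters actually used
```

Deduplicating changes the number of clusters that a chromosome encodes, so whether this is acceptable for the fitness function is a design decision rather than a drop-in fix.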