r - Genetic algorithm - crossover and mutation not working properly

Question

I am using the GA package to minimize a function. Below are the stages of my implementation.

0. Libraries and dataset

library(clusterSim)      ## for index.DB()
library(GA)              ## for ga() 
data("data_ratio")
dataset2 <- data_ratio
set.seed(555)

1. Binary encoding and generation of the initial population.

initial_population <- function(object) {
    ## generate a population where for each individual, there will be number of 1's fixed between three to six
    population <- t(replicate(object@popSize, {i <- sample(3:6, 1); sample(c(rep(1, i), rep(0, object@nBits - i)))}))
    return(population)
}

2. Fitness function that minimizes the Davies-Bouldin (DB) index.

DBI2 <- function(x) {
    ## number of 1's will represent the initial selected centroids and hence the number of clusters
    cl <- kmeans(dataset2, dataset2[x == 1, ])
    dbi <- index.DB(dataset2, cl=cl$cluster, centrotypes = "centroids")
    score <- -dbi$DB

    return(score)
}

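3. User-defined crossover operator. This crossover method avoids the situation in which no cluster is "turned on". The pseudocode can be found here.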

pairwise_crossover <- function(object, parents){
    fitness <- object@fitness[parents]
    parents <- object@population[parents, , drop = FALSE]
    n <- ncol(parents)
    children <- matrix(as.double(NA), nrow = 2, ncol = n)
    fitnessChildren <- rep(NA, 2)

    ## finding the min no. of 1's between 2 parents
    m <- min(sum(parents[1, ] == 1), sum(parents[2, ] == 1))
    ## generate a random int from range(1,m)
    random_int <- sample(1:m, 1)
    ## randomly select 'random_int' gene positions with 1's in parent[1, ]
    random_a <- sample(1:length(parents[1, ]), random_int)
    ## randomly select 'random_int' gene positions with 1's in parent[2, ]
    random_b <- sample(1:length(parents[2, ]), random_int)
    ## union them
    all <- sort(union(random_a, random_b))
    ## determine the union positions
    temp_a <- parents[1, ][all]
    temp_b <- parents[2, ][all]

    ## crossover
    parents[1, ][all] <- temp_b
    children[1, ] <- parents[1, ]
    parents[2, ][all] <- temp_a
    children[2, ] <- parents[2, ]

    out <- list(children = children, fitness = fitnessChildren)
    return(out)
}

4. Mutation.

k_min <- 2
k_max <- ceiling(sqrt(75))

my_mutation <- function(object, parent){
    pop <- parent <- as.vector(object@population[parent, ])
    for(i in 1:length(pop)){
            if((sum(pop == 1) < k_max) && pop[i] == 0 | (sum(pop == 1) > k_min && pop[i] == 1)) {
                    pop[i] <- abs(pop[i] - 1)
                    return(pop) 
            }

    }

}

5. Putting the pieces together. Using roulette-wheel selection, crossover prob. = 0.8, mutation prob. = 0.1

g2<- ga(type = "binary", 
    population = initial_population, 
    fitness = DBI2, 
    selection = ga_rwSelection,
    crossover = pairwise_crossover,
    mutation = my_mutation,
    pcrossover = 0.8,
    pmutation = 0.1,
    popSize = 100, 
    nBits = nrow(dataset2))

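I created my initial population so that every individual has a fixed number of 1's between three and six. The crossover and mutation operators are designed to ensure that a solution does not end up with too many clusters (1's) "turned on". Before integrating them, I tried my crossover and mutation functions separately, and they seemed to work fine.

Ideally, the number of 1's in the final solution would be within ±1 of the number in the initial population, i.e., if an individual has 3 1's in its chromosome, it would end up with 2, 3, or 4 1's at random. But I got this solution, which shows 12 clusters (1's) "turned on", which means the crossover and mutation operators are not working well.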

> sum(g2@solution==1)
[1] 12

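The problem can be reproduced by copying all of the code above. Can anyone familiar with the GA package help me here?

[EDIT]

Trying a different dataset, iris, leads to the following error. (Only the data changed; all other settings stay the same.)

0. Libraries and dataset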

library(clusterSim)      ## for index.DB()
library(GA)              ## for ga() 
## removed last column since it is a categorical data
dataset2 <- iris[-5]
set.seed(555)

> Error in kmeans(dataset2, centers = dataset2[x == 1, ]) : 
  initial centers are not distinct

I looked at the source code and found that this error is raised by if(any(duplicated(centers))). What could this mean?
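One plausible reading of that check, sketched with made-up toy data: kmeans() refuses initial centers that contain identical rows, so a chromosome that switches on two identical observations would trigger the error. iris[-5] contains at least one pair of identical rows, so this can happen there. The `toy` matrix and the chromosome `x` below are illustrative assumptions, not taken from the original post.

```r
## Toy data (hypothetical): rows 1 and 2 are identical on purpose
toy <- matrix(c(1, 1,
                1, 1,
                5, 5,
                9, 9), ncol = 2, byrow = TRUE)

## A chromosome that switches on the two identical rows as initial centers
x <- c(1, 1, 0, 0)

## Capture the error message instead of stopping the script
msg <- tryCatch({
    kmeans(toy, centers = toy[x == 1, , drop = FALSE])
    "no error"
}, error = function(e) conditionMessage(e))
msg

## The same precondition can be violated with iris once Species is dropped
iris_has_dups <- any(duplicated(iris[-5]))
iris_has_dups
```

If this reading is right, removing duplicated rows first (for example with unique()) would be one way to avoid the error.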

score 2 · Accepted Answer

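A few points are worth mentioning:

In crossover, in order to randomly select 'random_int' gene positions with 1's in parent[1, ], you should change the following line of code from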

random_a <- sample(1:length(parents[1, ]), random_int)

to

random_a <- sample(which(parents[1, ]==1), random_int)

and likewise for the other parent.
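A small sketch of that change on a made-up chromosome (the vectors and the helper `pick_from` are hypothetical, not part of the original code). It also guards against a known quirk of sample(): when its first argument is a single number n, it samples from 1:n, which would matter if a parent ever had exactly one bit on.

```r
set.seed(1)

## Hypothetical 8-bit parent with 1's at positions 2, 5 and 6
parent <- c(0, 1, 0, 0, 1, 1, 0, 0)
ones <- which(parent == 1)

## Sampling from 'ones' can only return positions that hold a 1
picked <- sample(ones, 2)
all(parent[picked] == 1)

## Hypothetical helper: index into x so a length-1 x is not expanded to 1:x
pick_from <- function(x, size) x[sample.int(length(x), size)]

## Parent with a single 1 at position 7
single <- c(0, 0, 0, 0, 0, 0, 1, 0)
only_pos <- pick_from(which(single == 1), 1)
only_pos
```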

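However, I think this crossover strategy guarantees that any offspring can have at most as many cluster bits turned on as the maximum number of 1 bits among its parents (which in this case can be 6 in the initial population; shouldn't it be 4 if you only want a 1-bit difference in the solution genes?).

The figure below shows 3 randomly selected positions at which at least one parent gene has a 1 bit, together with the crossover and the offspring it produces.

In the mutation function, I think that, to be more explicit, we should change this line of code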

if((sum(pop == 1) < k_max) && pop[i] == 0 | (sum(pop == 1) > k_min && pop[i] == 1))

to

if((sum(pop == 1) < k_max && pop[i] == 0) | (sum(pop == 1) > k_min && pop[i] == 1))

with proper parentheses.
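As a side note on why this is about explicitness: in R's operator precedence, && binds more tightly than |, so the original condition should already be parsed the same way as the parenthesized one. A minimal sketch with made-up scalar values (`a`, `b`, `c_` are placeholders for the three comparisons):

```r
## Placeholder truth values standing in for the comparisons in the if()
a  <- FALSE
b  <- TRUE
c_ <- TRUE

original <- a && b | c_       # parsed as (a && b) | c_
explicit <- (a && b) | c_     # the grouping made visible
other    <- a && (b | c_)     # a different grouping, for contrast

c(original = original, explicit = explicit, other = other)
```

The first two agree, while the third grouping gives a different answer, which is a reason to write the parentheses out.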

Also, your fitness function (Davies-Bouldin's index, which measures cluster separation) seems to favor turning on more clusters.

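Finally, I think mutation is the culprit: if you change k_max to a low value (e.g., 3) and pmutation to a low value (e.g., pmutation = 0.01), you will find that all genes in the final solution have 4 bits turned on.

[EDIT]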
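To see what the mutation operator does to a single chromosome, here is a sketch that lifts the loop of my_mutation into a standalone function. `mutate_bits` and the 8-bit vector are hypothetical, the condition is the explicitly parenthesized one, and the final `pop` fallback is an added assumption for the case where no bit qualifies.

```r
k_min <- 2
k_max <- ceiling(sqrt(75))   # 9, as in the question

## Hypothetical standalone version of the loop in my_mutation
mutate_bits <- function(pop) {
    for (i in seq_along(pop)) {
        if ((sum(pop == 1) < k_max && pop[i] == 0) |
            (sum(pop == 1) > k_min && pop[i] == 1)) {
            pop[i] <- abs(pop[i] - 1)   # flip the first qualifying bit
            return(pop)
        }
    }
    pop   # assumption: return the input unchanged if nothing qualified
}

x <- c(0, 0, 1, 0, 1, 1, 0, 0)   # 3 bits on
y <- mutate_bits(x)              # bit 1 is 0 and count < k_max: flipped on
z <- mutate_bits(y)              # bit 1 is 1 and count > k_min: flipped off
rbind(x, y, z)
```

Under these assumptions the flip always lands on the first bit whenever the number of 1's lies strictly between k_min and k_max, so the operator is deterministic rather than random, and it can raise the count as easily as lower it.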

set.seed(1234)
k_min = 2
k_max = 3 #ceiling(sqrt(75))
#5. Putting the pieces together. Using roulette-wheel selection, crossover prob. = 0.8, mutation prob. = 0.1
g2<- ga(type = "binary", 
        population = initial_population, 
        fitness = DBI2, 
        selection = ga_rwSelection,
        crossover = pairwise_crossover,
        mutation = my_mutation,
        pcrossover = 0.8,
        pmutation = 0.01,
        popSize = 100, 
        nBits = nrow(dataset2))    

g2@solution # there are 6 solution genes
    x1 x2 x3 x4 x5 x6 x7 x8 x9 x10 x11 x12 x13 x14 x15 x16 x17 x18 x19 x20 x21 x22 x23 x24 x25 x26 x27 x28 x29 x30 x31 x32 x33 x34 x35 x36 x37
[1,]  0  0  0  0  0  0  1  0  1   0   1   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
[2,]  0  0  0  0  0  0  1  0  0   0   1   0   0   0   0   0   0   1   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
[3,]  0  0  0  0  0  0  1  0  0   0   1   0   0   0   0   0   0   0   0   0   0   1   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
[4,]  0  0  0  0  0  0  1  1  0   0   0   0   0   0   0   0   0   1   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
[5,]  0  0  0  1  0  0  0  0  0   0   1   0   0   0   0   0   0   1   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
[6,]  0  0  0  1  0  0  0  0  0   0   1   0   0   0   0   0   0   0   0   0   0   1   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
     x38 x39 x40 x41 x42 x43 x44 x45 x46 x47 x48 x49 x50 x51 x52 x53 x54 x55 x56 x57 x58 x59 x60 x61 x62 x63 x64 x65 x66 x67 x68 x69 x70 x71 x72
[1,]   0   0   1   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
[2,]   0   0   1   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
[3,]   0   0   1   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
[4,]   0   0   1   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
[5,]   0   0   1   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
[6,]   0   0   1   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
     x73 x74 x75
[1,]   0   0   0
[2,]   0   0   0
[3,]   0   0   0
[4,]   0   0   0
[5,]   0   0   0
[6,]   0   0   0

rowSums(g2@solution) # all of them have 4 bits on
#[1] 4 4 4 4 4 4

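[EDIT2]

Actually, the crossover strategy guarantees that no additional bits are turned on when both parents are taken together, i.e., total number of 1's in the children = total number of 1's in the parents; any single child, however, can have more bits turned on. An example in which a single offspring has more bits turned on than either parent is shown below: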
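A minimal sketch of such a case, with the random draws fixed by hand so the outcome is reproducible. The helper `swap_at`, the two 6-bit parents and the chosen positions are hypothetical; the swap mirrors the exchange step of pairwise_crossover.

```r
## Hypothetical helper: exchange the parents' bits at the given positions
swap_at <- function(p1, p2, positions) {
    c1 <- p1
    c2 <- p2
    c1[positions] <- p2[positions]
    c2[positions] <- p1[positions]
    rbind(c1, c2)
}

p1 <- c(1, 1, 1, 0, 0, 0)   # 3 bits on
p2 <- c(1, 1, 0, 1, 0, 0)   # 3 bits on

random_a <- 1               # a position where p1 has a 1
random_b <- 4               # a position where p2 has a 1
children <- swap_at(p1, p2, sort(union(random_a, random_b)))

children
rowSums(children)                          # first child 4, second child 2
sum(children) == sum(p1) + sum(p2)         # total number of 1's is unchanged
```

Here the first child ends up with 4 bits on although each parent has only 3, while the combined count stays at 6.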
