python - 遗传算法/人工智能；基本上，我在正确的轨道上吗？

Question

我知道 Python 不是编写这种性质的任何软件的最佳主意。我的理由是在 Raspberry Pi 3 的决策中使用这种类型的算法（仍然不确定会如何进行），以及我将使用的库和 API（Adafruit motor HAT、Google 服务、OpenCV、各种传感器等）都非常适合在 Python 中导入，更不用说我在这个环境中更适合 rPi 了。我已经诅咒它像 Java 或 C++ 这样的面向对象对我来说更有意义，但我宁愿处理它的低效率并专注于 rPi 集成的更大图景。

我不会在这里解释代码，因为它在整个脚本的注释部分中都有很好的记录。我的问题如上所述；这基本上可以被认为是一种遗传算法吗？如果不是，它必须是基本的人工智能或遗传密码吗？我是否在解决这类问题的正确轨道上？我知道通常有加权变量和函数来促进“适者生存”，但我认为可以根据需要弹出。

我已经阅读了很多关于这个主题的论坛和文章。我不想复制别人我几乎不理解的代码并开始使用它作为我更大项目的基础；我想确切地知道它是如何工作的，所以我不会对为什么在此过程中没有解决问题感到困惑。所以，我只是试图理解它是如何工作的基本思想，并写下我是如何解释它的。请记住，我想为此留在 Python 中。我知道 rPi 有多个 C++、Java 等环境，但如前所述，我使用的大多数硬件组件只有 Python API 用于实现。如果我错了，请在算法层面进行解释，而不仅仅是一段代码（再次，我真的很想了解这个过程）。另外，请不要挑剔代码约定，除非它' 与我的问题有关，每个人都有自己的风格，现在这只是一个草图。在这里，感谢阅读！

# Created by X3r0, 7/3/2016
# Basic genetic algorithm utilizing a two dimensional array system.
# the 'DNA' is the larger array, and the 'gene' is a smaller array as an element
# of the DNA. There exists no weighted algorithms, or statistical tracking to
# make the program more efficient yet; it is straightforwardly random and solves
# its problem randomly. At this stage, only the base element is iterated over.

# Basic Idea:
# 1) User inputs constraints onto array
# 2) Gene population is created at random given user constraints
# 3) DNA is created with randomized genes ( will never randomize after )
#   a) Target DNA is created with loop control variable as data (basically just for some target structure)
# 4) CheckDNA() starts with base gene from DNA, and will recurse until gene matches the target gene
#   a) Randomly select two genes from DNA
#   b) Create a candidate gene by splicing both parent genes together
#   c) Check candidate gene against the target gene
#   d) If there exists a match in gene elements, a child gene is created and inserted into DNA
#   e) If the child gene in DNA is not equal to target gene, recurse until it is

import random

DNAsize = 32
geneSize = 5
geneDiversity = 9
geneSplit = 4
numRecursions = 0

DNA = []
targetDNA = []

def init():
    global DNAsize, geneSize, geneDiversity, geneSplit, DNA

    print("This is a very basic form of genetic software. Input variable constraints below. "
          "Good starting points are: DNA strand size (array size): 32, gene size (sub array size: 5, gene diversity (randomized 0 - x): 5"
          "gene split (where to split gene array for splicing): 2")

    DNAsize = int(input('Enter DNA strand size: '))
    geneSize = int(input('Enter gene size: '))
    geneDiversity = int(input('Enter gene diversity: '))
    geneSplit = int(input('Enter gene split: '))

    # initializes the gene population, and kicks off
    # checkDNA recursion
    initPop()
    checkDNA(DNA[0])

def initPop():

    # builds an array of smaller arrays
    # given DNAsize
    for x in range(DNAsize):
        buildDNA()
        # builds the goal array with a recurring
        # numerical pattern, in this case just the loop
        # control variable
        buildTargetDNA(x)

def buildDNA():
    newGene = []

    # builds a smaller array (gene) using a given geneSize
    # and randomized with vaules 0 - [given geneDiversity]
    for x in range(geneSize):
        newGene.append(random.randint(0,geneDiversity))
    # append the built array to the larger array
    DNA.append(newGene)

def buildTargetDNA(x):

    # builds the target array, iterating with x as a loop
    # control from the call in init()
    newGene = []
    for y in range(geneSize):
            newGene.append(x)
    targetDNA.append(newGene)

def checkDNA(childGene):

    global numRecursions
    numRecursions = numRecursions+1

    gene = DNA[0]
    targetGene = targetDNA[0]
    parentGeneA = DNA[random.randint(0,DNAsize-1)]          # randomly selects an array (gene) from larger array (DNA)
    parentGeneB = DNA[random.randint(0,DNAsize-1)]
    pos = random.randint(geneSplit-1,geneSplit+1)           # randomly selects a position to split gene for splicing
    candidateGene = parentGeneA[:pos] + parentGeneB[pos:]   # spliced gene given split from parentA and parentB

    print("DNA Splice Position: " + str(pos))
    print("Element A: " + str(parentGeneA))
    print("Element B: " + str(parentGeneB))
    print("Candidate Element: " + str(candidateGene))
    print("Target DNA: " + str(targetDNA))
    print("Old DNA:    " + str(DNA))

    # iterates over the candidate gene and compares each element to the target gene
    # if the candidate gene element hits a target gene element, the resulting child
    # gene is created
    for x in range(geneSize):
        #if candidateGene[x] != targetGene[x]:
            #print("false ")
        if candidateGene[x] == targetGene[x]:
            #print("true ")
            childGene.pop(x)
            childGene.insert(x, candidateGene[x])

    # if the child gene isn't quite equal to the target, and recursion hasn't reached
    # a max (apparently 900), the child gene is inserted into the DNA. Recursion occurs
    # until the child gene equals the target gene, or max recursuion depth is exceeded
    if childGene != targetGene and numRecursions < 900:
        DNA.pop(0)
        DNA.insert(0, childGene)
        print("New DNA:   " + str(DNA))
        print(numRecursions)
        checkDNA(childGene)

init()
print("Final DNA:  " + str(DNA))
print("Number of generations (recursions): " + str(numRecursions))

score 5 · Accepted Answer

我现在正在研究进化计算，所以我希望我的回答对你有所帮助，就我个人而言，我使用 java，主要是因为它是我的主要语言之一，并且为了可移植性，因为我在 linux、windows 和 mac 中进行了测试. 就我而言，我使用置换编码，但如果您仍在学习 GA 的工作原理，我强烈建议您使用二进制编码。这就是我所说的 InitialPopulation。我尝试描述我的程序的工作流程：

1-。设置我的主要变量

这是 PopulationSize、IndividualSize、MutationRate、CrossoverRate。您还需要创建一个目标函数并确定您使用的交叉方法。对于这个例子，假设我的 PopulationSize 等于50， IndividualSize 是4， MutationRate 是0.04%， CrossoverRate 是90%，并且交叉方法将是轮盘赌。我的目标函数只检查我的个人是否能够以二进制表示数字 15，所以最好的个人必须是1111。

2-。初始化我的人口

为此，我创建了具有随机基因的50个体（50由我的 PopulationSize 给出）。

3-。循环开始

对于人口中的每个人，您需要：

根据目标函数评估适应度。如果一个个体由下一个字符表示：00100这意味着他的适应度为 1。如您所见，这是一个简单的适应度函数。您可以在学习时创建自己的，例如fitness = 1/numberOfOnes. 您还需要将所有适应度的总和分配给一个名为的变量populationFitness，这将在下一步中很有用。
选择最优秀的人。对于此任务，您可以使用很多方法，但我们将使用之前所说的轮盘赌方法。在这种方法中，您为人口中的每个人分配一个值。该值由下一个公式给出：(fitness/populationFitness) * 100。所以，如果你的种群适应度是 10，而某个个体适应度是 3，这意味着这个个体有 30% 的机会被选中与另一个个体进行交叉。此外，如果另一个人的健康度为 4，他的值将是 40%。
应用交叉。一旦你拥有了人口中“最好的”个体，你就需要创建一个新的人口。这个新种群是由先前种群的其他个体组成的。为每个人创建一个从 0 到 1 的随机数。如果这个数字在 0.9 的范围内（因为我们的 crossoverRate = 90%），这个人可以繁殖，所以你选择另一个人。每个新个体都有这两个继承他基因的父母。例如：让我们这么parentA = 1001说parentB = 0111. 我们需要用这个基因创造一个新的个体。有很多方法可以做到这一点，均匀交叉，单点交叉，两点交叉等。我们将使用单点交叉。在这种方法中，我们在第一个基因和最后一个基因之间选择一个随机点。然后，我们根据的第一个基因parentA和最后一个基因创建一个新个体parentB。以视觉形式：

parentA = 1001 parentB = 0111 crossoverPoint = 2 newIndividual = 1011

如您所见，新个体共享他父母的基因。

一旦有了新个体的新种群，就可以应用突变。在这种情况下，为新种群中的每个个体生成一个介于 0 和 1 之间的随机数。如果这个数字在 0.04 的范围内（自我们的mutationRate = 0.04），您将在随机基因中应用突变。在二进制编码中，突变只是将 1 更改为 0，反之亦然。以视觉形式：

individual = 1011 randomPoint = 3 mutatedIndividual = 1010

获得最好的个人
如果此人已达到解决方案停止。否则，重复循环
结尾

如您所见，我的英语不是很好，但我希望您了解遗传算法的基本思想。如果你真的有兴趣学习这个，你可以查看以下链接：

http://www.obitko.com/tutorials/genetic-algorithms/ 这个链接更清楚地解释了遗传算法的基础知识

http://natureofcode.com/book/chapter-9-the-evolution-of-code/ 这本书还解释了 GA 是什么，还提供了 Processing 中的一些代码，基本上是 java。但我想你可以理解。

另外我会推荐以下书籍：

遗传算法简介 - Melanie Mitchell

理论与实践中的进化算法 - Thomas Bäck

遗传算法简介 - SN Sivanandam

如果您没有钱，您可以轻松找到所有 PDF 格式的书籍。此外，您可以随时在scholar.google.com 中搜索文章，几乎所有文章都可以免费下载。

score 1 · Accepted Answer

只是为了给 Alberto 的出色答案添加一点内容，随着解决方案的发展，您需要注意两个问题。

第一个是过拟合。这基本上意味着您的解决方案足够复杂，可以“学习”所有样本，但它不适用于训练集之外。为避免这种情况，您需要确保训练集中的信息“量”远大于适合您的解决方案的信息量。

第二个问题是高原。在某些情况下，您会得出某些平庸的解决方案，但仍然足以“胜过”任何新兴的解决方案，因此您的进度停滞不前（一种看待这种情况的方法是，如果您看到自己的健康状况“停滞不前”，那么比最佳数量）。解决此问题的一种方法是灭绝：您可以跟踪最佳解决方案的改进率，如果过去 N 代的改进为 0，您只需核对您的人口。（也就是说，删除您的人口和最佳个体列表并重新开始）。随机性将使解决方案最终超越高原。

要记住的另一件事是，默认的 Random 类在 Randomness 方面非常糟糕。通过简单地使用 Mesernne Twister Random generator 或 Hardware Entropy Generator 之类的东西，我的解决方案得到了显着改善。

我希望这有帮助。祝你好运。

python - 遗传算法/人工智能；基本上，我在正确的轨道上吗？

2 回答 2

1-。设置我的主要变量

2-。初始化我的人口

3-。循环开始

Related

Reference