python - Speeding up a differential evolution algorithm with thousands of parameters

Question

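I am trying to create a lumped rainfall-runoff balance model in Python with a lot of parameters (from 37 to 1099). As input it receives daily rainfall and temperature data, and it provides daily flows as output.

I am stuck on the optimization method for model calibration. I chose the differential evolution (DE) algorithm because it is easy to use and can be applied to this kind of problem. The algorithm I wrote works well and appears to minimize the objective function (which is based on the Nash-Sutcliffe model efficiency, NSE). The problem starts with a larger number of parameters, which significantly slows down the whole algorithm. The DE algorithm I wrote: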

import numpy as np
import flow    # a python file from where I get observed daily flows as a np.array

def differential_evolution(func, bounds, popsize=10, mutate=0.8, CR=0.85, maxiter=50): 

    #--- INITIALIZE THE FIRST POPULATION WITHIN THE BOUNDS-------------------+

    bounds = [(0, 250)] * 1 + [(0, 5)] * 366 + [(0, 2)] * 366 + [(0, 100)] * 366
    dim = len(bounds)
    pop_norm = np.random.rand(popsize, dim)
    min_bound, max_bound = np.asarray(bounds).T
    difference = np.fabs(min_bound - max_bound)
    population = min_bound + pop_norm * difference

    # Compute the objective function value for the initial population

    fitness = np.asarray([func(x, flow.l_flow) for x in population])
    best_idx = np.argmin(fitness)
    best = population[best_idx]  

    #--- MUTATION -----------------------------------------------------------+
    
    # This is the part which takes too much time to complete
    for i in range(maxiter):
        print('Generation: ', i)
        for j in range(popsize):

            # Random selection of three individuals to make a noise vector
            idxs = list(range(0, popsize))
            idxs.remove(j)
            # replace=False so that the three individuals are distinct
            x_1, x_2, x_3 = pop_norm[np.random.choice(idxs, 3, replace=False)]
            noice_vector = np.clip(x_1 + mutate * (x_2 - x_3), 0, 1) 

    #--- RECOMBINATION ------------------------------------------------------+  

            cross_points = np.random.rand(dim) < CR
            if not np.any(cross_points):
                cross_points[np.random.randint(0, dim)] = True

            trial_vector_norm = np.where(cross_points, noice_vector, pop_norm[j])
            trial_vector = min_bound + trial_vector_norm * difference
            crit = func(trial_vector, flow.l_flow)
            
            # Check for better fitness of objective function
            if crit < fitness[j]:
                fitness[j] = crit
                pop_norm[j] = trial_vector_norm
                if crit < fitness[best_idx]:
                    best_idx = j
                    best = trial_vector
    return best, fitness[best_idx]

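The rainfall-runoff model itself is a function that basically works with lists and iterates over each row in a for loop to compute the daily flow with simple equations. The objective function, NSE, is vectorized with numpy arrays: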

import model # a python file where rainfall-runoff model function is defined 

def nse_min(parameters, observations):
    
    # Modeled flows from model function
    Q_modeled = np.array(model.model(parameters))

    # Computation of the NSE fraction
    numerator = np.subtract(observations, Q_modeled) ** 2
    denominator = np.subtract(observations, np.sum(observations)/len(observations)) ** 2
    return np.sum(numerator) / np.sum(denominator)
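
To make the objective concrete: the value returned above is the ratio that NSE subtracts from 1, so it equals 1 - NSE, and minimizing it maximizes NSE. The sketch below is a small check of that relationship. The helper name `nse_ratio` is an assumption introduced for illustration; it takes the modeled flows directly because the `model` module is not available here.

```python
import numpy as np


def nse_ratio(observations, modeled):
    """Return 1 - NSE, the same quantity nse_min returns (hypothetical helper)."""
    observations = np.asarray(observations, dtype=float)
    modeled = np.asarray(modeled, dtype=float)
    # Squared model errors
    numerator = np.sum((observations - modeled) ** 2)
    # Squared deviations of the observations from their own mean
    denominator = np.sum((observations - observations.mean()) ** 2)
    return numerator / denominator


obs = np.array([1.0, 2.0, 3.0, 4.0])

# A perfect simulation gives ratio 0, i.e. NSE = 1
perfect = nse_ratio(obs, obs)

# Predicting the observed mean everywhere gives ratio 1, i.e. NSE = 0
mean_only = nse_ratio(obs, np.full(obs.shape, obs.mean()))

print(perfect, mean_only)
```

Values above 1 therefore mean the model performs worse than simply using the mean of the observed flows.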

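Is there any chance to speed this up? I found the numba library, which "compiles Python code to machine code" and then lets you compute more efficiently on the CPU or on a GPU using CUDA cores. But I have not studied anything IT-related and do not know how a CPU or GPU works, so I do not know how to use numba properly. Can anybody help me with that? Or can anyone suggest a different optimization method?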
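
As a rough illustration of the numba route mentioned above, the sketch below applies numba's `njit` decorator to a loop-based model that works on numpy arrays instead of lists. The model shown (`toy_bucket_model`, with parameters `k` and `capacity`) is entirely hypothetical and is not the model from this question; it only shows the kind of day-by-day loop that numba can compile. If numba is not installed, the sketch falls back to plain Python.

```python
import numpy as np

try:
    from numba import njit  # optional dependency
except ImportError:
    def njit(func):
        # Fallback: leave the function as plain Python when numba is missing
        return func


@njit
def toy_bucket_model(rain, k, capacity):
    """Hypothetical single-bucket model, NOT the model from the question."""
    n = rain.shape[0]
    flow = np.empty(n)
    storage = 0.0
    for t in range(n):
        # Add today's rainfall to the bucket
        storage += rain[t]
        # Anything above the bucket capacity leaves immediately
        overflow = max(storage - capacity, 0.0)
        storage -= overflow
        # A fixed fraction of the remaining storage drains as baseflow
        baseflow = k * storage
        storage -= baseflow
        flow[t] = overflow + baseflow
    return flow


rain = np.array([10.0, 0.0, 0.0])
flows = toy_bucket_model(rain, 0.5, 100.0)
print(flows)
```

With numba installed, the first call includes compilation time, so any gain only shows up over the many repeated calls that a calibration run makes. The inputs should be numpy float arrays rather than Python lists.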

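What I use: Python 3.7.0 64-bit, Windows 10 Home x64, Intel Core(TM) i7-7700HQ CPU @ 2.80 GHz, NVIDIA GeForce GTX 1050 Ti 4 GB GDDR5, 16 GB DDR4 RAM.

I am a Python beginner studying water management, and I only occasionally use Python for scripts that make my life easier in data processing. Thank you in advance for your help.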

score 0 · Accepted Answer

You can use the Python multiprocessing library. It simply starts more processes to run your function. You can use it like this:

from multiprocessing import Process

def f(name):
    print('hello', name)

if __name__ == '__main__':
    p = Process(target=f, args=('bob',))
    p.start()
    p.join()
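
To connect this to the differential evolution loop in the question, one possible arrangement is sketched below. It is a variant, not the original algorithm: all trial vectors of a generation are built first and evaluated in one batch, so the evaluations are independent of each other and can be handed to any map-like function. The names `de_generation`, `map_fn` and `sphere` are assumptions introduced for illustration, and `sphere` is a toy objective standing in for the NSE-based one.

```python
import numpy as np


def de_generation(pop_norm, fitness, func, min_bound, span,
                  mutate=0.8, CR=0.85, rng=None, map_fn=map):
    """One synchronous DE generation: build all trials, then evaluate as a batch."""
    rng = np.random.default_rng() if rng is None else rng
    popsize, dim = pop_norm.shape
    trials = np.empty_like(pop_norm)
    for j in range(popsize):
        # Pick three distinct individuals, none of them j
        candidates = np.delete(np.arange(popsize), j)
        a, b, c = pop_norm[rng.choice(candidates, 3, replace=False)]
        mutant = np.clip(a + mutate * (b - c), 0, 1)
        cross = rng.random(dim) < CR
        cross[rng.integers(dim)] = True  # at least one component from the mutant
        trials[j] = np.where(cross, mutant, pop_norm[j])
    # Independent evaluations: map_fn can be the built-in map or a pool's map
    denormalized = list(min_bound + trials * span)
    scores = np.fromiter(map_fn(func, denormalized), dtype=float, count=popsize)
    # Greedy selection: keep a trial only where it improves on its parent
    improved = scores < fitness
    pop_norm[improved] = trials[improved]
    fitness[improved] = scores[improved]
    return pop_norm, fitness


def sphere(x):
    # Toy objective (hypothetical), standing in for the NSE-based function
    return float(np.sum(x ** 2))


rng = np.random.default_rng(0)
bounds = np.array([(-5.0, 5.0)] * 20)
min_bound, max_bound = bounds.T
span = max_bound - min_bound
pop_norm = rng.random((15, 20))
fitness = np.array([sphere(x) for x in min_bound + pop_norm * span])
initial_best = fitness.min()

for _ in range(30):
    pop_norm, fitness = de_generation(pop_norm, fitness, sphere,
                                      min_bound, span, rng=rng)

print(initial_best, fitness.min())
```

To run the evaluations in several processes, create a `multiprocessing.Pool` under the `if __name__ == '__main__':` guard (required on Windows) and pass its `pool.map` as `map_fn`. The objective must then be a module-level function taking a single argument; for the question's two-argument function, `functools.partial` can bind the observations. Parallel evaluation only pays off when one model run costs much more than sending the parameter vector to a worker process. SciPy's `scipy.optimize.differential_evolution` also accepts a `workers` argument, used together with `updating='deferred'`, which may be worth trying before maintaining a hand-written loop.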
