python - python中使用差分进化的分段回归

Question

我的长期目标是为特定数据集创建一个模块，将分段回归拟合到任意数量的断点，以及标准多项式和线性曲线拟合，然后评估哪种拟合最适用于数据（可能使用 AIC 或 BIC）。

我有一个函数，它使用差分进化在 x 和 y 数据集上使用分段回归，假设有 1 个断点：

def segReg_one(xData,yData):

    def func(xVals,model_break,slopeA,slopeB,offsetA,offsetB): #Initialization of the piecewise function
        returnArray=[]
        for x in xVals:
            if x > model_break:
                returnArray.append(slopeA * x + offsetA)
            else:
                returnArray.append(slopeB * x + offsetB)


        return returnArray

    def sumSquaredError(parametersTuple): #Definition of an error function to minimize
        modely=func(xData,*parametersTuple)
        warnings.filterwarnings("ignore") # Ignore warnings by genetic algorithm

        return np.sum((yData-modely)**2.0)

    def generate_genetic_Parameters():
        initial_parameters=[]
        x_max=np.max(xData)
        x_min=np.min(xData)
        y_max=np.max(yData)
        y_min=np.min(yData)
        slope=10*(y_max-y_min)/(x_max-x_min)

        initial_parameters.append([x_max,x_min]) #Bounds for model break point
        initial_parameters.append([-slope,slope]) #Bounds for slopeA
        initial_parameters.append([-slope,slope]) #Bounds for slopeB
        initial_parameters.append([y_max,y_min]) #Bounds for offset A
        initial_parameters.append([y_max,y_min]) #Bounds for offset B

        result=differential_evolution(sumSquaredError,initial_parameters,seed=3)

        return result.x

    geneticParameters = generate_genetic_Parameters() #Generates genetic parameters



    fittedParameters, pcov= curve_fit(func, xData, yData, geneticParameters) #Fits the data 
    print('Parameters:', fittedParameters)
    print('Model break at: ', fittedParameters[0])
    print('Slope of line where x < model break: ', fittedParameters[1])
    print('Slope of line where x > model break: ', fittedParameters[2])
    print('Offset of line where x < model break: ', fittedParameters[3])
    print('Offset of line where x > model break: ', fittedParameters[4])





    model=func(xData,*fittedParameters)

    absError = model - yData

    SE = np.square(absError) 
    MSE = np.mean(SE) 
    RMSE = np.sqrt(MSE) 
    Rsquared = 1.0 - (np.var(absError) / np.var(yData))

    print()
    print('RMSE:', RMSE)
    print('R-squared:', Rsquared)



    def ModelAndScatterPlot(graphWidth, graphHeight):
            f = plt.figure(figsize=(graphWidth/100.0, graphHeight/100.0), dpi=100)
            axes = f.add_subplot(111)


            axes.plot(xData, yData,  'D')

            xModel = np.linspace(min(xData), max(xData))
            yModel = func(xModel, *fittedParameters)

            axes.plot(xModel, yModel)

            axes.set_xlabel('X Data') # X axis data label
            axes.set_ylabel('Y Data') # Y axis data label

            plt.show()
            plt.close('all') 


    graphWidth = 800
    graphHeight = 600

    return ModelAndScatterPlot(800,600)

运行良好。但是，我尝试扩展模型以允许超过 1 个断点：

def segReg_two(xData,yData):

    def func(xData,break1,break2,slope1,slope_mid,slope2,offset1,offset_mid,offset2):
        returnArray=[]
        for x in xData:
            if x < break1:
                returnArray.append(slope1 * x + offset1)

            if (x < break2 and x > break1):
                returnArray.append(slope_mid * x + offset_mid)

            else:
                returnArray.append(slope2 * x + offset2)


    def sumSquaredError(parametersTuple): #Definition of an error function to minimize
        modely=func(xData,*parametersTuple)
        warnings.filterwarnings("ignore") # Ignore warnings by genetic algorithm

        return np.sum((yData-modely)**2.0)

    def generate_genetic_Parameters():
        initial_parameters=[]
        x_max=np.max(xData)
        x_min=np.min(xData)
        y_max=np.max(yData)
        y_min=np.min(yData)
        slope=10*(y_max-y_min)/(x_max-x_min)

        initial_parameters.append([x_max,x_min]) #Bounds for model break point
        initial_parameters.append([x_max,x_min])
        initial_parameters.append([-slope,slope]) 
        initial_parameters.append([-slope,slope]) 
        initial_parameters.append([-slope,slope]) 
        initial_parameters.append([y_max,y_min])
        initial_parameters.append([y_max,y_min]) 
        initial_parameters.append([y_max,y_min]) 


        result=differential_evolution(sumSquaredError,initial_parameters,seed=3)

        return result.x

    geneticParameters = generate_genetic_Parameters() #Generates genetic parameters



    fittedParameters, pcov= curve_fit(func, xData, yData, geneticParameters) #Fits the data 
    print('Parameters:', fittedParameters)
    print('Model break at: ', fittedParameters[0])
    print('Slope of line where x < model break: ', fittedParameters[1])
    print('Slope of line where x > model break: ', fittedParameters[2])
    print('Offset of line where x < model break: ', fittedParameters[3])
    print('Offset of line where x > model break: ', fittedParameters[4])





    model=func(xData,*fittedParameters)

    absError = model - yData

    SE = np.square(absError) 
    MSE = np.mean(SE) 
    RMSE = np.sqrt(MSE) 
    Rsquared = 1.0 - (np.var(absError) / np.var(yData))

    print()
    print('RMSE:', RMSE)
    print('R-squared:', Rsquared)

    def ModelAndScatterPlot(graphWidth, graphHeight):
            f = plt.figure(figsize=(graphWidth/100.0, graphHeight/100.0), dpi=100)
            axes = f.add_subplot(111)


            axes.plot(xData, yData,  'D')

            xModel = np.linspace(min(xData), max(xData))
            yModel = func(xModel, *fittedParameters)

            axes.plot(xModel, yModel)

            axes.set_xlabel('X Data') # X axis data label
            axes.set_ylabel('Y Data') # Y axis data label

            plt.show()
            plt.close('all') 


    graphWidth = 800
    graphHeight = 600

    return ModelAndScatterPlot(800,600)

当我运行时，这段代码遇到了问题segReg_two(x,y)，停在了differential_evolution位：

TypeError: unsupported operand type(s) for -: 'float' and 'NoneType' 在处理上述异常期间，发生了另一个异常：

RuntimeError：类似地图的可调用对象必须是 f(func, iterable) 的形式，返回与“可迭代”长度相同的数字序列

我没有这个问题segReg_one，所以我不明白为什么会在这里发生。我假设（我可能对这个假设不正确）参数iterable必须具有与我的误差函数兼容的维度。但是，我不确定这两个参数是如何准确关联的，除了我正在找到断点、斜率和偏移量，这些断点、斜率和偏移量可以在给定边界的情况下最小化断点。

此外，我的进攻计划似乎非常冗长和野蛮。有没有更好的方法来解决这个问题？

我想也许它正在将我的分段函数视为无类型。打印带有一些随机值的函数仅返回“无”。然而，我的分段函数打印同样的东西，它仍然工作得很好。

score 0 · Accepted Answer

如果您不依赖于使用差分进化，那么分段回归包使用迭代算法将分段模型拟合到数据。它有一个基于贝叶斯信息准则的模型比较工具。

这是从 2 个断点生成的一些数据

x = [0.0, 0.3, 0.5, 0.8, 1.0, 1.3, 1.5, 1.8, 2.0, 2.3, 2.5, 2.8, 3.0, 3.3, 3.5, 3.8, 4.0, 4.3, 4.5, 4.8, 5.1, 5.3, 5.6, 5.8, 6.1, 6.3, 6.6, 6.8, 7.1, 7.3, 7.6, 7.8, 8.1, 8.3, 8.6, 8.8, 9.1, 9.3, 9.6, 9.8, 10.1, 10.4, 10.6, 10.9, 11.1, 11.4, 11.6, 11.9, 12.1, 12.4, 12.6, 12.9, 13.1, 13.4, 13.6, 13.9, 14.1, 14.4, 14.6, 14.9, 15.2, 15.4, 15.7, 15.9, 16.2, 16.4, 16.7, 16.9, 17.2, 17.4, 17.7, 17.9, 18.2, 18.4, 18.7, 18.9, 19.2, 19.4, 19.7, 19.9, 20.2, 20.5, 20.7, 21.0, 21.2, 21.5, 21.7, 22.0, 22.2, 22.5, 22.7, 23.0, 23.2, 23.5, 23.7, 24.0, 24.2, 24.5, 24.7, 25.0]

y = [16.2, -5.5, -4.0, -8.8, 11.2, -19.9, 21.2, -3.2, 8.2, 3.2, 20.9, -13.7, 4.4, 4.4, 20.2, -1.5, 8.4, 2.0, 11.8, 17.8, 1.6, 24.7, 22.9, 19.5, 24.7, 11.9, 20.6, 15.5, 25.2, 36.2, 27.0, 33.0, 33.1, 34.5, 39.3, 48.9, 40.9, 57.5, 74.7, 68.6, 62.3, 58.4, 62.8, 90.2, 76.8, 73.0, 84.3, 106.4, 89.7, 97.7, 97.5, 94.0, 89.2, 100.1, 104.5, 115.5, 121.1, 125.0, 121.6, 130.6, 115.8, 136.3, 129.4, 121.8, 130.2, 125.1, 137.6, 142.0, 149.2, 113.9, 113.9, 123.8, 131.0, 138.6, 133.5, 110.7, 128.3, 140.2, 134.7, 140.5, 131.2, 131.9, 136.3, 139.0, 137.4, 137.1, 129.7, 140.7, 138.7, 149.2, 150.4, 140.8, 135.7, 133.6, 144.7, 141.8, 138.0, 142.4, 136.3, 150.0]

使用分段回归的模型比较工具：

import piecewise_regression

piecewise_regression.ModelSelection(xx, yy)

这表明基于 BIC 的模型具有 2 个断点。

我们还可以绘制带有 2 个断点的拟合：

pw_fit = piecewise_regression.Fit(xx, yy, n_breakpoints=2)

pw_fit.plot()

python - python中使用差分进化的分段回归

1 回答 1

Related

Reference