r - Finding a curve to match data

Question

I'm looking for a non-linear curve fitting routine (probably most likely to be found in R or Python, but I'm open to other languages) which would take x,y data and fit a curve to it.

I should be able to specify as a string the type of expression I want to fit.

Examples:

"A+B*x+C*x*x"
"(A+B*x+C*x*x)/(D*x+E*x*x)"
"sin(A+B*x)*exp(C+D*x)+E+F*x"

What I would get out of this is at least the values for the constants (A, B, C, etc.) And hopefully stats about the fitness of the match.

There are commercial programs to do this, but I expected to be able to find something as common as fitting to a desired expression in a language library nowadays. I suspect SciPy's optimization stuff might be able to do this, but I can't see that it lets me define an equation. Likewise, I can't seem to find exactly what I want in R.

Is what I'm looking for out there, or do I need to roll my own? I hate to do it if it's there and I'm just having trouble finding it.

Edit: I want to do this for a bit more control over the process than I get from LAB Fit. The LAB Fit UI is dreadful. I'd also like to be able to break the range into multiple pieces and have different curves represent the different pieces of the range. In the end, the result has to be able to (speed-wise) beat a LUT with linear interpolation or I'm not interested.

In my current set of problems, I have trig functions or exp() and I need to execute them 352,800 times per second in real time (and use only a fraction of the CPU). So I plot the curve and use the data to drive the curve fitter to get less expensive approximations. In the old days, LUTs were almost always the solution, but nowadays skipping the memory lookups and doing an approximation is sometimes faster.

score 8 · Accepted Answer

您的第一个模型实际上在三个参数中是线性的，可以使用 R 拟合

 fit <- lm(y ~ x + I(x^2), data=X)

这将为您提供三个参数。

第二个模型也可以nls()在 R 中使用，但通常需要提供起始值等警告。优化中的统计问题不一定与数值问题相同——无论您使用哪种语言，都不能只优化任何函数形式选择。

score 8 · Accepted Answer

要从一般意义上回答您的问题（关于 R 中的参数估计）而不考虑您指出的方程的细节，我认为您正在寻找 nls() 或 optim()...'nls' 是我的第一选择它为每个估计参数提供误差估计，当它失败时我使用“优化”。如果您有 x,y 变量：

out <- tryCatch(nls( y ~ A+B*x+C*x*x, data = data.frame(x,y), 
                start = c(A=0,B=1,C=1) ) ,
                error=function(e) 
                optim( c(A=0,B=1,C=1), function(p,x,y)  
                      sum((y-with(as.list(p),A + B*x + C*x^2))^2), x=x, y=y) )

得到系数，比如

getcoef <- function(x) if(class(x)=="nls") coef(x) else x$par
getcoef(out)

如果你想要'nls'的标准错误，

summary(out)$parameters

帮助文件和 r-help 邮件列表帖子包含许多关于每个实现的特定最小化算法的讨论（在上面的每个示例案例中使用的默认值）以及它们对手头等式的特定形式的适用性。某些算法可以处理框约束，而另一个名为 constrOptim() 的函数将处理一组线性约束。该网站还可以帮助：

http://cran.r-project.org/web/views/Optimization.html

score 1 · Accepted Answer

查看GNU Octave - 在它的 polyfit() 和非线性约束求解器之间，它应该可以构建适合您的问题的东西。

score 1 · Accepted Answer

您可能不会找到具有示例中隐含的灵活性的单个例程（使用相同例程的多项式和有理函数），更不用说解析字符串以找出适合哪种方程的例程了。

最小二乘多项式拟合器适用于您的第一个示例。（这取决于你使用什么次数的多项式——二次、三次、四次等）。对于像第二个示例这样的理性函数，如果找不到合适的库，您可能必须“自己动手”。另外，请记住，只要您不需要推断超出您要拟合的数据集的限制，就可以使用足够高的多项式来近似您的“真实”函数。

正如其他人所指出的，还有其他更通用的参数估计算法也可能被证明是有用的。但这些算法并不是完全“即插即用”：它们通常需要您编写一些辅助例程，并提供模型参数的初始值列表。这些类型的算法可能会发散，或者陷入局部最小值或最大值，因为初始参数估计的选择很不幸。

score 1 · Accepted Answer

如果您对系数有限制，并且您知道有一种特定类型的函数要适合您的数据，并且该函数是一个混乱的函数，其中标准回归方法或其他曲线拟合方法不起作用，有你考虑过遗传算法吗？

它们不是我的第一选择，但如果你试图找到你提到的第二个函数的系数，那么也许 GA 会起作用——特别是如果你使用非标准指标来评估最佳拟合。例如，如果您想找到“(A+Bx+Cx^2)/(Dx+Ex^2)”的系数，使得您的函数和数据之间的平方差之和最小并且存在一些约束在结果函数的弧长上，随机算法可能是解决这个问题的好方法。

一些注意事项：1）随机算法不能保证最佳解决方案，但它们通常会非常接近。2）你必须小心算法的稳定性。

更长一点，如果您正处于希望从最适合您的数据的某些函数空间中找到一个函数的阶段（例如，您不会将第二个模型强加于您的数据），那么遗传编程技术也可能有所帮助。

score 1 · Accepted Answer

在 R 中，这很容易。

内置方法称为 optim()。它将一个潜在参数的起始向量作为参数，然后是一个函数。你必须去构建自己的错误函数，但这真的很简单。

然后你把它叫做 out = optim( 1 , err_fn)

其中 err_fn 是

err_fn = function(A) {
    diff = 0;
    for(i in 1:data_length){
      x = eckses[i];
      y = data[i];
      model_y = A*x;
      diff = diff + ( y - model_y )^2
    }
    return(diff);
}

这只是假设您在 eckses 和数据中有一个 x 和 y 值的向量。根据需要更改 model_y 行，甚至添加更多参数。

它在非线性上工作得很好，我将它用于四维 e^x 曲线，它非常快。输出数据包括拟合结束时的误差值，这是对拟合程度的度量，以平方差之和的形式给出（在我的 err_fn 中）。

编辑：如果您需要将模型作为字符串接收，您可以让您的用户界面将整个模型拟合过程构建为 R 脚本并将其加载到运行中。R 可以从 STDIN 或文件中获取文本，因此制作此函数的等效字符串并让它自动运行优化应该不会太难。

r - Finding a curve to match data

6 回答 6

Related

Reference