math - 曲线拟合：找到满足约束列表的最平滑函数

Question

考虑从 (-inf,inf) 到 [0,1]的非递减满射(onto) 函数集。（典型的CDF满足这个性质。）换句话说，对于任何实数 x，0 <= f(x) <= 1。逻辑函数可能是最著名的例子。

现在，我们以 x 值列表的形式给出了一些约束，并且对于每个 x 值，函数必须位于一对 y 值之间。我们可以将其表示为 {x,ymin,ymax} 三元组的列表，例如

constraints = {{0, 0, 0}, {1, 0.00311936, 0.00416369}, {2, 0.0847077, 0.109064}, 
 {3, 0.272142, 0.354692}, {4, 0.53198, 0.646113}, {5, 0.623413, 0.743102}, 
 {6, 0.744714, 0.905966}}

在图形上看起来像这样：

_{（来源：yootles.com）}

我们现在寻求一条尊重这些约束的曲线。例如：

_{（来源：yootles.com）}

让我们首先尝试通过约束的中点进行简单的插值：

mids = ({#1, Mean[{#2,#3}]}&) @@@ constraints
f = Interpolation[mids, InterpolationOrder->0]

绘制，f 看起来像这样：

_{（来源：yootles.com）}

该函数不是满射的。此外，我们希望它更流畅。我们可以增加插值阶数，但现在它违反了其范围为 [0,1] 的约束：

_{（来源：yootles.com）}

因此，目标是找到满足约束的最平滑函数：

非减少。
当 x 接近负无穷时趋于 0，当 x 接近无穷时趋于 1。
通过给定的 y-error-bars 列表。

我在上面绘制的第一个示例似乎是一个很好的候选者，但我使用 Mathematica 的FindFit函数假设了对数正态 CDF。这在这个特定示例中效果很好，但通常不需要满足约束的对数正态 CDF。

score 5 · Accepted Answer

我认为您没有指定足够的标准来使所需的 CDF 独一无二。

如果必须满足的唯一标准是：

CDF 必须“相当平滑”（见下文）
CDF 必须是非递减的
CDF 必须通过“误差线”y 区间
当 x --> -Infinity 时，CDF 必须趋向于 0
当 x --> 无穷大时，CDF 必须趋向于 1。

那么也许你可以使用Monotone Cubic Interpolation。这将为您提供 C^2（两次连续可微）函数，与三次样条不同，当给定单调数据时，该函数保证是单调的。

这就留下了一个问题，究竟应该使用什么数据来生成单调三次插值。如果你取每个误差线的中心点（平均值），你能保证得到的数据点是单调递增的吗？如果不是，您不妨做出一些任意选择，以保证您选择的点是单调递增的（因为标准不会强制我们的解决方案是唯一的）。

现在如何处理最后一个数据点？是否有一个 X 可以保证大于约束数据集中的任何 x？也许您可以再次任意选择方便并选择一些非常大的 X 并将 (X,1) 作为最终数据点。

评论 1：您的问题可以分为 2 个子问题：

给定 CDF 必须通过的确切点 (x_i,y_i)，如何生成 CDF？我怀疑有无限多可能的解决方案，即使有无限平滑约束。
给定 y-errorbars，你应该如何选择 (x_i,y_i)？同样，有无数种可能的解决方案。可能需要添加一些额外的标准来强制一个独特的选择。额外的标准也可能使问题比现在更难。

评论 2：这是一种使用单调三次插值并满足标准 4 和 5 的方法：

单调三次插值（我们称之为f）映射R --> R。

让CDF(x) = exp(-exp(f(x))). 然后CDF: R --> (0,1)。如果我们能找到合适的f，那么通过CDF这种方式定义，我们就可以满足标准 4 和 5。

要找到，请使用f变换变换 CDF 约束，。这是变换的逆过程。如果's 增加，则's 减少。(x_0,y_0),...,(x_n,y_n)xhat_i = x_iyhat_i = log(-log(y_i))CDFy_iyhat_i

现在将单调三次插值应用于 (x_hat,y_hat) 数据点以生成f. 最后，定义CDF(x) = exp(-exp(f(x))). 这将是一个来自R --> (0,1) 的单调递增函数，它通过点 (x_i,y_i)。

我认为，这满足所有标准 2--5。标准 1 有点满意，尽管肯定存在更平滑的解决方案。

score 4 · Accepted Answer

我找到了一种解决方案，可以为各种输入提供合理的结果。我首先拟合一个模型——一次到约束的低端，然后再到高端。我将这两个拟合函数的平均值称为“理想函数”。我使用这个理想函数来推断约束结束的左侧和右侧，以及在约束中的任何间隙之间进行插值。我定期计算理想函数的值，包括所有约束，从左边的函数接近零到右边的接近一。在约束处，我根据需要裁剪这些值以满足约束。最后，我构造了一个遍历这些值的插值函数。

我的 Mathematica 实现如下。
首先，几个辅助函数：

(* Distance from x to the nearest member of list l. *)
listdist[x_, l_List] := Min[Abs[x - #] & /@ l]

(* Return a value x for the variable var such that expr/.var->x is at least (or
   at most, if dir is -1) t. *)
invertish[expr_, var_, t_, dir_:1] := Module[{x = dir},
  While[dir*(expr /. var -> x) < dir*t, x *= 2];
  x]

这是主要功能：

(* Return a non-decreasing interpolating function that maps from the
   reals to [0,1] and that is as close as possible to expr[var] without
   violating the given constraints (a list of {x,ymin,ymax} triples).
   The model, expr, will have free parameters, params, so first do a
   model fit to choose the parameters to satisfy the constraints as well
   as possible. *)
cfit[constraints_, expr_, params_, var_] := 
Block[{xlist,bots,tops,loparams,hiparams,lofit,hifit,xmin,xmax,gap,aug,bests},
  xlist = First /@ constraints;
  bots = Most /@ constraints; (* bottom points of the constraints *)
  tops = constraints /. {x_, _, ymax_} -> {x, ymax};
  (* fit a model to the lower bounds of the constraints, and 
     to the upper bounds *)
  loparams = FindFit[bots, expr, params, var];
  hiparams = FindFit[tops, expr, params, var];
  lofit[z_] = (expr /. loparams /. var -> z);
  hifit[z_] = (expr /. hiparams /. var -> z);
  (* find x-values where the fitted function is very close to 0 and to 1 *)
  {xmin, xmax} = {
    Min@Append[xlist, invertish[expr /. hiparams, var, 10^-6, -1]],
    Max@Append[xlist, invertish[expr /. loparams, var, 1-10^-6]]};
  (* the smallest gap between x-values in constraints *)
  gap = Min[(#2 - #1 &) @@@ Partition[Sort[xlist], 2, 1]];
  (* augment the constraints to fill in any gaps and extrapolate so there are 
     constraints everywhere from where the function is almost 0 to where it's 
     almost 1 *)
  aug = SortBy[Join[constraints, Select[Table[{x, lofit[x], hifit[x]}, 
                                              {x, xmin,xmax, gap}], 
                                        listdist[#[[1]],xlist]>gap&]], First];
  (* pick a y-value from each constraint that is as close as possible to 
     the mean of lofit and hifit *)
  bests = ({#1, Clip[(lofit[#1] + hifit[#1])/2, {#2, #3}]} &) @@@ aug;
  Interpolation[bests, InterpolationOrder -> 3]]

例如，我们可以拟合对数正态、正态或逻辑函数：

g1 = cfit[constraints, CDF[LogNormalDistribution[mu,sigma], z], {mu,sigma}, z]
g2 = cfit[constraints, CDF[NormalDistribution[mu,sigma], z], {mu,sigma}, z]
g3 = cfit[constraints, 1/(1 + c*Exp[-k*z]), {c,k}, z]

这是我的原始示例约束列表的样子：

_{（来源：yootles.com）}

正态和逻辑几乎在彼此之上，对数正态是蓝色曲线。

这些都不是很完美。特别是，它们不是很单调。这是导数的图：

Plot[{g1'[x], g2'[x], g3'[x]}, {x, 0, 10}]

_{（来源：yootles.com）}

这表明缺乏平滑度以及接近零的轻微非单调性。我欢迎对此解决方案进行改进！

score 0 · Accepted Answer

您可以尝试通过中点拟合贝塞尔曲线。具体来说，我认为您想要一条C2 连续曲线。

math - 曲线拟合：找到满足约束列表的最平滑函数

3 回答 3

Related

Reference