c++ - Numerical precision of a difference of squares

Question

In my code I frequently compute the following (C code here, for simplicity):

float cos_theta = /* some simple operations; no cosf call! */;
float sin_theta = sqrtf(1.0f - cos_theta * cos_theta); // Option 1

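For this example, ignore that the argument of the square root may become negative because of inexactness. I handle that with an additional fdimf call. However, I wonder whether the following is more accurate: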

float sin_theta = sqrtf((1.0f + cos_theta) * (1.0f - cos_theta)); // Option 2

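cos_theta lies between -1 and +1, so with either choice there will be cases where I subtract similar numbers and therefore lose precision, right? Which one is the most precise, and why?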
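For readers who want to experiment, here is a minimal sketch of the two options in Python. It is not the asker's code: the helper f32 and the names sin_option1 and sin_option2 are made up for illustration, single precision is emulated by rounding every intermediate double result to binary32 with the struct module, and max(..., 0.0) stands in for the fdimf clamp mentioned in the question.

```python
from math import sqrt
from struct import pack, unpack


def f32(x):
    """Round a binary64 Python float to the nearest binary32 value."""
    return unpack('f', pack('f', x))[0]


def sin_option1(cos_theta):
    """Emulates sqrtf(fdimf(1.0f, c * c)): square first, then subtract."""
    c = f32(cos_theta)
    sq = f32(c * c)                 # rounding of the square
    diff = f32(max(1.0 - sq, 0.0))  # clamp at zero, as fdimf would
    return f32(sqrt(diff))


def sin_option2(cos_theta):
    """Emulates sqrtf((1.0f + c) * (1.0f - c)): factor first, then multiply."""
    c = f32(cos_theta)
    plus = f32(1.0 + c)
    minus = f32(1.0 - c)
    return f32(sqrt(f32(plus * minus)))


# Print both results side by side for a few inputs.
for c in (0.0, 0.5, 0.999999, 1.0):
    print(c, sin_option1(c), sin_option2(c))
```

Rounding after every operation is only an approximation of what a C compiler does; a real build may keep intermediates in higher precision or fuse operations.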

score 3 · Accepted Answer

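The most accurate approach with floating point is probably to compute both sin and cos with the single x87 instruction fsincos.

~~However, if you have to do the computation by hand, it is best to group arguments of similar magnitude. That means the second option is more precise, especially when cos_theta is close to 0, where precision matters most.~~

As the article What Every Computer Scientist Should Know About Floating-Point Arithmetic points out:

The expression x² - y² is another formula that exhibits catastrophic cancellation. It is more accurate to evaluate it as (x - y)(x + y).

Edit: it is more complicated than that. Although the above is true in general, (x - y)(x + y) is slightly less accurate when x and y have very different magnitudes, as the footnote to that statement explains:

In this case, (x - y)(x + y) has three rounding errors, but x² - y² has only two since the rounding error committed when computing the smaller of x² and y² does not affect the final subtraction.

In other words, computing x - y, x + y, and the product (x - y)(x + y) each introduces a rounding error (three rounding steps). Computing x², y², and the subtraction x² - y² also each introduces a rounding error, but the error from squaring the much smaller of x and y is negligible, so there are effectively only two rounding steps, which makes the difference of squares more precise.

So option 1 will actually be more precise. dev.brutus's Java test confirms this.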
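The trade-off described in the quote and its footnote can be inspected numerically. The sketch below is an illustration under assumptions, not part of the cited article: f32 and rel_errors are hypothetical helpers, float32 is emulated by rounding each double result to binary32, and the exact value comes from Python's Fraction type.

```python
from fractions import Fraction
from struct import pack, unpack


def f32(x):
    """Round a binary64 Python float to the nearest binary32 value."""
    return unpack('f', pack('f', x))[0]


def rel_errors(x, y):
    """Relative errors of x*x - y*y and (x - y)*(x + y) in emulated float32.

    Returns (error_squares, error_factored) as exact Fractions.
    """
    x, y = f32(x), f32(y)
    exact = Fraction(x) ** 2 - Fraction(y) ** 2
    squares = f32(f32(x * x) - f32(y * y))      # two squares, one subtraction
    factored = f32(f32(x - y) * f32(x + y))     # two sums, one product
    if exact == 0:
        # x == +-y: both forms return exactly 0, so there is no error.
        return Fraction(0), Fraction(0)
    return (abs(Fraction(squares) - exact) / abs(exact),
            abs(Fraction(factored) - exact) / abs(exact))


# Similar magnitudes (cancellation) versus very different magnitudes.
for x, y in ((1.0, 0.999999), (1.0, 0.001), (1.0, 0.3)):
    e_sq, e_f = rel_errors(x, y)
    print(x, y, float(e_sq), float(e_f))
```

For the pair (1.0, 0.999999) the factored form has the smaller error, which is the cancellation case from the main quote; the other pairs are there so the very-different-magnitudes regime of the footnote can be checked rather than taken on faith.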

score 3 · Accepted Answer

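I wrote a small test. It computes the expected value in double precision, then measures the error of each of your options against it. The first option is better: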

Algorithm: FloatTest$1
option 1 error = 3.802792362162126
option 2 error = 4.333273185303996
Algorithm: FloatTest$2
option 1 error = 3.802792362167937
option 2 error = 4.333273185305868

Java code:

import org.junit.Test;

public class FloatTest {

    @Test
    public void test() {
        testImpl(new ExpectedAlgorithm() {
            public double te(double cos_theta) {
                return Math.sqrt(1.0f - cos_theta * cos_theta);
            }
        });
        testImpl(new ExpectedAlgorithm() {
            public double te(double cos_theta) {
                return Math.sqrt((1.0f + cos_theta) * (1.0f - cos_theta));
            }
        });
    }

    public void testImpl(ExpectedAlgorithm ea) {
        double delta1 = 0;
        double delta2 = 0;
        for (double cos_theta = -1; cos_theta <= 1; cos_theta += 1e-8) {
            double[] delta = delta(cos_theta, ea);
            delta1 += delta[0];
            delta2 += delta[1];
        }

        System.out.println("Algorithm: " + ea.getClass().getName());
        System.out.println("option 1 error = " + delta1);
        System.out.println("option 2 error = " + delta2);
    }

    private double[] delta(double cos_theta, ExpectedAlgorithm ea) {
        double expected = ea.te(cos_theta);
        double delta1 = Math.abs(expected - t1((float) cos_theta));
        double delta2 = Math.abs(expected - t2((float) cos_theta));

        return new double[]{delta1, delta2};
    }

    private double t1(float cos_theta) {
        return Math.sqrt(1.0f - cos_theta * cos_theta);
    }

    private double t2(float cos_theta) {
        return Math.sqrt((1.0f + cos_theta) * (1.0f - cos_theta));
    }

    interface ExpectedAlgorithm {
        double te(double cos_theta);
    }

}

score 2 · Accepted Answer

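The right way to reason about the numerical precision of an expression is to:

1. Measure the discrepancy between the result and the correct value in ULPs (units in the last place), a measure introduced by W. H. Kahan in 1960. You can find C, Python and Mathematica implementations here, and learn more on the topic here.
2. Discriminate between two or more expressions by the worst case they produce, not by the mean absolute error used in other answers or by some other arbitrary metric. This is how numerical approximation polynomials are constructed (Remez algorithm), how implementations of standard library methods are analysed (e.g. Intel atan2), and so on.

With that in mind, version_1: sqrt(1 - x * x) and version_2: sqrt((1 - x) * (1 + x)) produce significantly different results. As the plot generated by the code below shows, version_1 behaves catastrophically for x close to 1, with errors > 1_000_000 ulps, whereas the error of version_2 is well behaved.

That is why I always recommend using version_2, i.e. exploiting the difference-of-squares formula.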
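The difference between a mean and a worst-case metric can be seen in a small sketch. This is not the answer's program (that one works in double precision): it assumes single precision as in the question, emulates it by rounding each double result to binary32, and uses the hypothetical helpers f32, ulp32 and errors_in_ulps on a fixed grid of inputs that gets denser towards 1.

```python
from fractions import Fraction
from math import sqrt
from struct import pack, unpack


def f32(x):
    """Round a binary64 Python float to the nearest binary32 value."""
    return unpack('f', pack('f', x))[0]


def ulp32(x):
    """Gap between the positive float32 x and the next float32 above it."""
    bits = unpack('I', pack('f', x))[0]
    return unpack('f', pack('I', bits + 1))[0] - x


def errors_in_ulps(x):
    """ULP errors of version_1 and version_2 for a float32 input x."""
    # Reference value: exact rational 1 - x^2, then a double square root.
    correct = sqrt(float(1 - Fraction(x) ** 2))
    v1 = f32(sqrt(f32(1.0 - f32(x * x))))
    v2 = f32(sqrt(f32(f32(1.0 - x) * f32(1.0 + x))))
    unit = ulp32(f32(correct))
    return abs(v1 - correct) / unit, abs(v2 - correct) / unit


# Deterministic inputs 1 - 10^(-0.01) ... 1 - 10^(-7), all below 1.
samples = [f32(1.0 - 10.0 ** (-i / 100)) for i in range(1, 701)]
errs = [errors_in_ulps(x) for x in samples]
for name, col in (("version_1", 0), ("version_2", 1)):
    values = [e[col] for e in errs]
    print(name, "mean =", sum(values) / len(values), "worst =", max(values))
```

On this grid the worst case of version_2 stays within a few ULPs, while version_1 has a much larger worst case than its mean would suggest.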

Python 3.6 code that produces the square_diff_error.csv file:

from fractions import Fraction
from math import exp, fabs, sqrt
from random import random
from struct import pack, unpack


def ulp(x):
    """
    Computing ULP of input double precision number x exploiting
    lexicographic ordering property of positive IEEE-754 numbers.

    The implementation correctly handles the special cases:
      - ulp(NaN) = NaN
      - ulp(-Inf) = Inf
      - ulp(Inf) = Inf

    Author: Hrvoje Abraham
    Date: 11.12.2015
    Revisions: 15.08.2017
               26.11.2017
    MIT License https://opensource.org/licenses/MIT

    :param x: (float) float ULP will be calculated for
    :returns: (float) the input float number ULP value
    """

    # setting sign bit to 0, e.g. -0.0 becomes 0.0
    t = abs(x)

    # converting IEEE-754 64-bit format bit content to unsigned integer
    ll = unpack('Q', pack('d', t))[0]

    # computing first smaller integer, bigger in a case of ll=0 (t=0.0)
    near_ll = abs(ll - 1)

    # converting back to float, its value will be float nearest to t
    near_t = unpack('d', pack('Q', near_ll))[0]

    # abs takes care of case t=0.0
    return abs(t - near_t)


with open('e:/square_diff_error.csv', 'w') as f:
    for _ in range(100_000):
        # nonlinear distribution of x in [0, 1] to produce more cases close to 1
        k = 10
        x = (exp(k) - exp(k * random())) / (exp(k) - 1)

        fx = Fraction(x)
        correct = sqrt(float(Fraction(1) - fx * fx))

        version1 = sqrt(1.0 - x * x)
        version2 = sqrt((1.0 - x) * (1.0 + x))

        err1 = fabs(version1 - correct) / ulp(correct)
        err2 = fabs(version2 - correct) / ulp(correct)

        f.write(f'{x},{err1},{err2}\n')

Mathematica code that produces the final plot:

data = Import["e:/square_diff_error.csv"];

err1 = {1 - #[[1]], #[[2]]} & /@ data;
err2 = {1 - #[[1]], #[[3]]} & /@ data;

ListLogLogPlot[{err1, err2}, PlotRange -> All, Axes -> False, Frame -> True,
    FrameLabel -> {"1-x", "error [ULPs]"}, LabelStyle -> {FontSize -> 20}]

score 1 · Accepted Answer

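As an aside, you will always have a problem when theta is small, because the cosine is flat around theta = 0. If theta is between -0.0001 and 0.0001, then cos(theta) in float is exactly one, so your sin_theta will be exactly zero.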
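The flatness of the cosine near zero is easy to check. The following sketch assumes single precision can be emulated by rounding a double to binary32 with the struct module; f32 is a helper introduced here for illustration.

```python
from math import cos
from struct import pack, unpack


def f32(x):
    """Round a binary64 Python float to the nearest binary32 value."""
    return unpack('f', pack('f', x))[0]


# For small theta, 1 - theta^2 / 2 is closer to 1.0 than to the
# float32 value just below 1.0, so the rounded cosine becomes 1.0.
for theta in (1e-2, 1e-3, 1e-4, 1e-5):
    c = f32(cos(theta))
    print(theta, c, c == 1.0)
```

Once the rounded cosine is exactly 1.0, both formulas from the question return 0, so no rearrangement of the expression can recover the lost information.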

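To answer your question: when cos_theta is close to 1 (corresponding to a small theta), your second computation is clearly more accurate. The following program shows this by listing the absolute and relative errors of both computations for various values of cos_theta. The errors are computed by comparing against a value obtained with 200 bits of precision using the GNU MP library and then converted to float.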

#include <math.h>
#include <stdio.h>
#include <gmp.h>

int main() 
{
  int i;
  printf("cos_theta       abs (1)    rel (1)       abs (2)    rel (2)\n\n");
  for (i = -14; i < 0; ++i) {
    float x = 1 - pow(10, i/2.0);
    float approx1 = sqrt(1 - x * x);
    float approx2 = sqrt((1 - x) * (1 + x));

    /* Use GNU MultiPrecision Library to get 'exact' answer */
    mpf_t tmp1, tmp2;
    mpf_init2(tmp1, 200);  /* use 200 bits precision */
    mpf_init2(tmp2, 200);
    mpf_set_d(tmp1, x);
    mpf_mul(tmp2, tmp1, tmp1);  /* tmp2 = x * x */
    mpf_neg(tmp1, tmp2);        /* tmp1 = -x * x */
    mpf_add_ui(tmp2, tmp1, 1);  /* tmp2 = 1 - x * x */
    mpf_sqrt(tmp1, tmp2);       /* tmp1 = sqrt(1 - x * x) */
    float exact = mpf_get_d(tmp1);

    printf("%.8f     %.3e  %.3e     %.3e  %.3e\n", x,
           fabs(approx1 - exact), fabs((approx1 - exact) / exact),
           fabs(approx2 - exact), fabs((approx2 - exact) / exact));
    /* printf("%.10f  %.8f  %.8f  %.8f\n", x, exact, approx1, approx2); */

    /* Release the GMP variables initialised in this iteration */
    mpf_clear(tmp1);
    mpf_clear(tmp2);
  }
  return 0;
}

Output:

cos_theta       abs (1)    rel (1)       abs (2)    rel (2)

0.99999988     2.910e-11  5.960e-08     0.000e+00  0.000e+00
0.99999970     5.821e-11  7.539e-08     0.000e+00  0.000e+00
0.99999899     3.492e-10  2.453e-07     1.164e-10  8.178e-08
0.99999684     2.095e-09  8.337e-07     0.000e+00  0.000e+00
0.99998999     1.118e-08  2.497e-06     0.000e+00  0.000e+00
0.99996835     6.240e-08  7.843e-06     9.313e-10  1.171e-07
0.99989998     3.530e-07  2.496e-05     0.000e+00  0.000e+00
0.99968380     3.818e-07  1.519e-05     0.000e+00  0.000e+00
0.99900001     1.490e-07  3.333e-06     0.000e+00  0.000e+00
0.99683774     8.941e-08  1.125e-06     7.451e-09  9.376e-08
0.99000001     5.960e-08  4.225e-07     0.000e+00  0.000e+00
0.96837723     1.490e-08  5.973e-08     0.000e+00  0.000e+00
0.89999998     2.980e-08  6.837e-08     0.000e+00  0.000e+00
0.68377221     5.960e-08  8.168e-08     5.960e-08  8.168e-08

When cos_theta is not close to 1, the two methods are very close in accuracy, and both are near the rounding error.

score 0 · Accepted Answer

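[Edited for a major think-o] It seems to me that option 2 would be better, because for a number like 0.000001 option 1 will return a sine of 1, while option 2 will return a number just smaller than 1.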

score 0 · Accepted Answer

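No difference, in my opinion, since (1-x) preserves the precision and does not affect the carried bit. The same is true for (1+x). The only thing that affects the precision of the carried bit is then the multiplication. Both cases involve a single multiplication, so both are equally likely to give the same carry error.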
