1

我正在使用 Newton-Raphson 算法使用单精度硬件来划分 IEEE-754 单精度浮点值。

我正在使用这两个链接中描述的方法:

  1. 维基百科牛顿-拉夫森分部
  2. 我正在使用的 Newton-Raphson 方法

然而,尽管计算 Xi 到 X_3(即使用 3 次迭代),我的答案仍然有点偏离。我想知道为什么会这样?我正在使用 MATLAB 比较我的结果。

这是显示不正确结果示例的输出

   =====    DIVISION RESULT    =====
Dividend:        94C5D0C9
Divisor:         D3ACDF9A
My answer:       009277C0
My answer:       +0.00000
Correct answer:  009277C2
Correct answer:  +0.00000


error =

    2.384186e-007

Error:         34800000

k =

      212823

我在这里附上了我的 MATLAB 代码:

% Testing Newton-Raphson Division Algorithm

% Clear window and variables
clc;
clear;

% Open file for reading
fid = fopen('generatedfloats_for_fpdiv.txt','r');

% Put all data into an array
data = fscanf(fid,'%x');

% Loop through data array 2 values at a time
for k=1:2:length(data)

% Formatting floating point numbers into four parts
% Split upper and lower 16 bits of each number into a separate variable
dividend = typecast(uint32(data(k)),'single');
divisor = typecast(uint32(data(k+1)),'single');

% Get upper 16 bits of divided and divisor
a_inword1 = (bitshift(data(k),-16));
b_inword1 = (bitshift(data(k+1),-16));

% Get exponents
a_exponent      = bitshift(a_inword1,-7);                  % Shift exponent right
a_exponent      = bitand(a_exponent,hex2dec('00ff'));      % Remove sign bit from exponent

b_exponent      = bitshift(b_inword1,-7);                  % Shift exponent right
b_exponent      = bitand(b_exponent,hex2dec('00ff'));      % Remove sign bit from exponent

% Create Scaled Divisor and Dividend
divisor_scaled = abs(divisor) * 2^((-1) - (b_exponent-127));
dividend_scaled = abs(dividend) * 2^((a_exponent-b_exponent -1) - (a_exponent-127));

% Solve for X0 = 48/17-32/17*D'
    % Perform multiplication 32/17*D'
    mult1 = single(single(32/17) * single(divisor_scaled));

    % Perform subtraction 48/17 - (32/17*D')
    x0 = single(single(48/17) - single(mult1));
    xi = x0;

% Solve for Xi + Xi(1-D'Xi)

for i=1:3
   % Multiply D'Xi
   mult2 = single(single(divisor_scaled)*single(xi));

   % Subtract 1 - (D'Xi)
   sub2 = single(single(1) - single(mult2));

   % Multiply Xi * (1-D'Xi)
   mult3 = single(single(xi) * single(sub2));

   % Add Xi + (Xi(1-D'Xi))
   xi = single(single(xi) + single(mult3));
end

% Perform final multiplication N'X3
myresult = single(single(dividend_scaled) * single(xi));

% Invert sign if result is supposed to be negative
if (divisor * dividend < 0)
    myresult = -myresult;
end


fprintf('=====    DIVISION RESULT    =====\n');
fprintf('Dividend:        %08X\n', data(k));
fprintf('Divisor:         %08X\n', data(k+1));
fprintf('My answer:       %08X\n',hex2dec(num2hex(single(myresult))));
fprintf('My answer:       %+1.5f\n',single(myresult));
fprintf('Correct answer:  %08X\n',hex2dec(num2hex(single(single(dividend)/single(divisor)))))
fprintf('Correct answer:  %+1.5f\n\n',single(single(dividend)/single(divisor)));

% Calculate Error
if(dividend == 0)
    error = abs(single(0-myresult));
else
    error = abs(single(single(single(dividend)/single(divisor))/single(myresult) - 1));
end

% Pause if there is an error greater than 1 ulp
if (error > 2^-23)
    error
    fprintf('Error:         %08X\n', hex2dec(num2hex(single(error))));
k
pause
end

% Close Input File
fclose(fid);

这是包含测试输入的文件(应该存储在名为 generatefloats_for_fpdiv.txt 的文件中):

7A0C006C
FAA1654F
FF7FFFFF
FF7FFFFF
00000000
FF7FFFFF
7F7FFFFF
FF7FFFFF
00000000
D9E6C924
80800000
80800000
BF800000
80800000
00000000
80800000
3F800000
80800000
00800000
80800000
FF7FFFFF
BF800000
80F329D0
BF800000
80800000
BF800000
BF800000
BF800000
00000000
BF800000
3F800000
BF800000
00800000
BF800000
4D2798AD
BF800000
7F7FFFFF
BF800000
FF7FFFFF
3F800000
C5F4B01A
3F800000
80800000
3F800000
BF800000
3F800000
00000000
3F800000
3F800000
3F800000
00800000
3F800000
02A8DEF8
3F800000
7F7FFFFF
3F800000
A2C4DB20
00800000
80800000
00800000
BF800000
00800000
00000000
00800000
3F800000
00800000
00800000
00800000
00000000
6F301E6E
FF7FFFFF
7F7FFFFF
00000000
7F7FFFFF
7F7FFFFF
7F7FFFFF
CFFFFFFF
10000000
E7DCD51B
3F800000
3DD8AD6C
FCD8AD6C
0CB82AD0
8CB82AD0
8A856DBC
8A856DBC
ADE5A620
ECE5A620
4F7FFFFF
0F800000
C1023D04
C1023CF4
8F6A27BC
8F6A2782
76E0BCED
76D9ED81
B59F5AE2
F49F5A43
12AC77FD
51AC6271
3520744A
740EA042
5D594D83
1D5B038B
AC104C35
2C104C11
9CB2D620
5BB2D61D
AB1AC0B0
6A1AC0AB
1F7C4A96
DE7C4A77
95AD8847
54AD5CF0
1C22AD4A
DB225C1C
7BED0479
BBED047B
C094DEEC
0094DEF6
4DC9E15F
8DC9E1C5
FBC3003F
3BC30103
FEC93614
3EC9686F
436A5FC8
836B4B14
D52DD147
153967C4
A6666AC4
29800000
F98E9ABA
FB000000
36FBF9DF
EF000000
402409D5
7A800000
C81946D2
7C915A00
EC1D18DD
B2E597A5
709EA505
321F01DC
FE51BE2A
4DB9A1BC
6F69A411
60F5814E
08CCC56E
B0A84889
643A641A
6EAE2570
AC0C952C
57000000
D534A524
FE800000
B05CD51B
58000000
8F4D8AD7
95800000
FD2ECBC8
51B61A06
FCCB65E4
CFA4FECB
6EAE153B
BFE119AC
C1A97E44
913647B0
1F730C8E
8EAA321D
C3BC9DEC
A51D0BC8
30A4350D
9058EA7E
9CE2C0D1
9AF8176B
CA3FFFFD
8A400000
F008F561
B008F563
F0754048
B075404B
7D7FFFFD
3D800000
52B209A5
12B209A7
7EFE1868
3EFE186A
C5FFFFFE
86000000
E39A4BC8
A39A4BC9
455EDB0A
055EDB0B
677FFFFF
27800000
C44D976B
044D976C
ED126E29
2D126E2A
DFFFFFFE
20000000
4ED6F4EB
8ED6F4ED
58397C68
98397C6A
C2FFFFFD
03000000
EDED365A
2DED365D
6D3FFFFD
AD400000
E460628E
26000000
E821B21E
29978BE6
5749CDF0
97D2299C
55BF2E4A
95E1CA69
3F50B86B
00D65D71
61DC531D
23000000
FB52DDC0
BC4BE0E8
F6744295
B6B52B80
BDAD17D0
FCAD17D0
03C572DD
42C572DC
90800001
CF800000
8AE965BE
C9E965BC
846E5BBB
C36E5BB8
30800002
6F800000
BC512F61
FB512F5D
BA15F623
F915F620
30800003
6F800000
91800003
50800000
27F87CF2
E6F87CED
1CDDA7B7
DBDDA7B3
88800002
47800000
1AA27658
D9A27656
A1C7E937
60C7E935
16800001
D5800000
35F35105
F4F35105
1F45A1EB
5DE7B505
0E20D62E
CC4E27A5
ADAA8A46
6B800000
B9D2BA3D
F7000000
B87ACA34
F5000000
087E87BE
C4C7E789
FF0C0958
5F1010C0
CF808800
AA2060A1
AAAAAAAA
BFDAFA00
F1199999
7DC06112
5DE66666
EB9295A3
56FFFDB9
AD400241
F0FEEFFF
421859B2
7D800289
F3A82C04
0A11D000
93041140
03800000
BF844014
88100400
AE100000
412AAAAA
7F000010
48199999
63820000
41CE400D
40002000
5AE13FFF
22040000
15000744
1B000002
2E7FFFFF
1FC00000
BA800000
8C080000
DDFD5AB9
AC200080
D8AAAAAA
5B820002
45555555
F9809000
ABE66666
00A00100
857FFF30
10040004
D2302FFF
B5800003
AC0003FF
4A004001
12000000
88000280
702AAAAA
3997DBFD
A9145FFF
1EF56DE9
038000BB
08EE3FFF
B0687000
A25FD21D
43C20004
E5FFFFDF
11810040
447FDFFF
4DFB7FFF
A1FFFFFB
C72AAAAA
8AFFBFFF
54555555
77FFFDFF
274F8658
48FFBFFF
B47FF8A6
6BFFEFFF
474C2000
117FFFFB
45800200
DCDF7FFF
5D808001
6CF5FFFF
40FD5AF7
A1FF3FFF
FC7FEFFF
C27DF7FF
BEAAAAAA
B7FF3FFF
EB555555
59FBEFFF
58199999
D7EFFFFB
31E66666
CFEFFFFE
BE9EAB13
017EFFEF
2B7FFD61
80BFFFFD
6B7FFFFF
5F7DFFFB
D7800000
F7FFBFBF
71832006
72AAAAAA
CA800001
A7AAAAAA
3F904000
AC2AAAAA
84D7DF6F
C1AAAAAA
827FFFFB
36AAAAAA
15AAAAAA
9A2AAAAA
2F199999
63AAAAAA
BBE66666
1E2AAAAA
15ED4FCE
8EAAAAAA
82FFFA1D
B02AAAAA
6F004FFF
4AAAAAAA
487DB000
CD2AAAAA
C2FFFFFF
1DAAAAAA
4E800000
D8AAAAAA
F1900000
B7D55555
C9080200
9CD55555
A86FF7CF
D1D55555
BBFFDFDF
8C555555
ACAAAAAA
3A555555
86199999
BCD55555
52E66666
77555555
19FFFED8
0FD55555
A6B99FFF
50D55555
5300038D
2BD55555
59FFFFFF
BDD55555
FAE0C1AD
E2199999
8F800400
87999999
AE024000
6C999999
A1DF7DFF
A1999999
CFDFFFFF
36999999
736EFFFF
6C199999
442AAAAA
99999999
D0D55555
DA999999
1D999999
1C199999
E9E66666
D6999999
641FF15D
27199999
F0FFFBAC
68999999
7D9DDFFF
69999999
29800061
44199999
571B3000
EC199999
BD800000
D7999999
D3080080
7CE66666
5F7FFEFF
C7666666
3ADFFF7F
3C666666
DB2AAAAA
39666666
68555555
7A666666
D4999999
9BE66666
41E66666
BCE66666
CE02CDE5
77666666
487FF867
C8666666
D4EC7FFF
C9666666
8100051C
A3E66666
BAFFFFFF
06666666
B3800024
0CBBF6B5
91E7F3FF
318D3102
416FFFFF
66B808BC
207EFF7F
DBE3E077
52AAAAAA
B89D573A
FF555555
D9DE7340
6C199999
3B1F0F46
D8E66666
1C60AB4C
655172A1
D6F63010
3F7FFD22
47A7E55D
6C3A9FFF
68E90163
388001D7
237E0628
A54DD000
44BFA22E
127FFFFF
A600BE34
AC000000
D6B2F380
441C889C
627FFA61
65800040
B77FFA1C
35840100
1CFFF82A
15F779FF
01FFFA77
6FFEFFFF
56FFFA32
0377FEFF
ABFFF9EC
03199999
13FFFAC9
50666666
9BFFFE10
FCA0175C
767FFAD5
9000067A
09FFF82D
3C9C8000
E47FFCF2
A97FFFFF
05FFF8F8
23800000
567FFE45
12464A93
32494FFF
71880000
47746FFF
2CFFBFEB
D1AB8FFF
7DFFFEFF
66D69FFF
08FFF6FF
BC023FFF
53AAAAAA
A83B5FFF
9A199999
B3824FFF
93845430
8F596FFF
27800336
A938FFFF
B3EB6000
63CE7FFF
00FFFFFF
A50F5FFF
8D000000
A650AFFF
303024C9
B18007DF
BB810000
57800507
EFEFF7FF
6C0004D8
EB2AAAAA
C7800641
B7D55555
C2000305
31999999
13000052
BE666666
54000458
B7FFFB39
30000522
11BC6FFF
A080026F
BE000009
82000675
4ACF2000
BC800339
97FFFFFF
04800681
24800000
05800287
DF872000
F267D000
C0300000
2C643000
FEEFFFF7
7C20C000
42AAAAAA
6749E000
CF555555
A88B4000
A9199999
9290A000
0EFFFFF5
8FA8D000
BB9F4FFF
F0EA7000
558004C5
219BB000
E21DC000
DC313000
4EFFFFFF
3D725000
FB800000
5EB3E000
EFD80480
C27FFFFF
82080008
3C7FFFFF
AD3FFDDB
51FFFFFF
497FFFFE
A6FFFFFF
A16DFFFF
DBFFFFFF
392AAAAA
A07FFFFF
AD666666
537FFFFF
39DAEA49
54FFFFFF
E67FFCB0
2F7FFFFF
12EDEFFF
507FFFFF
CD000180
A17FFFFF
79D75000
C27FFFFF
52800000
FE7FFFFF
EE800080
B7000000
90100200
CC800000
BE7DFBFD
61800000
DD7FDFFF
C7000000
90AAAAAA
3F800000
3DD55555
60800000
C9999999
1B800000
64666666
4C000000
1DFFF953
35000000
EA3C0FFF
EF800000
D1267000
61800000
CA000000
7D800000
FF0C0958
5F1010C0
CF808800
AA2060A1
AAAAAAAA
BFDAFA00
F1199999
7DC06112
5DE66666
EB9295A3
56FFFDB9
AD400241
F0FEEFFF
421859B2
7D800289
F3A82C04
0A11D000
93041140
03800000
BF844014
88100400
AE100000
412AAAAA
7F000010
48199999
63820000
41CE400D
40002000
5AE13FFF
22040000
15000744
1B000002
2E7FFFFF
1FC00000
BA800000
8C080000
DDFD5AB9
AC200080
D8AAAAAA
5B820002
45555555
F9809000
ABE66666
00A00100
857FFF30
10040004
D2302FFF
B5800003
AC0003FF
4A004001
12000000
88000280
702AAAAA
3997DBFD
A9145FFF
1EF56DE9
038000BB
08EE3FFF
B0687000
A25FD21D
43C20004
E5FFFFDF
11810040
447FDFFF
4DFB7FFF
A1FFFFFB
C72AAAAA
8AFFBFFF
54555555
77FFFDFF
274F8658
48FFBFFF
B47FF8A6
6BFFEFFF
474C2000
117FFFFB
45800200
DCDF7FFF
5D808001
6CF5FFFF
40FD5AF7
A1FF3FFF
FC7FEFFF
C27DF7FF
BEAAAAAA
B7FF3FFF
EB555555
59FBEFFF
58199999
D7EFFFFB
31E66666
CFEFFFFE
BE9EAB13
017EFFEF
2B7FFD61
80BFFFFD
6B7FFFFF
5F7DFFFB
D7800000
F7FFBFBF
71832006
72AAAAAA
CA800001
A7AAAAAA
3F904000
AC2AAAAA
84D7DF6F
C1AAAAAA
827FFFFB
36AAAAAA
15AAAAAA
9A2AAAAA
2F199999
63AAAAAA
BBE66666
1E2AAAAA
15ED4FCE
8EAAAAAA
82FFFA1D
B02AAAAA
6F004FFF
4AAAAAAA
487DB000
CD2AAAAA
C2FFFFFF
1DAAAAAA
4E800000
D8AAAAAA
F1900000
B7D55555
C9080200
9CD55555
A86FF7CF
D1D55555
BBFFDFDF
8C555555
ACAAAAAA
3A555555
86199999
BCD55555
52E66666
77555555
19FFFED8
0FD55555
A6B99FFF
50D55555
5300038D
2BD55555
59FFFFFF
BDD55555
FAE0C1AD
E2199999
8F800400
87999999
AE024000
6C999999
A1DF7DFF
A1999999
CFDFFFFF
36999999
736EFFFF
6C199999
442AAAAA
99999999
D0D55555
DA999999
1D999999
1C199999
E9E66666
D6999999
641FF15D
27199999
F0FFFBAC
68999999
7D9DDFFF
69999999
29800061
44199999
571B3000
EC199999
BD800000
D7999999
D3080080
7CE66666
5F7FFEFF
C7666666
3ADFFF7F
3C666666
DB2AAAAA
39666666
68555555
7A666666
D4999999
9BE66666
41E66666
BCE66666
CE02CDE5
77666666
487FF867
C8666666
D4EC7FFF
C9666666
8100051C
A3E66666
BAFFFFFF
06666666
B3800024
0CBBF6B5
91E7F3FF
318D3102
416FFFFF
66B808BC
207EFF7F
DBE3E077
52AAAAAA
B89D573A
FF555555
D9DE7340
6C199999
3B1F0F46
D8E66666
1C60AB4C
655172A1
D6F63010
3F7FFD22
47A7E55D
6C3A9FFF
68E90163
388001D7
237E0628
A54DD000
44BFA22E
127FFFFF
A600BE34
AC000000
D6B2F380
441C889C
627FFA61
65800040
B77FFA1C
35840100
1CFFF82A
15F779FF
01FFFA77
6FFEFFFF
56FFFA32
0377FEFF
ABFFF9EC
03199999
13FFFAC9
50666666
9BFFFE10
FCA0175C
767FFAD5
9000067A
09FFF82D
3C9C8000
E47FFCF2
A97FFFFF
05FFF8F8
23800000
567FFE45
12464A93
32494FFF
71880000
47746FFF
2CFFBFEB
D1AB8FFF
7DFFFEFF
66D69FFF
08FFF6FF
BC023FFF
53AAAAAA
A83B5FFF
9A199999
B3824FFF
93845430
8F596FFF
27800336
A938FFFF
B3EB6000
63CE7FFF
00FFFFFF
A50F5FFF
8D000000
A650AFFF
303024C9
B18007DF
BB810000
57800507
EFEFF7FF
6C0004D8
EB2AAAAA
C7800641
B7D55555
C2000305
31999999
13000052
BE666666
54000458
B7FFFB39
30000522
11BC6FFF
A080026F
BE000009
82000675
4ACF2000
BC800339
97FFFFFF
04800681
24800000
05800287
DF872000
F267D000
C0300000
2C643000
FEEFFFF7
7C20C000
42AAAAAA
6749E000
CF555555
A88B4000
A9199999
9290A000
0EFFFFF5
8FA8D000
BB9F4FFF
F0EA7000
558004C5
219BB000
E21DC000
DC313000
4EFFFFFF
3D725000
FB800000
5EB3E000
EFD80480
C27FFFFF
82080008
3C7FFFFF
AD3FFDDB
51FFFFFF
497FFFFE
A6FFFFFF
A16DFFFF
DBFFFFFF
392AAAAA
A07FFFFF
AD666666
537FFFFF
39DAEA49
54FFFFFF
E67FFCB0
2F7FFFFF
12EDEFFF
507FFFFF
CD000180
A17FFFFF
79D75000
C27FFFFF
52800000
FE7FFFFF
EE800080
B7000000
90100200
CC800000
BE7DFBFD
61800000
DD7FDFFF
C7000000
90AAAAAA
3F800000
3DD55555
60800000
C9999999
1B800000
64666666
4C000000
1DFFF953
35000000
EA3C0FFF
EF800000
D1267000
61800000
CA000000
7D800000
1A6AA41C
B42C29BF
6E11A944
D480452D
105640A2
87959BD1
7422DC0D
6D0E43AA
55ACBB94
1D06D383
C80AA52A
5CE3851F
099B7C06
0397C72F
89D5073F
A25265A2
4476B3FE
1704BD92
4581E9FD
3461D127
3B815934
1E1EA9BE
D74C0685
CCE34EAC
3E52F9BF
30420D4A
B9EA85EB
8C3D45DE
A162F6DA
9459E167
9C29EA20
C8F796B1
C5BD9F02
B7168D05
F736D700
3946C8CE
E20B8F25
BC967BD2
61BAC3AA
5205DABA
257338C7
AC30E694
13C5DD3F
2307A194
0B9317AE
AC534B2B
99271710
47103D91
E5F8F6BF
C7800000
F145F346
63400000
C91F2994
97C00000
C1B8E7D5
18D40000
F5844BAD
6AFA0000
ACF90D5F
5D794000
DE3632EC
49804000
DDBE13CC
6F5B4000
204C6402
4E962400
EC2773F0
34935000
75FBB2EC
DCAE2600
76F79C0E
BADC18F0
00000000
A01F7AF0
1FACEE1B
30D0EFBD
4

3 回答 3

2

您展示的算法试图评估除数 D 的倒数,然后乘以被除数 N,或者换句话说single(N * single(1/D))

对于一组错误的操作数,这总是会失败。例如,取 N=3 和 D=7 并查看这些结果:

0.14285714924335479736328125  = float(1/7)         = closest 32bits float to (1/7)
0.42857144773006439208984375  = 3*float(1/7)       = 3 times closest float to (1/7)
0.4285714626312255859375      = float(3*float(1/7) = closest float to above number
0.4285714328289031982421875   = float(3/7)         = closest float to (3/7)

凭借 IEEE 754 属性,float(float(3)/float(7)) 是最接近 3/7 的浮点数。
因此,如果您的内部循环完美地找到最接近 7 的倒数的浮点数,您不能简单地通过单精度乘以 3 来重建最接近 3/7 的浮点数......

我们在这里高估了:

float(float(3)*float(1/7)) > float(3/7) > (3/7)

但是如果我们在 float(1/7) 之前取下一个可表示的浮点数,我们就会低估:

float(3/7) > (3/7) > float(float(3)*predecessor(float(1/7)))

所以,还有更多:我们刚刚证明没有任何浮点 Q 使得 float(float(3)*Q) = float(float(3)/float(7))...

即使,如果您的内部互惠循环不完美,外部循环single(N*reciprocal(D))仍然会失败......

编辑:然后如何进行?

一旦你得到一个合理的倒数近似值,正确计算除法的一种可能方法是计算重建误差并补偿它,就像用这个伪代码演示的那样,但你需要模拟融合乘加:

z := float(1/y)  ;  // Estimate of the reciprocal - i.e. with your inner loop
q := float(x * z);  // Estimate of the quotient
r := float(x - y*q);// Estimate of the error using a FMA
return float(q + r*z); // correct the error using another FMA

这就是我在 Smalltalk 实现任意精度浮点 ( https://code.google.com/p/arbitrary-precision-float/ ) 中的做法。
您可以在 Internet 上找到其他算法。我喜欢除法的一个参考是:

提前知道除数时加速正确舍入的浮点除法 - Nicolas Brisebarre、Jean-Michel Muller,IEEE 成员和 Saurabh Kumar Raina - http://perso.ens-lyon.fr/jean-michel.muller/ DivIEEETC-aug04.pdf

于 2014-07-20T13:40:20.747 回答
1

我认为您的答案在单精度浮点数的计算精度范围内。浮点数的最大精度仅为 7 位(有关信息,请查看 wikipedia 文章或 IEEE-754 标准)

单精度

http://en.wikipedia.org/wiki/Single_precision

您可以通过除以答案(而不是减去它们)来检查准确性,准确性不是以数量为单位,而是以指数符号表示的数字。

-340282346638528860000000000000000000000.00000 / -340282306073709650000000000000000000000.00000 = 1.000000119209311

因此,在单浮点数的机器精度范围内,误差约为 1e-7。

或者查看匹配的十进制位数,即 7,这是单精度浮点数的限制:

3402823 46638528860000000000000000000000.00000

3402823 06073709650000000000000000000000.00000

双精度

如果您使用双精度(64 位双精度),那么精度会增加到大约 16 位有效数字。

http://en.wikipedia.org/wiki/Double-precision_floating-point_format

累计机器精度 (AMA)

累积的机器精度是我们用来描述舍入误差叠加的影响的东西。假设每个单精度数都有准确的答案加上一些小误差 eps(epsilon,单次操作的机器精度)。现在将 A/B + C/D 相加。首先,数字 A、B、C 和 D 通常不能用浮点运算精确表示,并且可能包含与 eps 一样大的误差。其次,操作 / 可能会引入 eps 的错误。现在让 A/B 的答案是 E 加上一些小误差 eps,而 F 是 C/D 的答案加上一些小误差 eps。这意味着 A/B + C/D = E+eps + F+eps = E+F+2eps。

当然,有时“介于两者之间”的答案(例如 E 和 F)是高估和低估的,因此会稍微消除错误。我总是使用经验法则,即在迭代方案中(尤其是在 FEM、FVM 或 VOF 中)可实现的最大准确度(可以肯定,不能保证)是总和乘以 eps 中的元素数。

换句话说:您的代码看起来不错;)

于 2014-07-17T07:23:34.230 回答
-1

维基百科不建议使用一对融合乘加吗?

您可以尝试使用双精度模拟 fused-multiply-add

function y=fmadd(a,x,b)
y = single(a*x+b)

并将其用于:

divisor_scaled = single(divisor_scaled); % put this out of the loop
% ...snip...

xi = fmadd( fmadd(-divisor_scaled,xi,1) , xi , xi);

编辑

正如我在另一个答案中所解释的那样,牛顿拉夫森迭代可以得到倒数的近似值1/D,但这不足以N/D通过简单地将这个倒数近似值乘以 N 来产生精确的舍入除法,还需要进一步的步骤。

所以我提出的 FMA 肯定不会像现在这样使整个部门工作。

但是值得测试一下,它确实有助于获得倒数的精确舍入近似值,因为这是一些除法算法所要求的,例如我在其他答案中提出的算法。

如果是这样,那么有一些方法可以仅使用单精度 + - 和 * 来模拟 FMA

于 2014-07-18T21:13:28.847 回答