我有两组财务数据,由于单位错误,它们往往包含差异,例如,一个数据集中的 10000 美元可能是另一个数据集中的 1000 美元。
我正在尝试编写检查此类差异的代码,但我能想到的唯一方法是将两个变量分开,看看差异是否在 0.001、0.01、0.1、10、100 等的表中,但它会很难抓住所有的差异。
有没有更聪明的方法来做到这一点?
我有两组财务数据,由于单位错误,它们往往包含差异,例如,一个数据集中的 10000 美元可能是另一个数据集中的 1000 美元。
我正在尝试编写检查此类差异的代码,但我能想到的唯一方法是将两个变量分开,看看差异是否在 0.001、0.01、0.1、10、100 等的表中,但它会很难抓住所有的差异。
有没有更聪明的方法来做到这一点?
使用proc compare
. 确保两个数据集按行或特定组以相同的顺序排序。根据需要使用该by
语句。有关选项的更多信息可以在文档中找到。
示例 - 将修改后的cars
数据集与sashelp.cars
:
data cars_modified;
set sashelp.cars;
if(mod(_N_, 2) = 0) then msrp = msrp - 100;
run;
proc compare base = sashelp.cars
compare = cars_modified
out = out_differences
outnoequal
outdif
noprint;
var msrp;
run;
仅在 中输出具有差异的观测值out_differences
:
_TYPE_ _OBS_ MSRP
DIF 2 $-100
DIF 4 $-100
DIF 6 $-100
DIF 8 $-100
DIF 10 $-100
...
对于数值X,您可以计算最接近的有理表达式p/q。
如果你计算比率
X = amount_for_source_A / amount_from_source_B;
status = math.rational(X,1e5,p,q);
该比率将是 10 的倍数,如果p=1 or q=1
例子:
proc ds2;
package math / overwrite = yes;
method rational(double x, double maxden, in_out integer p, in_out integer q) returns double;
/*
** FROM: https://www.ics.uci.edu/~eppstein/numth/frap.c
** FROM: https://stackoverflow.com/questions/95727/how-to-convert-floats-to-human-readable-fractions
**
** find rational approximation to given real number
** David Eppstein / UC Irvine / 8 Aug 1993
**
** With corrections from Arno Formella, May 2008
**
** Modified for Proc DS2, Richard DeVenezia, Jan 2020.
**
** usage: rational(r,d,p,q)
** x is real number to approx
** maxden is the maximum denominator allowed
** p is return for numerator
** q is return for denominator
** returns 0 if no problems
**
** based on the theory of continued fractions
** if x = a1 + 1/(a2 + 1/(a3 + 1/(a4 + ...)))
** then best approximation is found by truncating this series
** (with some adjustments in the last term).
**
** Note the fraction can be recovered as the first column of the matrix
** ( a1 1 ) ( a2 1 ) ( a3 1 ) ...
** ( 1 0 ) ( 1 0 ) ( 1 0 )
** Instead of keeping the sequence of continued fraction terms,
** we just keep the last partial product of these matrices.
*/
declare integer m[0:1,0:1];
declare double startx e1 e2;
declare integer ai t result p1 q1 p2 q2;
startx = x;
/* initialize matrix */
m[0,0] = 1; m[1,1] = 1;
m[0,1] = 0; m[1,0] = 0;
/* loop finding terms until denom gets too big */
do while (1);
ai = x;
if not ( m[1,0] * ai + m[1,1] < maxden ) then leave;
t = m[0,0] * ai + m[0,1];
m[0,1] = m[0,0];
m[0,0] = t;
t = m[1,0] * ai + m[1,1];
m[1,1] = m[1,0];
m[1,0] = t;
if x = ai then leave; %* AF: division by zero;
x = 1 / (x - ai);
if x > 2147483647 /*x'7FFFFFFF'*/ then leave; %* AF: representation failure;
end;
/* now remaining x is between 0 and 1/ai */
/* approx as either 0 or 1/m where m is max that will fit in maxden */
/* first try zero */
p1 = m[0,0];
q1 = m[1,0];
e1 = startx - 1.0 * p1 / q1;
/* now try other possibility */
ai = (maxden - m[1,1]) / m[1,0];
m[0,0] = m[0,0] * ai + m[0,1];
m[1,0] = m[1,0] * ai + m[1,1];
p2 = m[0,0];
q2 = m[1,0];
e2 = startx - 1.0 * p2 / q2;
if abs(e1) <= abs(e2) then do;
p = p1;
q = q1;
end;
else do;
p = p2;
q = q2;
end;
return 0;
end;
endpackage;
run;
quit;
* Example uage;
proc ds2;
data _null_;
declare package math math();
declare double x;
declare int p1 q1 p q;
method run();
streaminit(12345);
x = 0;
do _n_ = 1 to 20;
p1 = ceil(rand('uniform',9));
q1 = ceil(rand('uniform',9));
x + 1. * p1 / q1;
math.rational (x, 10000, p, q);
put 'add' p1 '/' q1 ' ' x=best16. 'is' p '/' q;
end;
end;
enddata;
run;
quit;
----- LOG -----
add 4 / 1 x= 4 is 4 / 1
add 4 / 2 x= 6 is 6 / 1
add 2 / 7 x=6.28571428571429 is 44 / 7
add 4 / 6 x=6.95238095238095 is 146 / 21
add 5 / 2 x=9.45238095238095 is 397 / 42
add 5 / 2 x= 11.952380952381 is 251 / 21
add 7 / 1 x= 18.952380952381 is 398 / 21
add 8 / 6 x=20.2857142857143 is 142 / 7
add 9 / 3 x=23.2857142857143 is 163 / 7
add 8 / 2 x=27.2857142857143 is 191 / 7
add 3 / 1 x=30.2857142857143 is 212 / 7
add 9 / 3 x=33.2857142857143 is 233 / 7
add 4 / 3 x=34.6190476190476 is 727 / 21
add 4 / 6 x=35.2857142857143 is 247 / 7
add 1 / 9 x=35.3968253968254 is 2230 / 63
add 8 / 3 x=38.0634920634921 is 2398 / 63
add 2 / 4 x=38.5634920634921 is 4859 / 126
add 5 / 1 x=43.5634920634921 is 5489 / 126
add 1 / 2 x=44.0634920634921 is 2776 / 63
add 2 / 7 x=44.3492063492064 is 2794 / 63
因此,您似乎要求查找 X/Y 是正好是 1.00Exx 的数字的情况,其中 XX 是一个整数,而不是 0。
data _null_;
do x=1,10,100,1000;
do y=1,2,3,10.1,10 ;
ratio = x/y;
power = floor(log10(ratio));
if power ne 0 and 1.00 = round(ratio/10**power,0.01) then
put 'Ratio of ' x 'over ' y 'is 10**' power '.'
;
end;
end;
run;
结果:
Ratio of 1 over 10 is 10**-1 .
Ratio of 10 over 1 is 10**1 .
Ratio of 100 over 1 is 10**2 .
Ratio of 100 over 10 is 10**1 .
Ratio of 1000 over 1 is 10**3 .
Ratio of 1000 over 10 is 10**2 .