0

我有两组财务数据,由于单位错误,它们往往包含差异,例如,一个数据集中的 10000 美元可能是另一个数据集中的 1000 美元。

我正在尝试编写检查此类差异的代码,但我能想到的唯一方法是将两个变量分开,看看差异是否在 0.001、0.01、0.1、10、100 等的表中,但它会很难抓住所有的差异。

有没有更聪明的方法来做到这一点?

4

3 回答 3

1

使用proc compare. 确保两个数据集按行或特定组以相同的顺序排序。根据需要使用该by语句。有关选项的更多信息可以在文档中找到。

示例 - 将修改后的cars数据集与sashelp.cars

data cars_modified;
    set sashelp.cars;

    if(mod(_N_, 2) = 0) then msrp = msrp - 100;
run;

proc compare base    = sashelp.cars 
             compare = cars_modified 
             out     = out_differences 
             outnoequal 
             outdif
             noprint;
    var msrp;
run;

仅在 中输出具有差异的观测值out_differences

_TYPE_  _OBS_   MSRP
DIF     2      $-100
DIF     4      $-100
DIF     6      $-100
DIF     8      $-100
DIF     10     $-100
...
于 2020-03-02T18:08:45.203 回答
0

对于数值X,您可以计算最接近的有理表达式p/q

如果你计算比率

X = amount_for_source_A / amount_from_source_B;
status = math.rational(X,1e5,p,q);

该比率将是 10 的倍数,如果p=1 or q=1

例子:

proc ds2;
  package math / overwrite = yes;
    method rational(double x, double maxden, in_out integer p, in_out integer q) returns double;
      /*
      ** FROM: https://www.ics.uci.edu/~eppstein/numth/frap.c
      ** FROM: https://stackoverflow.com/questions/95727/how-to-convert-floats-to-human-readable-fractions
      **
      ** find rational approximation to given real number
      ** David Eppstein / UC Irvine / 8 Aug 1993
      **
      ** With corrections from Arno Formella, May 2008
      **
      ** Modified for Proc DS2, Richard DeVenezia, Jan 2020.
      **
      ** usage: rational(r,d,p,q)
      **   x is real number to approx
      **   maxden is the maximum denominator allowed
      **   p is return for numerator
      **   q is return for denominator
      **   returns 0 if no problems
      **
      ** based on the theory of continued fractions
      ** if x = a1 + 1/(a2 + 1/(a3 + 1/(a4 + ...)))
      ** then best approximation is found by truncating this series
      ** (with some adjustments in the last term).
      **
      ** Note the fraction can be recovered as the first column of the matrix
      **  ( a1 1 ) ( a2 1 ) ( a3 1 ) ...
      **  ( 1  0 ) ( 1  0 ) ( 1  0 )
      ** Instead of keeping the sequence of continued fraction terms,
      ** we just keep the last partial product of these matrices.
      */

      declare integer m[0:1,0:1];
      declare double startx e1 e2;
      declare integer ai t result p1 q1 p2 q2;

      startx = x;

      /* initialize matrix */
      m[0,0] = 1; m[1,1] = 1;
      m[0,1] = 0; m[1,0] = 0;

      /* loop finding terms until denom gets too big */
      do while (1);

        ai = x;

        if not ( m[1,0] * ai + m[1,1] < maxden ) then leave;

        t = m[0,0] * ai + m[0,1];
        m[0,1] = m[0,0];
        m[0,0] = t;

        t = m[1,0] * ai + m[1,1];
        m[1,1] = m[1,0];
        m[1,0] = t;

        if x = ai then leave;     %* AF: division by zero;

        x = 1 / (x - ai);

        if x > 2147483647 /*x'7FFFFFFF'*/ then leave;  %* AF: representation failure;
      end;

      /* now remaining x is between 0 and 1/ai */
      /* approx as either 0 or 1/m where m is max that will fit in maxden */
      /* first try zero */

      p1 = m[0,0];
      q1 = m[1,0];
      e1 = startx - 1.0 * p1 / q1;

      /* now try other possibility */

      ai = (maxden - m[1,1]) / m[1,0];

      m[0,0] = m[0,0] * ai + m[0,1];
      m[1,0] = m[1,0] * ai + m[1,1];

      p2 = m[0,0];
      q2 = m[1,0];
      e2 = startx - 1.0 * p2 / q2;

      if abs(e1) <= abs(e2) then do;
        p = p1;
        q = q1;
      end;
      else do;
        p = p2;
        q = q2;
      end;

      return 0;
    end;
  endpackage;
run;
quit;

* Example uage;

proc ds2;
  data _null_;
    declare package math math();
    declare double x;
    declare int p1 q1 p q;

    method run();
      streaminit(12345);

      x = 0;
      do _n_ = 1 to 20;
        p1 = ceil(rand('uniform',9));
        q1 = ceil(rand('uniform',9));

        x + 1. * p1 / q1;

        math.rational (x, 10000, p, q);

        put 'add' p1 '/' q1 '  ' x=best16. 'is' p '/' q;
      end;
    end;
  enddata;
run;
quit;
----- LOG -----
add 4 / 1    x=               4 is 4 / 1
add 4 / 2    x=               6 is 6 / 1
add 2 / 7    x=6.28571428571429 is 44 / 7
add 4 / 6    x=6.95238095238095 is 146 / 21
add 5 / 2    x=9.45238095238095 is 397 / 42
add 5 / 2    x= 11.952380952381 is 251 / 21
add 7 / 1    x= 18.952380952381 is 398 / 21
add 8 / 6    x=20.2857142857143 is 142 / 7
add 9 / 3    x=23.2857142857143 is 163 / 7
add 8 / 2    x=27.2857142857143 is 191 / 7
add 3 / 1    x=30.2857142857143 is 212 / 7
add 9 / 3    x=33.2857142857143 is 233 / 7
add 4 / 3    x=34.6190476190476 is 727 / 21
add 4 / 6    x=35.2857142857143 is 247 / 7
add 1 / 9    x=35.3968253968254 is 2230 / 63
add 8 / 3    x=38.0634920634921 is 2398 / 63
add 2 / 4    x=38.5634920634921 is 4859 / 126
add 5 / 1    x=43.5634920634921 is 5489 / 126
add 1 / 2    x=44.0634920634921 is 2776 / 63
add 2 / 7    x=44.3492063492064 is 2794 / 63

DS2 数学包

于 2020-03-03T09:29:54.037 回答
0

因此,您似乎要求查找 X/Y 是正好是 1.00Exx 的数字的情况,其中 XX 是一个整数,而不是 0。

data _null_;
do x=1,10,100,1000;
  do y=1,2,3,10.1,10 ;
   ratio = x/y;
   power = floor(log10(ratio));
   if power ne 0 and 1.00 = round(ratio/10**power,0.01) then 
     put 'Ratio of ' x 'over ' y 'is 10**' power '.' 
   ;    
  end;
end;
run;

结果:

Ratio of 1 over 10 is 10**-1 .
Ratio of 10 over 1 is 10**1 .
Ratio of 100 over 1 is 10**2 .
Ratio of 100 over 10 is 10**1 .
Ratio of 1000 over 1 is 10**3 .
Ratio of 1000 over 10 is 10**2 .
于 2020-03-02T19:21:52.497 回答