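I am trying to set some existing values to missing (rather than deleting them). Here is the basic structure of my dataset.

When A is less than B, I want AGE and GENDER to be treated as missing. For example, when A=1 and B=3, I want the AGE and GENDER values in the last two rows of each ID to be missing (as shown in the dataset below).

In my data, A and B each range from 1 to 4, and every combination of the two occurs.

The asterisks indicate that there is more data in between. Thanks in advance!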

BEFORE
    ID A B AGE GENDER
    --------------
    1  1 1 35  M
    *  * * *   *
    *  * * *   *
    5  1 2 23  F
    5  1 2 21  M
    6  1 2 42  F
    6  1 2 43  M
    *  * * *   *
    *  * * *   *
    20 1 3 43  F
    20 1 3 39  M
    20 1 3 23  M
    21 1 3 32  F
    21 1 3 39  M
    21 1 3 23  F
    *  * * *   *
    *  * * *   *
    55 2 4 32  M
    55 2 4 12  M
    55 2 4 31  F
    55 2 4 43  M
    *  * * *   *
    *  * * *   *

AFTER
    ID A B AGE GENDER
    --------------
    1  1 1 35  M
    *  * * *   *
    *  * * *   *
    5  1 2 23  F
    5  1 2 .   .
    6  1 2 42  F
    6  1 2 .   .
    *  * * *   *
    *  * * *   *
    20 1 3 43  F
    20 1 3 .   .
    20 1 3 .   .
    21 1 3 32  F
    21 1 3 .   .
    21 1 3 .   .
    *  * * *   *
    *  * * *   *
    55 2 4 32  M
    55 2 4 12  M
    55 2 4 .   .
    55 2 4 .   .
    *  * * *   *
    *  * * *   *
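
For anyone who wants to reproduce the example, the visible rows of the BEFORE table can be read in with a short DATA step. This is only a sketch: the dataset name `olddata` and the variable types (numeric `age`, character `gender`) are assumptions, and the rows hidden behind the asterisks are simply left out.

```sas
/* Visible rows of the BEFORE table only; elided rows are omitted. */
/* The name olddata and the variable types are assumptions.        */
data olddata;
    input id a b age gender $;
    datalines;
1 1 1 35 M
5 1 2 23 F
5 1 2 21 M
6 1 2 42 F
6 1 2 43 M
20 1 3 43 F
20 1 3 39 M
20 1 3 23 M
21 1 3 32 F
21 1 3 39 M
21 1 3 23 F
55 2 4 32 M
55 2 4 12 M
55 2 4 31 F
55 2 4 43 M
;
run;
```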

1 Answer

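How about this? The idea is to number the rows within each ID, reverse that order, set the first B - A rows of each ID to missing, and then sort back into the original order.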

    data temp;
        retain idcount 0;
        set olddata;

        ** Create an observation counter for each id **;
        prev_id = lag(id);

        if id ^= prev_id then idcount = 0;
        idcount = idcount + 1;
    run;

    ** Within each id, put the obs in reverse order **;
    proc sort data=temp;
        by id descending idcount;
    run;

    data temp2;
        retain misscount 0;
        set temp;
        by id descending idcount;

        ** Keep the original age and gender **;
        old_age = age;
        old_gender = gender;

        ** Number of rows that should be missing for this id **;
        if a < b then nummiss = b - a;
        else nummiss = 0;

        ** Reset the counter of rows set to missing at the start of each id **;
        if first.id then misscount = 0;

        ** Set the appropriate number of rows to missing and update the counter **;
        if misscount < nummiss then do;
            misscount = misscount + 1;
            call missing(age, gender);
        end;
    run;

    ** Restore the original order and drop the helper variables **;
    proc sort data=temp2 out=temp3(drop=misscount nummiss idcount prev_id);
        by id idcount;
    run;
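
If the data are already sorted by ID, the two sorts can probably be avoided with a double DO loop in a single DATA step: the first loop counts the rows of the current ID, and the second re-reads those rows and blanks the last B - A of them. The following is a sketch, not a tested drop-in replacement. The names `demo`, `want`, `n_rows`, and `row` are made up for illustration, and `demo` holds just a few rows copied from the question so that the step can be run on its own.

```sas
/* A few rows from the question; demo is a hypothetical stand-in for olddata. */
data demo;
    input id a b age gender $;
    datalines;
20 1 3 43 F
20 1 3 39 M
20 1 3 23 M
55 2 4 32 M
55 2 4 12 M
55 2 4 31 F
55 2 4 43 M
;
run;

/* Assumes the input is sorted by id. */
data want;
    /* First pass over the id group: count its rows. */
    do n_rows = 1 by 1 until (last.id);
        set demo;
        by id;
    end;

    /* Second pass over the same rows: blank the last (b - a) of them. */
    do row = 1 to n_rows;
        set demo;
        if a < b and row > n_rows - (b - a) then call missing(age, gender);
        output;
    end;

    drop n_rows row;
run;
```

To try it on the real data, replace `demo` with `olddata` in both SET statements. Unlike the version above, this does not keep `old_age` and `old_gender`, so add those assignments before the `call missing` if the original values are still needed.
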
Answered 2012-04-12T20:26:06.693