0

I have a dataset like this:

id
3408
3408
3485
4592
4932
5345
5345
5345
5844
5844
5844

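I want to keep only the ids that appear 3 times (i.e., keep id=5345 and id=5844) and delete the rest. How can I do this in SAS? My data is sorted by id. I want to keep all three rows for each repeated id in the output dataset.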

4

3 Answers

3

With PROC SQL, you can use a JOIN to create a new dataset, like this:

proc sql;
   create table want as
   select a.*
   from have a
   join (
      select id
      from   have
      group by id
      having count(*) = 3
      ) b
   on b.id=a.id;
quit;
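
As a possible shorter variant, PROC SQL in SAS can remerge a summary statistic back onto the detail rows, so the self-join may not be needed. This is only a sketch: it assumes the input dataset is named have as above, and the output name want_remerge is made up for illustration. SAS normally writes a NOTE to the log saying that the query requires remerging summary statistics.

```sas
proc sql;
   /* Selecting non-grouped detail columns together with a HAVING on  */
   /* count(*) makes SAS remerge the group count onto every row, so   */
   /* all rows of each id that occurs exactly 3 times are kept.       */
   create table want_remerge as   /* hypothetical output name */
   select *
   from have
   group by id
   having count(*) = 3;
quit;
```

Remerging is a SAS-specific extension, so the join above is the safer form if the query ever has to run on another database.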
answered 2013-07-15T18:47:25.407
2

PROC FREQ will give you this directly.

proc freq data=myid;
tables id/out=threeobs(keep=count id where=(count=3));
run;

If you mean 3 or more, use >= instead of =. Per the comments, here is an example of merging the result back onto the original data:

data have;
input id;
datalines;
3408
3408
3485
4592
4932
5345
5345
5345
5844
5844
5844
;;;;
run;

proc freq data=have;
tables id/out=ids(where=(count=3) keep=id count);
run;

proc sort data=have;
by id;
run;
data want;
merge have(in=h) ids(in=i);
by id;
if i;
run;
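
One way to sanity-check the merged result, assuming the want dataset built above, is to tabulate id again. With the sample data shown, only 5345 and 5844 should appear, each with a frequency of 3.

```sas
/* Quick check: list each id remaining in WANT with its row count. */
proc freq data=want;
   tables id / nocum nopercent;
run;
```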
answered 2013-07-15T20:42:08.970
2

I wasn't sure if you wanted just a list of IDs that appear 3 times or all rows whose id is replicated 3 times. If you want the latter, @bellvueBob's code will get you there.

Otherwise, here is one way to get just a list of the IDs that appear in the data set 3 times. The advantage of this code is low memory usage and speed: because the data set is already sorted, it needs only a single pass with BY-group processing.

data threeobs(keep=id);
  set myid;
  by id;
  if first.id then cnt=1;
  else cnt+1;
  /* test at the last row of the group so ids with more than 3 rows are excluded */
  if last.id and cnt=3 then output;
run;
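
If all three rows per id are wanted rather than a list of ids, the same sorted-data idea can be extended with what is often called a double DOW loop. This is a sketch under assumptions, not part of the original answer: it assumes the input is named have and is sorted by id, and the variable names cnt and i are chosen here for illustration.

```sas
data want;
   /* First loop: read one BY group and count its rows.              */
   /* When the loop exits, cnt holds the number of rows for this id. */
   do cnt = 1 by 1 until (last.id);
      set have;
      by id;
   end;
   /* Second loop: re-read the same rows through a second SET stream */
   /* and output them only if the group has exactly 3 rows.          */
   do i = 1 to cnt;
      set have;
      if cnt = 3 then output;
   end;
   drop cnt i;
run;
```

Each SET statement keeps its own read position, so the second loop replays exactly the rows the first loop just counted. No sort or merge step is needed, though the data set is read twice.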
answered 2013-07-15T18:57:13.673