I have a dataset like this:
id
3408
3408
3485
4592
4932
5345
5345
5345
5844
5844
5844
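I only want to keep the ids that appear 3 times (i.e. keep id=5345 and id=5844) and delete the rest. How can I do this in SAS? My data is sorted by id.
I want to keep all three rows for each of those ids in the output dataset.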
With PROC SQL, you can use a JOIN against a subquery of the qualifying ids to create a new dataset, like this:
proc sql;
create table want as
select a.*
from have a
join (
select id
from have
group by id
having count(*) = 3
) b
on b.id=a.id;
quit;
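As a shorter sketch of the same idea: when a query selects non-grouped columns together with GROUP BY, SAS's PROC SQL remerges the summary statistic back onto the detail rows, so the subquery join can be dropped. This relies on SAS-specific behavior (the log prints a note about remerging summary statistics) and will not carry over to other SQL dialects. The table name `want_remerge` is made up here for illustration.

```sas
proc sql;
  /* count(*) is computed per id, then remerged onto every row of that id */
  create table want_remerge as
  select *
  from have
  group by id
  having count(*) = 3
  order by id;   /* row order is not guaranteed without this */
quit;
```

With the sample data this should keep the three rows for 5345 and the three rows for 5844.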
PROC FREQ will give you this directly; the OUT= dataset has one row per id along with its COUNT.
proc freq data=myid;
tables id/out=threeobs(keep=count id where=(count=3));
run;
If you mean 3 or more, use >= instead of =. Following up on the comments, here is an example of merging the result back onto the original data:
data have;
input id;
datalines;
3408
3408
3485
4592
4932
5345
5345
5345
5844
5844
5844
;;;;
run;
proc freq data=have;
tables id/out=ids(where=(count=3) keep=id count);
run;
proc sort data=have;
by id;
run;
data want;
merge have(in=h) ids(in=i);
by id;
if i;
run;
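Since the data is already sorted by id, the same result can probably be had in a single DATA step without the intermediate dataset. The following is a minimal sketch of the double DO-loop (so-called DOW loop) idiom, not part of the approach above: the first loop counts the rows in a BY group, the second rereads the same rows and outputs them only if the count was 3. The names `want_dow`, `n` and `_i` are made up for illustration.

```sas
data want_dow;
  /* pass 1: read one whole id group; n ends up as its row count */
  do n = 1 by 1 until (last.id);
    set have;
    by id;
  end;
  /* pass 2: a second SET has its own read pointer, so this rereads the same rows */
  do _i = 1 to n;
    set have;
    if n = 3 then output;   /* keep every row of ids that occur exactly 3 times */
  end;
  drop n _i;
run;
```

Unlike the merge, this leaves no count variable in the output, and it assumes `have` is sorted (or at least grouped) by id.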
I wasn't sure whether you wanted just a list of the IDs that appear 3 times, or all rows whose id is replicated 3 times. If you want all the rows, @bellvueBob's code will get you there.
Otherwise, here is one way to get just a list of the IDs that appear in the data set 3 times. The advantage of this code is low memory use and speed: because the data set is already sorted by id, a single sequential pass with BY-group processing is enough.
data threeobs(keep=id);
set myid;
by id;
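if first.id then cnt=1; /* restart the count at the first row of each id */
else cnt+1; /* sum statement, so cnt is retained across rows */
if last.id and cnt=3 then output; /* exactly 3 rows; plain "if cnt=3" would also catch ids with more than 3 */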
run;