dataset - How do I delete some observations in a column?

Question

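I have a dataset like this:

I only want to keep the ids that appear 3 times (i.e. keep id=5345 and id=5844) and delete the rest. How can I do this in SAS? My data is sorted by id. I want to keep all three repeated rows for each of those ids in the output dataset.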

score 3 · Accepted Answer

With PROC SQL, you can create a new dataset using a JOIN, as follows:

proc sql;
   create table want as
   select a.*
   from have a
   join (
      select id
      from   have
      group by id
      having count(*) = 3
      ) b
   on b.id=a.id;
quit;
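
The same filter can probably be written without the explicit join. The following is only a sketch: it assumes the input dataset is named have, as above, and that PROC SQL remerging is allowed (it is by default, and SAS writes a NOTE about remerging summary statistics to the log). Row order in the result is not guaranteed, so sort afterwards if order matters.

```sas
proc sql;
   /* Selecting non-aggregated columns together with GROUP BY makes SAS
      remerge the group count back onto every row, so the HAVING clause
      keeps or drops whole id groups. */
   create table want as
   select *
   from have
   group by id
   having count(*) = 3;
quit;
```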

score 2 · Accepted Answer

PROC FREQ will give you this directly.

proc freq data=myid;
tables id/out=threeobs(keep=count id where=(count=3));
run;

If you mean 3 or more, use >= instead of =. Following the comments, here is an example that merges the result back onto the original data:

data have;
input id;
datalines;
3408
3408
3485
4592
4932
5345
5345
5345
5844
5844
5844
;;;;
run;

proc freq data=have;
tables id/out=ids(where=(count=3) keep=id count);
run;

proc sort data=have;
by id;
run;
data want;
merge have(in=h) ids(in=i);
by id;
if i;
run;

score 2 · Accepted Answer

I wasn't sure whether you wanted just a list of the IDs that appear 3 times, or all rows whose id is repeated 3 times. If you want the latter, @bellvueBob's code will get you there.

Otherwise, here is one way to get just a list of the IDs that appear in the data set 3 times. The advantages of this code are low memory usage and speed: because the data set is already sorted, it needs only a single sequential pass.

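data threeobs(keep=id);
  set myid;
  by id;
  /* restart the counter on the first row of each id group */
  if first.id then cnt=1;
  else cnt+1;
  /* test on the last row so ids with more than 3 rows are excluded */
  if last.id and cnt=3 then output;
run;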
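
If all rows for those ids are wanted instead, which is what the question asks for, the same single-pass idea can be extended with two loops over each BY group. This is a minimal sketch, not a tested solution: it assumes the input is named have, is sorted by id, and has no existing variable called cnt (a name chosen here purely for illustration).

```sas
data want;
  /* First loop: read one id group and count its rows. */
  do cnt = 1 by 1 until (last.id);
    set have;
    by id;
  end;
  /* Second loop: re-read the same rows through a separate SET statement
     and write them out only if the group has exactly 3 rows. */
  do _n_ = 1 to cnt;
    set have;
    if cnt = 3 then output;
  end;
  drop cnt;
run;
```

Each SET statement keeps its own read position, so the second loop revisits exactly the rows that the first loop just counted. Use cnt >= 3 in the test if ids with three or more rows should be kept.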
