select - How to implement this selection in SAS

Question

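Suppose I have a SAS table tbl with a column col. This column contains different values {"a","s","d","f",...}, but one of them (for example "d") occurs far more often than the others. How can I select only the rows that have this value?

It would be something like this: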

data tbl;
  set tbl;
  where col eq "the most present element of col in this case d";
run;

score 3 · Accepted Answer

One of many ways to do this...

data test;
n+1;
input col $;
datalines;
a
b
c
d
d
d
d
e
f
g
d
d
a
b
d
d
;
run;

proc freq data=test order=freq;  *order=freq automatically puts the most frequent on top;
tables col/out=test_count;
run;

data want;
set test;
if _n_ = 1 then set test_count(keep=col rename=(col=col_keep)); *read the top row once, its value is retained;
if col = col_keep;
run;

To put it into a macro variable instead (see the comments in the code):

data _null_;
set test_count;
call symputx("mvar",col); *put it to a macro variable, trailing blanks stripped;
stop;                    *only want the first row;
run;
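
As a possible follow-up, the macro variable can then drive a WHERE filter on the original data. This is a minimal sketch that assumes the step above has already run and created &mvar; the output data set name want_mvar is made up for illustration.

```sas
/* Sketch: keep only the rows whose col equals the stored value.      */
/* Assumes &mvar was created by the DATA _NULL_ step above.           */
/* want_mvar is a hypothetical name chosen for this example.          */
data want_mvar;
    set test;
    where col = "&mvar";   /* double quotes so the macro variable resolves */
run;
```

Note that this keeps a single value only. If two values tie for most frequent, the first row of test_count decides which one is kept.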

score 1 · Accepted Answer

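I would use PROC SQL for this.

Here is an example that puts the most frequent value ("d" in your question) into a macro variable and then filters the original data set, as your question asks.

It also works when several values tie for most frequent; in the sample data below, "b" and "c" tie with four rows each, so both are kept.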

data tbl;
    input col: $1.;
    datalines;
    a
    a
    b
    b
    b
    b
    c
    c
    c
    c
    d
    d
    d
;run;

proc sql noprint;
    create table tbl_freq as
    select col, count(*) as freq
    from tbl
    group by col;    

    select quote(col) into: mode_values separated by ', '
    from tbl_freq
    where freq = (select max(freq) from tbl_freq);
quit;

%put mode_values = &mode_values.;

data tbl_filtered;
    set tbl;
    where col in (&mode_values.);
run;

Note the use of QUOTE(), which is needed to wrap the values of col in quotation marks (omit it if col is a numeric variable).
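
If the macro variable is not needed for anything else, the same filter can probably be written as one query. The sketch below is an alternative to the steps above, not part of the original answer; it assumes the tbl data set created earlier, and the output name tbl_filtered_sql is made up for illustration.

```sas
/* Sketch: select the rows whose col is (one of) the most frequent values. */
/* Assumes tbl exists as created above; tbl_filtered_sql is hypothetical.  */
proc sql;
    create table tbl_filtered_sql as
    select *
    from tbl
    where col in (
        /* values of col whose row count equals the largest row count */
        select col
        from tbl
        group by col
        having count(*) = (
            select max(freq)
            from (select count(*) as freq from tbl group by col)
        )
    );
quit;
```

Because no values pass through a macro variable, this form needs no QUOTE() call and behaves the same for character and numeric columns. Ties are kept, as in the macro-variable version.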
