I want to delete all blank observations from a data set. I only know how to get rid of blanks in one variable:

data a;
set data(where=(var1 ne .)) ;
run;

Here I create a new data set with no blanks in var1. But how do I get rid of all the blanks in the entire data set?

Thanks in advance for your answers.

4 Answers

If you are trying to delete rows in which all variables are missing, it's easy:

/* Create an example with some or all columns missing */
data have;
set sashelp.class;
if _N_ in (2,5,8,13) then do;
  call missing(of _numeric_);
end;
if _N_ in (5,6,8,12) then do;
  call missing(of _character_);
end;
run;

/* This is the answer */
data want;
set have;
if compress(cats(of _all_),'.')=' ' then delete;
run;

Instead of COMPRESS, you can also set OPTIONS MISSING=' '; beforehand.
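A minimal sketch of that MISSING= variant, assuming (as the sentence above says) that CATS renders a missing numeric using the MISSING= character, so with MISSING=' ' an all-missing row concatenates to an empty string. The small `have_demo` data set is made up for illustration, and the option is reset to its default '.' afterwards so later steps are unaffected:

```sas
/* Made-up test data: the second row is entirely missing */
data have_demo;
  length name $ 8;
  input name $ age;
  datalines;
Alice 13
. .
Bob .
;
run;

/* Print missing numerics as blanks so CATS drops them */
options missing=' ';

data want_demo;
  set have_demo;
  /* Row is blank when every value concatenates to nothing */
  if missing(cats(of _all_)) then delete;
run;

/* Restore the default missing-value character */
options missing='.';
```

Only the all-missing row should be removed; "Bob ." is kept because NAME is not blank.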

If you want to delete every row that has any missing value, you can use the NMISS/CMISS functions:

data want;
set have;
if nmiss(of _numeric_) > 0 then delete;
run;

Or

data want;
set have;
if nmiss(of _numeric_) + cmiss(of _character_) > 0 then delete;
run;

for all character and numeric variables.
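Since CMISS accepts both character and numeric arguments, the "any missing" check can probably be written as a single call over `_ALL_`. A sketch with made-up data set names (`have_mixed`, `want_mixed`); keep the test right after SET, because `_ALL_` would also pick up any variables created earlier in the step:

```sas
/* Made-up test data mixing character and numeric variables */
data have_mixed;
  length name $ 8;
  input name $ age height;
  datalines;
Alice 13 56.5
Bob . 60.0
. 14 62.8
Carol 12 59.0
;
run;

data want_mixed;
  set have_mixed;
  /* CMISS counts missing values of either type in one call */
  if cmiss(of _all_) > 0 then delete;
run;
```

Only the fully populated rows (Alice and Carol) should remain.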

answered 2013-06-25T14:21:16.457

You could do it like this:

data myData;
set myData;
array a(*) _numeric_;
do i=1 to dim(a);
  /* MISSING() also catches special missing values such as .A */
  if missing(a(i)) then delete;
end;
drop i;
run;

This scans all numeric variables and deletes any observation in which a missing value is found.

answered 2013-06-25T14:20:06.727

Here you go. This works regardless of whether the variables are character or numeric.

data withBlanks;
input a$ x y z;
datalines;
a 1 2 3
b 1 . 3
c . . 3
 . . .
d . 2 3
e 1 . 3
f 1 2 3
;
run;

%macro removeRowsWithMissingVals(inDsn, outDsn, Exclusion);
/*Inputs: 
        inDsn: Input dataset with some or all columns missing for some or all rows
        outDsn: Output dataset with some or all columns NOT missing for some or all rows
        Exclusion: Should be one of {AND, OR}. AND excludes every row in which any column has a missing value; OR excludes only rows in which all columns have missing values
*/
/*get a list of variables in the input dataset along with their types (i.e., whether they are numeric or character type)*/
PROC CONTENTS DATA = &inDsn OUT = CONTENTS(keep = name type varnum);
RUN;
/*put each variable with its own comparison string in a separate macro variable*/
data _null_;
set CONTENTS nobs = num_of_vars end = lastObs;
/*use NE. for numeric cols (type=1) and NE '' for char types*/
if type = 1 then            call symputx(compress("var"!!varnum), compbl(name!!" NE . "));
else        call symputx(compress("var"!!varnum), compbl(name!!" NE ''  "));
/*make a note of no. of variables to check in the dataset*/
if lastObs then call symputx("no_of_obs", _n_);
run;

DATA &outDsn;
set &inDsn;
where
%do i =1 %to &no_of_obs.;
    &&var&i.
        %if &i < &no_of_obs. %then &Exclusion; 
%end;
;
run;

%mend removeRowsWithMissingVals;

%removeRowsWithMissingVals(withBlanks, withOutBlanksAND, AND);
%removeRowsWithMissingVals(withBlanks, withOutBlanksOR, OR);

Output of withOutBlanksAND:

a   x   y   z
a   1   2   3
f   1   2   3

Output of withOutBlanksOR:

a   x   y   z
a   1   2   3
b   1   .   3
c   .   .   3
e   1   .   3
f   1   2   3
answered 2013-06-25T13:17:51.807

It's really strange that nobody has offered this elegant answer:

if missing(cats(of _all_)) then delete;

EDIT: Indeed, I had not realized that cats(of _all_) returns a dot "." for missing numeric values.
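A quick way to see the problem described in the edit: under the default MISSING='.' option, CATS turns a missing numeric into a dot, so the concatenated string is not blank. The data set and variable names below (`cats_demo`, `result`, `is_blank`) are illustrative only:

```sas
data cats_demo;
  x = .;              /* missing numeric */
  length c $ 8;
  c = ' ';            /* missing character */
  result = cats(x, c);
  /* Flag whether the concatenation counts as missing */
  is_blank = missing(result);
  put result= is_blank=;
run;
```

RESULT should hold '.' and IS_BLANK should be 0, which is why the one-liner above keeps rows it was meant to delete.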

As a fix, I suggest the following, which seems more reliable:

*-- Building a sample dataset with test cases --*;
data test;
    attrib a format=8.;
    attrib b format=$8.;
    
    a=.;    b='a';  output;
    a=1;    b='';   output;
    a=.;    b='';   output; * should be deleted;
    a=.a;   b='';   output; * should be deleted;
    a=.a;   b='.';  output;
    a=1;    b='b';  output;
run;

*-- Apply the logic to delete blank records --*;
data test2;
    set test;
    
    *-- Build arrays of numeric and characters --*;
    *-- Note: an array can only contain variables of the same type, so we must create 2 different arrays --*;
    array nvars(*) _numeric_;
    array cvars(*) _character_;

    *-- Delete blank records --*;
    *-- Blank record: # of missing num variables + # of missing char variables = # of numeric variables + # of char variables --*;
    if nmiss(of _numeric_) + cmiss(of _character_) = dim(nvars) + dim(cvars) then delete;
run;

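The main problem is that if there are no numeric variables at all (or no character variables at all), creating the empty array generates a warning, and calling NMISS/CMISS on it generates an error.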

So, as far as I can tell, there is no option other than building a SAS statement outside the DATA step that identifies blank records:

*-- Building a sample dataset with test cases --*;
data test;
    attrib a format=8.;
    attrib b format=$8.;

    a=.;    b='a';  output;
    a=1;    b='';   output;
    a=.;    b='';   output; * should be deleted;
    a=.a;   b='';   output; * should be deleted;
    a=.a;   b='.';  output;
    a=1;    b='b';  output;
run;

*-- Create a SAS statement which test any missing variable, regardless of its type --*;
proc sql noprint;
    select      distinct 'missing(' || strip(name) || ')'
    into        :miss_stmt separated by ' and '
    from        dictionary.columns
    where       libname = 'WORK'
        and     memname = 'TEST'
    ;
quit;

/*
miss_stmt looks like missing(a) and missing(b)
*/

*-- Delete blank records --*;
data test2;
    set test;
    
    if &miss_stmt. then delete;
run;
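If you need this for several data sets, the PROC SQL approach could be wrapped in a macro. This is only a sketch: `%delete_blank_rows`, its parameters, and the `blank_demo` data set are hypothetical names, and it assumes ordinary (VALIDVARNAME=V7) variable names, since names that need name literals would break the generated missing() calls:

```sas
/* Hypothetical helper macro, not a SAS built-in */
%macro delete_blank_rows(lib=WORK, mem=, out=);
  %local miss_stmt;

  /* Build "missing(a) and missing(b) and ..." from the dictionary */
  proc sql noprint;
    select cats('missing(', name, ')')
      into :miss_stmt separated by ' and '
      from dictionary.columns
      where libname = "%upcase(&lib)"
        and memname = "%upcase(&mem)";
  quit;

  /* Drop rows where every variable is missing */
  data &out;
    set &lib..&mem;
    if &miss_stmt then delete;
  run;
%mend delete_blank_rows;

/* Made-up example data: the second row is blank */
data blank_demo;
  attrib a format=8.;
  attrib b format=$8.;
  a=.;  b='a'; output;
  a=.;  b='';  output;
  a=1;  b='';  output;
run;

%delete_blank_rows(mem=blank_demo, out=blank_demo_out);
```

Because the condition is built from DICTIONARY.COLUMNS, it works the same whether the data set has only numeric, only character, or mixed variables.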
answered 2021-10-26T12:43:43.677