Given the following test data:

data test;
input A B;
cards;
1 2
1 1
1 2
run;
NOTE: The data set WORK.TEST has 3 observations and 2 variables.

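I know that proc sort with nodup can behave unexpectedly if you don't sort by the entire key, or even if you appear to sort by the entire key but a keep statement is in effect: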

proc sort data=test out=test_dedup_works nodup;
    by a _all_;
run;
NOTE: There were 3 observations read from the data set WORK.TEST.
NOTE: Duplicate BY variable(s) specified. Duplicates will be ignored.
NOTE: 1 duplicate observations were deleted.
NOTE: The data set WORK.TEST_DEDUP_WORKS has 2 observations and 2 variables.

proc sort data=test out=test_dedup_fails nodup;
    by a;
run;
NOTE: There were 3 observations read from the data set WORK.TEST.
NOTE: 0 duplicate observations were deleted.
NOTE: The data set WORK.TEST_DEDUP_FAILS has 3 observations and 2 variables.


proc sort data=test (keep=a) out=test_dedup_alsofails nodup;
    by a;
run;
NOTE: There were 3 observations read from the data set WORK.TEST.
NOTE: 0 duplicate observations were deleted.
NOTE: The data set WORK.TEST_DEDUP_ALSOFAILS has 3 observations and 1 variables.

What is new to me is that trying to deduplicate the resulting, not actually deduplicated, data set with PROC SQL also fails to remove the duplicates:

proc sql;
    create table test_dedup_eventhisfails as
        select distinct a
        from test_dedup_alsofails;
quit;
NOTE: Table WORK.TEST_DEDUP_EVENTHISFAILS created, with 3 rows and 1 columns.

Is this a bug that is documented somewhere, or am I doing something wrong?
