
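I have a table of areas with data. For a particular operation, I want to exclude the top and bottom 1% of areas, because they contain extreme outliers.

As I see it, the way forward is: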

SORT CASES BY theVariableIwantToAnalyse (A).
* Create a case-number variable "id" and populate it with $CASENUM.
NUMERIC id (F12.0).
COMPUTE id = $CASENUM.
EXECUTE.
* Add the highest value of "id" to every case; COMPUTE idmax = MAX(id) cannot do this,
* because MAX() compares variables within one case, not values across cases.
AGGREGATE /OUTFILE=* MODE=ADDVARIABLES /idmax=MAX(id).
* 1% of the highest value of "id".
COMPUTE id1perc = idmax / 100.
* Keep only the cases above the bottom 1% and below the top 1%.
COMPUTE keep = (id > id1perc AND id <= idmax - id1perc).
FILTER BY keep.
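The case-number idea above (sort ascending, number the cases 1..N, treat N as idmax and N/100 as id1perc, then keep only case numbers strictly above id1perc and at most idmax - id1perc) can be checked outside SPSS. The following is a minimal Python sketch of that logic only; the function name `trim_by_case_number` and the `pct` parameter are hypothetical and not part of SPSS.

```python
def trim_by_case_number(values, pct=1.0):
    """Drop the lowest and highest pct% of cases using sorted case numbers.

    Mirrors the steps in the question: sort ascending, number cases 1..N (id),
    take idmax = N and id1perc = N * pct / 100, then keep the cases with
    id1perc < id <= idmax - id1perc.
    """
    ordered = sorted(values)                 # like SORT CASES BY ... (A)
    idmax = len(ordered)                     # highest case number
    id1perc = idmax * pct / 100              # 1% of the highest case number
    return [v for case_id, v in enumerate(ordered, start=1)
            if id1perc < case_id <= idmax - id1perc]


if __name__ == "__main__":
    data = list(range(200, 0, -1))           # 200 cases, deliberately unsorted
    kept = trim_by_case_number(data)
    print(len(kept), kept[0], kept[-1])      # 196 3 198
```

With 200 cases, id1perc is 2, so exactly two cases are dropped at each end. Calling the function again on another column corresponds to re-sorting by the next variable and recomputing `id`.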

Then draw the charts, etc. After that I need to:

SORT CASES BY theNextVariableIwantToAnalyse (A).
* Re-populate "id" with the NEW case-number order.
COMPUTE id = $CASENUM.
EXECUTE.

ETC ...


2 Answers

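Try this to simply filter out the top and bottom 1%. Just add FILTER BY filter. to switch off all the extreme cases, or SELECT IF filter. followed by EXECUTE. to delete them.

EDIT: Note that this RANK approach (specifically the /TIES=CONDENSE option) condenses duplicate values: tied values share one rank and the ranks stay consecutive, so with many ties the fractional ranks no longer run up to 1. If your data may contain duplicate values, this may not be ideal; change the /TIES option in that case.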

************* GENERATE RANDOM DATA *****************.
INPUT PROGRAM.
-       LOOP #I = 1 TO 1000.
-             COMPUTE Y = RV.NORMAL(100,10).
-           END CASE.
-       END LOOP.
-       END FILE.
END INPUT PROGRAM.

dataset name exampleData WINDOW=front.
EXECUTE.


************* RANK DATA  *************.
DATASET ACTIVATE exampleData.
RANK VARIABLES=Y (A)
  /RFRACTION INTO fractile
  /TIES=CONDENSE.

************* MAKE A FILTER  *************.
COMPUTE filter = (fractile>0.01 AND fractile < 0.99).
EXECUTE.

* Chart Builder.
GGRAPH
  /GRAPHDATASET NAME="graphdataset" VARIABLES=Y filter MISSING=LISTWISE REPORTMISSING=NO
  /GRAPHSPEC SOURCE=INLINE.
BEGIN GPL
  SOURCE: s=userSource(id("graphdataset"))
  DATA: Y=col(source(s), name("Y"))
  DATA: filter=col(source(s), name("filter"), unit.category())
  GUIDE: axis(dim(1), label("Y"))
  GUIDE: axis(dim(2), label("Frequency"))
  GUIDE: legend(aesthetic(aesthetic.color.interior), label("filter"))
  ELEMENT: interval.stack(position(summary.count(bin.rect(Y))), color.interior(filter), 
    shape.interior(shape.square))
END GPL.
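To make the RANK step and the /TIES caveat concrete, here is a small Python sketch of the same logic. It assumes that an unweighted RFRACTION is each rank divided by the number of cases, that /TIES=MEAN gives tied values the mean of the ranks they span, and that /TIES=CONDENSE gives distinct values consecutive ranks. All function names are hypothetical, and `random.gauss` merely stands in for `RV.NORMAL(100,10)`.

```python
import random


def rank_mean(values):
    """Ascending ranks; tied values share the mean of the ranks they span (like /TIES=MEAN)."""
    first, last = {}, {}
    for pos, v in enumerate(sorted(values), start=1):
        first.setdefault(v, pos)
        last[v] = pos
    return [(first[v] + last[v]) / 2 for v in values]


def rank_condense(values):
    """Distinct values get consecutive ranks 1, 2, 3, ... (like /TIES=CONDENSE)."""
    rank_of = {v: i for i, v in enumerate(sorted(set(values)), start=1)}
    return [rank_of[v] for v in values]


def rfraction(ranks):
    """Fractional rank: each rank divided by the number of cases (assumed unweighted RFRACTION)."""
    n = len(ranks)
    return [r / n for r in ranks]


def trim_filter(fractile, low=0.01, high=0.99):
    """1 = keep, 0 = extreme case; mirrors COMPUTE filter = (fractile > 0.01 AND fractile < 0.99)."""
    return [int(low < f < high) for f in fractile]


if __name__ == "__main__":
    random.seed(1)
    y = [random.gauss(100, 10) for _ in range(1000)]
    print(sum(trim_filter(rfraction(rank_mean(y)))))   # 979: ranks 11..989 survive

    # With duplicates, CONDENSE ranks stop short of N, so the fractiles no longer
    # run up to 1 and the 0.99 cut-off may exclude nothing at the top.
    dup = [1, 2, 2, 2, 3]
    print(rfraction(rank_mean(dup)))       # [0.2, 0.6, 0.6, 0.6, 1.0]
    print(rfraction(rank_condense(dup)))   # [0.2, 0.4, 0.4, 0.4, 0.6]
```

In the duplicate example, the largest value gets a fractile of 0.6 under CONDENSE, so the `fractile < 0.99` test would keep it.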
answered 2013-07-18 21:39:59

A simpler solution is to use RANK and then select the ranks you want to exclude.

answered 2013-07-18 17:44:52