4

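I have been moving all my datasets into an SPDE library, because I have seen excellent performance gains across the board. Across the board, that is, until I ran PROC TRANSPOSE. It takes roughly 60 times longer to run against the SPDE dataset than against the same dataset stored in a regular V9 library. The dataset is sorted by item_id, and it is read from and written to the same library.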

Does anyone know why this happens? Am I missing something important about SPDE and PROC TRANSPOSE not working well together?

SPDE library

MPRINT(XMLIMPORT_VANTAGE):   proc transpose data = smplus.links_response_mechanism out = smplus.response_mechanism (drop = _NAME_) prefix = rm_;
MPRINT(XMLIMPORT_VANTAGE):   by item_id;
MPRINT(XMLIMPORT_VANTAGE):   id lookup_code;
MPRINT(XMLIMPORT_VANTAGE):   var x;
MPRINT(XMLIMPORT_VANTAGE):   run;

NOTE: There were 5866747 observations read from the data set SMPLUS.LINKS_RESPONSE_MECHANISM.
NOTE: The data set SMPLUS.RESPONSE_MECHANISM has 3209353 observations and 14 variables.
NOTE: Compressing data set SMPLUS.RESPONSE_MECHANISM decreased size by 37.98 percent.
NOTE: PROCEDURE TRANSPOSE used (Total process time):
      real time           28:27.63
      cpu time            28:34.64

V9 library

MPRINT(XMLIMPORT_VANTAGE):   proc transpose data = mplus.links_response_mechanism out = mplus.response_mechanism (drop = _NAME_) prefix = rm_;
MPRINT(XMLIMPORT_VANTAGE):   by item_id;
68                                                         The SAS System                             02:00 Thursday, August 8, 2013

MPRINT(XMLIMPORT_VANTAGE):   id lookup_code;
MPRINT(XMLIMPORT_VANTAGE):   var x;
MPRINT(XMLIMPORT_VANTAGE):   run;

NOTE: There were 5866747 observations read from the data set MPLUS.LINKS_RESPONSE_MECHANISM.
NOTE: The data set MPLUS.RESPONSE_MECHANISM has 3209353 observations and 14 variables.
NOTE: Compressing data set MPLUS.RESPONSE_MECHANISM decreased size by 27.60 percent. 
      Compressed is 32271 pages; un-compressed would require 44572 pages.
NOTE: PROCEDURE TRANSPOSE used (Total process time):
      real time           28.76 seconds
      cpu time            28.79 seconds

3 Answers

3

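It does look to me as though there is some issue with PROC TRANSPOSE and SPDE. Here is a simple SSCCE that shows a significant difference. It is not as large as yours, but that may partly be because this was run on a desktop without any particularly substantial performance tuning in the first place. This sounds like a case for calling SAS Technical Support.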

libname spdelib spde 'c:\temp\SPDE Main' 
    datapath=('c:\temp\SPDE Data' 'd:\temp\SPDE Data')
    indexpath=('d:\temp\SPDE Index')
    partsize=512;

libname mainlib 'c:\temp\';


data mainlib.bigdata;
do ID = 1 to 1500000;
  do _varn=1 to 10;
    varname=cats("Var_",_varn);
    vardata=ranuni(7);
    output;
  end;
end;
run;
data  spdelib.bigdata;
do ID = 1 to 1500000;
  do _varn=1 to 10;
    varname=cats("Var_",_varn);
    vardata=ranuni(7);
    output;
  end;
end;
run;
*These data steps take roughly the same amount of time, around 30 seconds each;

proc transpose data=spdelib.bigdata out=spdelib.transdata;
by id;
id varname;
var vardata;
run;
*Run a few times, this takes around 3 to 4 minutes, with 1.5 minutes CPU time;

proc transpose data=mainlib.bigdata out=mainlib.transdata;
by id;
id varname;
var vardata;
run;
*Run a few times, this takes around 30 to 45 seconds, with 20 seconds CPU time;
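
While the cause is being investigated, one possible workaround is to do the transpose outside the SPDE library and copy the result back. The following is a minimal sketch under that assumption, reusing the library and dataset names from the example above; the staging dataset names bigdata_stage and transdata_stage are made up for illustration, and this variant has not been timed here, so whether the extra copies pay off on a given system is something to measure rather than assume.

```sas
/* Sketch only: stage the data in WORK (a Base engine library),   */
/* transpose there, then copy the result back to the SPDE library. */
/* bigdata_stage and transdata_stage are hypothetical names.       */
data work.bigdata_stage;
  set spdelib.bigdata;
run;

proc transpose data=work.bigdata_stage out=work.transdata_stage;
  by id;
  id varname;
  var vardata;
run;

/* Write the transposed result into the SPDE library. */
data spdelib.transdata;
  set work.transdata_stage;
run;

/* Remove the staging datasets from WORK. */
proc datasets library=work nolist;
  delete bigdata_stage transdata_stage;
quit;
```

The staging copy costs one extra read and write of the input, so it only makes sense if the SPDE transpose is slower by a wide margin, as in the timings above.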
answered 2013-08-08T17:00:15.607
1

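There have been known issues in the past with SPDE and proc compare (not being multithreaded), at least up to version 4.1. Which version are you using? (You can see it in the "!install/logs" folder.)

This is definitely one to raise with SAS Support. To "speed things up", I would suggest submitting a log produced with the following options: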

proc setinit noalias; run; 
proc options; run; 
%put _ALL_; 
options fullstimer msglevel=i;

Also:

options spdedebug='DA_TRACEIO_OCR CJNL=Trace.txt';

(The CJNL option simply routes the trace message output to a text file.)
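
To capture such a log for a support ticket, the slow step can be wrapped so that its log goes to a separate file. This is a rough sketch, assuming the dataset names from the question; the log file name transpose_spde.log is a placeholder, and the SPDEDEBUG trace setting above is left out so that the block only uses standard options.

```sas
/* Sketch only: route the log of the slow step to its own file.   */
/* 'transpose_spde.log' is a hypothetical file name.               */
proc printto log='transpose_spde.log' new;
run;

/* Detailed resource statistics and additional informational notes. */
options fullstimer msglevel=i;

proc transpose data=smplus.links_response_mechanism
               out=smplus.response_mechanism (drop=_NAME_)
               prefix=rm_;
  by item_id;
  id lookup_code;
  var x;
run;

/* Restore the default settings and the default log destination. */
options nofullstimer msglevel=n;
proc printto;
run;
```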

In the meantime, you could try some of the SPD-specific options described here:

http://support.sas.com/kb/11/349.html

answered 2013-08-08T22:33:45.017
0

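This problem typically occurs when PROC TRANSPOSE is used with BY processing on compressed datasets. SAS is forced to read the same block of rows and decompress it again each time, until all records are fully sorted.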

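Set the COMPRESS=NO option and it will work. See the logs below: one run uses COMPRESS=YES and the other COMPRESS=NO. The first takes about 57 minutes of real time, the second about 27 seconds.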

OPTIONS COMPRESS=YES;

50         **tranpose from spde to spde;
51         proc transpose data=spdelib.balancewalkoutput out=spdelib.spdelib_to_spdelib;
52           var metric ;
53           by balancewalk facility_id isretained isexisting isicaapnpl monthofmaturity vintage;
54         run;

NOTE: There were 10000000 observations read from the data set SPDELIB.BALANCEWALKOUTPUT.
NOTE: The data set SPDELIB.SPDELIB_TO_SPDELIB has 160981 observations and 74 variables.
NOTE: Compressing data set SPDELIB.SPDELIB_TO_SPDELIB decreased size by 69.96 percent.
NOTE: PROCEDURE TRANSPOSE used (Total process time):
      real time           56:58.54
      user cpu time       52:03.65
      system cpu time     4:03.00
      memory              19028.75k
      OS Memory           34208.00k
      Timestamp           09/16/2019 06:19:55 PM
      Step Count                        9  Switch Count  22476
      Page Faults                       0
      Page Reclaims                     4056
      Page Swaps                        0
      Voluntary Context Switches        142316
      Involuntary Context Switches      5726
      Block Input Operations            88
      Block Output Operations           569200


OPTIONS COMPRESS=NO;

50         **tranpose from spde to spde;
51         proc transpose data=spdelib.balancewalkoutput out=spdelib.spdelib_to_spdelib;
52           var metric ;
53           by balancewalk facility_id isretained isexisting isicaapnpl monthofmaturity vintage;

18                                                         The SAS System                           16:04 Monday, September 16, 2019

54         run;

NOTE: There were 10000000 observations read from the data set SPDELIB.BALANCEWALKOUTPUT.
NOTE: The data set SPDELIB.SPDELIB_TO_SPDELIB has 160981 observations and 74 variables.
NOTE: PROCEDURE TRANSPOSE used (Total process time):
      real time           26.73 seconds
      user cpu time       14.52 seconds
      system cpu time     11.99 seconds
      memory              13016.71k
      OS Memory           27556.00k
      Timestamp           09/16/2019 04:13:06 PM
      Step Count                        9  Switch Count  24827
      Page Faults                       0
      Page Reclaims                     2662
      Page Swaps                        0
      Voluntary Context Switches        162653
      Involuntary Context Switches      1678
      Block Input Operations            96
      Block Output Operations           1510040
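
The runs above switch the global COMPRESS= system option, which applies to every dataset created afterwards in the session (here, the output of the transpose). If that is too broad, the same setting can be scoped to one dataset with the COMPRESS= data set option. A minimal sketch, assuming the same library and variable names as in the logs above; it has not been timed separately:

```sas
/* Sketch only: turn compression off for the transposed output alone, */
/* leaving the session-wide COMPRESS= setting untouched.              */
proc transpose data=spdelib.balancewalkoutput
               out=spdelib.spdelib_to_spdelib (compress=no);
  var metric;
  by balancewalk facility_id isretained isexisting isicaapnpl
     monthofmaturity vintage;
run;
```

The trade-off is disk space: the compressed run above reported a size reduction of 69.96 percent for the output dataset, which is given up with COMPRESS=NO.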
answered 2019-09-17T18:50:05.027