
Prototyping returned an internal error:

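While the purpose of this particular setup is neither important nor relevant,
the compiler finished with the following debug notice,
and any advice on how to avoid the conflicting syntax would be much appreciated: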

<TiO>-IDE-Debug::____________________________________________________

.code.tio.chpl:77: internal error: IMP0586 chpl Version 1.16.0 pre-release (-999)

Note: This source location is a guess.

Internal errors indicate a bug in the Chapel compiler ("It's us, not you"),
and we're sorry for the hassle.  We would appreciate your reporting this bug -- 
please see http://chapel.cray.com/bugs.html for instructions.  In the meantime,
the filename + line number above may be useful in working around the issue.


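(The compiler team will obviously have some additional interest in, and concerns about, the internal handling of the observed case; that is not the main intent or subject of this post.)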


Code, live @ <TiO>-IDE::

/* ---------------------------------------SETUP-SECTION-UNDER-TEST--*/ use Time;
/* ---------------------------------------SETUP-SECTION-UNDER-TEST--*/ var aStopWATCH_RND_GEN: Timer;
/* ---------------------------------------SETUP-SECTION-UNDER-TEST--*/ var aStopWATCH_LIN_ALG: Timer;
/* ---------------------------------------SETUP-SECTION-UNDER-TEST--*/ var aStopWATCH_MAT_REC: Timer;
/* ---------------------------------------SETUP-SECTION-UNDER-TEST--*/ var aStopWATCH_ARR_REC: Timer;
config const n_power =         5;
config const L_size  =      1000;
       const indices = 1..L_size;
       const aDomain = {indices, indices};

       var   A: [aDomain] real(64); // real(32); // may've shown some byte-word alignment artifacts
       var   B: [aDomain] real(64); // real(32); // may've shown some byte-word alignment artifacts
       const dtype =    "-real(64)";
       var   S: [aDomain] real(64); // real(32); // OK: must've been set real(64) to avoid /LinearAlgebra.chpl:535: error: type mismatch in assignment from real(64) to real(32)

/* -----------------------------------------------------------------*/ use Random;
/* ---------------------------------------------SECTION-UNDER-TEST--*/     aStopWATCH_RND_GEN.start();
    Random.fillRandom(  A );
    Random.fillRandom(  B );
/* ---------------------------------------------SECTION-UNDER-TEST--*/     aStopWATCH_RND_GEN.stop();
/* 

   ============================================ */

proc arrMUL( arrA: [?DA] real(64),
             arrB: [?DB] real(64)
             ) {                      /*
                                         <Brad> If the domain/size of the array being returned cannot be described directly in the function prototype,
                                                I believe your best bet at present is to omit any description of the return type and lean on Chapel's type inference machinery
                                                to determine that you're returning an array

                                                >>> https://stackoverflow.com/a/39420337/3666197

                                                */
     var                       arrC: [aDomain] real(64);
                                      /*
                                                <TiO>-IDE-Debug::____________________________________________________

                                                .code.tio.chpl:77: internal error: IMP0586 chpl Version 1.16.0 pre-release (-999)

                                                Note: This source location is a guess.

                                                Internal errors indicate a bug in the Chapel compiler ("It's us, not you"),
                                                and we're sorry for the hassle.  We would appreciate your reporting this bug -- 
                                                please see http://chapel.cray.com/bugs.html for instructions.  In the meantime,
                                                the filename + line number above may be useful in working around the issue.

                                                */

 /*  var                       arrC: [{1..arrA.dim( 1 ).length(),       // ..#arrA.dim( 1 ),
                                       1..arrB.dim( 2 ).length()        // ..#arrB.dim( 2 )
                                       }
                                      ] real(64);

                                                <TiO>-IDE-Debug::____________________________________________________

                                                .code.tio.chpl:49: error: unresolved call '[domain(2,int(64),false)] real(64).dim(1)'
                                                $CHPL_HOME/modules/internal/ChapelArray.chpl:1215: note: candidates are: _domain.dim(d: int)
                                                $CHPL_HOME/modules/internal/ChapelArray.chpl:1218: note:                 _domain.dim(param d: int)

                                                */
  // forall      (row, col) in arrC.domain {    // [ROW:77] reports: internal error: IMP0586 chpl Version 1.16.0 pre-release (-999)
     forall      (row, col) in     aDomain {    // [ROW:78] reports: internal error: IMP0586 chpl Version 1.16.0 pre-release (-999) 
        for                              i in arrA.dim( 2 ) do
             arrC[row, col] += arrA[row, i]
                             * arrB[     i, col];
     }
     return  arrC;
}

proc arr_REC_POW( arrM: [?D] real(64),
                  n:          int(64) // int(32) failed:
                                      //      <- config const n_power = 5 // .code.tio.chpl:64: error: unresolved call 'arr_REC_POW([domain(2,int(64),false)] real(64), int(64))'
                  ):    [ D] real(64) {     /* 
                                                <Brad> If the domain/size of the array being returned cannot be described directly in the function prototype,
                                                       I believe your best bet at present is to omit any description of the return type and lean on Chapel's type inference machinery
                                                       to determine that you're returning an array

                                                       >>> https://stackoverflow.com/a/39420337/3666197

                                                <TiO>-IDE-Debug::____________________________________________________

                                                .code.tio.chpl:56: error: unable to resolve return type of function 'arr_REC_POW'
                                                .code.tio.chpl:56: In function 'arr_REC_POW':
                                                .code.tio.chpl:61: error: called recursively at this point


                                                // The ? operator is called the query operator, and is used to take
                                                // undetermined values like tuple or array sizes and generic types.
                                                // For example, taking arrays as parameters. The query operator is used to
                                                // determine the domain of A. This is useful for defining the return type,
                                                // though it's not required.

                                                //                  (c) 2017 Ian J. Bertolacci, Ben Harshbarger
                                                // Originally contributed by Ian J. Bertolacci, and updated by 8 contributor(s).

                                                        >>> https://learnxinyminutes.com/docs/chapel/
                                                */

     if      n < 1 then return         arrM;
     else               return arrMUL( arrM, arr_REC_POW( arrM, n - 1 ) );
}

/* ---------------------------------------------SECTION-UNDER-TEST--*/     aStopWATCH_ARR_REC.start();

   forall (row, col)             in S.domain {
         S[row, col] = arr_REC_POW( A, n_power )[row,col]
                     + arr_REC_POW( B, n_power )[row,col];
   }
/* ---------------------------------------------SECTION-UNDER-TEST--*/     aStopWATCH_ARR_REC.stop();
/* 

   ============================================ */

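A reduced <TiO>-IDE version (a pity there is no code folding there, as there is in other IDE environments; agreed with Ben that, depending on personal preference, the self-documenting experiments-under-review layout may be the more readable one)

still reports

chpl:30: internal error: IMP0586 chpl Version 1.16.0 pre-release (-999)

where chpl:30 is:

forall      (row, col) in    aDomain {

>>> aClickThrough with the updated code: no syntax warnings, but still the (-999) internal error @ <TiO>-IDE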

                    use Time;

var aStopWATCH_RND_GEN: Time.Timer;
var aStopWATCH_LIN_ALG: Time.Timer;
var aStopWATCH_MAT_REC: Time.Timer;
var aStopWATCH_ARR_REC: Time.Timer;

config const n_power =         5;
config const L_size  =      1000;
       const indices = 1..L_size;
       const aDomain = {indices, indices};

       var   A: [aDomain] real(64);
       var   B: [aDomain] real(64);
       const dtype =    "-real(64)";
       var   S: [aDomain] real(64);

use Random;
/* ---------------------------------------------SECTION-UNDER-TEST--*/     aStopWATCH_RND_GEN.start();
    Random.fillRandom(  A );
    Random.fillRandom(  B );
/* ---------------------------------------------SECTION-UNDER-TEST--*/     aStopWATCH_RND_GEN.stop();

proc arrMUL( arrA: [?DA] real(64),
             arrB: [?DB] real(64)
             ) {

     var     arrC: [aDomain] real(64);

     forall      (row, col) in    aDomain {
             arrC[row, col]  = 0;
        for                              i in arrA.dim( 2 ) do
             arrC[row, col] += arrA[row, i]
                             * arrB[     i, col];
     }
     return  arrC;
}

proc arr_REC_POW( arrM: [?D] real(64),
                  n:          int(64)
                  ):    [ D] real(64) {

     if      n < 1 then return         arrM;
     else               return arrMUL( arrM, arr_REC_POW( arrM, n - 1 ) );
}

/* ---------------------------------------------SECTION-UNDER-TEST--*/     aStopWATCH_ARR_REC.start();
   forall (row, col)             in S.domain {
         S[row, col] = arr_REC_POW( A, n_power )[row,col]
                     + arr_REC_POW( B, n_power )[row,col];
   }
/* ---------------------------------------------SECTION-UNDER-TEST--*/     aStopWATCH_ARR_REC.stop();

     use LinearAlgebra;
var mA = LinearAlgebra.Matrix( A );
var mB = LinearAlgebra.Matrix( B );
var mS = LinearAlgebra.Matrix( S );
/* ---------------------------------------------SECTION-UNDER-TEST--*/     aStopWATCH_LIN_ALG.start();
    mS = LinearAlgebra.matPlus( LinearAlgebra.matPow( mA, n_power ),
                                LinearAlgebra.matPow( mB, n_power )
                                );
/* ---------------------------------------------SECTION-UNDER-TEST--*/     aStopWATCH_LIN_ALG.stop();

proc mat_REC_POW( matM: [] real(64),
                  n:        int(64)
                  ) {

     if      n < 1 then return                    matM;
     else               return LinearAlgebra.dot( matM, mat_REC_POW( matM, n - 1 ) );
}

/* -----------------------------------------------re-fill-m?[,]-----*/
    Random.fillRandom(  A ); mA = Matrix( A ); // re-fill mA[,]
    Random.fillRandom(  B ); mB = Matrix( B ); // re-fill mB[,]
/* -----------------------------------------------re-fill-m?[,]-----*/

/* ---------------------------------------------SECTION-UNDER-TEST--*/     aStopWATCH_MAT_REC.start();
   forall  (row, col)              in mS.domain {
         mS[row, col]  = mat_REC_POW( mA, n_power )[row,col]
                       + mat_REC_POW( mB, n_power )[row,col];
   }
/* ---------------------------------------------SECTION-UNDER-TEST--*/     aStopWATCH_MAT_REC.stop();

/* |||||||||||||||||||||||||||||||||||||||||||||||||||||||||| PERF--*/

writeln( ".fillRandom() took ",          aStopWATCH_RND_GEN.elapsed( Time.TimeUnits.microseconds ), " [us] for A[,], B[,] having ", 2 * ( L_size * L_size ), dtype, " elements in total." );
writeln(
        "\n <SECTION-UNDER-TEST> took ", aStopWATCH_LIN_ALG.elapsed( Time.TimeUnits.microseconds ), " [us] in [LIN_ALG] mode ( A^n + B^n ) for [", L_size, ",", L_size, "] on <TiO>-IDE",
        "\n <SECTION-UNDER-TEST> took ", aStopWATCH_MAT_REC.elapsed( Time.TimeUnits.microseconds ), " [us] in [MAT_REC] mode ( A^n + B^n ) for [", L_size, ",", L_size, "] on <TiO>-IDE",
        "\n <SECTION-UNDER-TEST> took ", aStopWATCH_ARR_REC.elapsed( Time.TimeUnits.microseconds ), " [us] in [ARR_REC] mode ( A^n + B^n ) for [", L_size, ",", L_size, "] on <TiO>-IDE"
         );
/* ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| INF--*/

writeln(                     "<TiO>-IDE-LocaleSpace is: ", LocaleSpace, " massive. Code is executing [here], being Locale ", here.id  );
for                                                i in    LocaleSpace do
    writeln(                 "          Locale #", i, "'s ID is: ", Locales[i].id );

1 Answer

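Mission accomplished! The domain helper did the trick; the recursion still needs some investigation

Many thanks to everyone who helped to get this working.
The earlier work-in-progress remarks are left in place for educational purposes

[OK]: the code now passes the compiler's initial syntax check,

but BLAS+ATLAS fail to work the way the v1.15/v1.16 documentation suggests
(this is being resolved with the help of the <TiO>-IDE admin)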

                    use Time;

var aStopWATCH_RND_GEN: Time.Timer;
var aStopWATCH_LIN_ALG: Time.Timer;
var aStopWATCH_MAT_REC: Time.Timer;
var aStopWATCH_ARR_REC: Time.Timer;

config const n_power =         5;
config const L_size  =      1000;
       const indices = 1..L_size;
       const aDomain = {indices, indices};

       var   A: [aDomain] real(64);
       var   B: [aDomain] real(64);
       const dtype =    "-real(64)";
       var   S: [aDomain] real(64);

use Random;
/* ---------------------------------------------SECTION-UNDER-TEST--*/     aStopWATCH_RND_GEN.start();
    Random.fillRandom(  A );
    Random.fillRandom(  B );
/* ---------------------------------------------SECTION-UNDER-TEST--*/     aStopWATCH_RND_GEN.stop();

proc arrMUL( arrA: [?DA] real(64),
             arrB: [?DB] real(64)
             ) {

     var     arrC: [aDomain] real(64);

     forall      (row, col) in    aDomain {
             arrC[row, col]  = 0;
     // for                              i in arrA.dim( 2 ) do          // [FAILED]: called .dim(2) on the array itself; .dim() is defined on domains only, not on arrays
        for                              i in arrA.domain.dim( 2 ) do   // [OK]:     asks the array's domain for the range of its 2nd dimension
             arrC[row, col] += arrA[row, i]
                             * arrB[     i, col];
     }
     return  arrC;
}

proc arr_REC_POW( arrM: [?D] real(64),
                  n:          int(64)
                  ):    [ D] real(64) {

     if      n < 1 then return         arrM;
     else               return arrMUL( arrM, arr_REC_POW( arrM, n - 1 ) );
}

/* ---------------------------------------------SECTION-UNDER-TEST--*/     aStopWATCH_ARR_REC.start();
   forall (row, col)             in S.domain {
         S[row, col] = arr_REC_POW( A, n_power )[row,col]
                     + arr_REC_POW( B, n_power )[row,col];
   }
/* ---------------------------------------------SECTION-UNDER-TEST--*/     aStopWATCH_ARR_REC.stop();

     use LinearAlgebra;
var mA = LinearAlgebra.Matrix( A );
var mB = LinearAlgebra.Matrix( B );
var mS = LinearAlgebra.Matrix( S );
/* ---------------------------------------------SECTION-UNDER-TEST--*/     aStopWATCH_LIN_ALG.start();
    mS = LinearAlgebra.matPlus( LinearAlgebra.matPow( mA, n_power ),
                                LinearAlgebra.matPow( mB, n_power )
                                );
/* ---------------------------------------------SECTION-UNDER-TEST--*/     aStopWATCH_LIN_ALG.stop();

proc mat_REC_POW( matM: [?Dm] real(64),
                  n:           int(64)
                  ):    [ Dm] real(64) {

 //  if      n < 1 then return                    matM;                                           // chpl:65: error: unable to resolve return type of function 'mat_REC_POW'
     if      n < 1 then return LinearAlgebra.dot( matM, LinearAlgebra.eye( matM.shape[1] ) );     // [DID NOT HELP]: added: so as to help compiler assume the return-type
     else               return LinearAlgebra.dot( matM,       mat_REC_POW( matM, n - 1 ) );       // chpl:70: error: called recursively at this point
}

/* -----------------------------------------------re-fill-m?[,]-----*/
    Random.fillRandom(  A ); mA = Matrix( A ); // re-fill mA[,]
    Random.fillRandom(  B ); mB = Matrix( B ); // re-fill mB[,]
/* -----------------------------------------------re-fill-m?[,]-----*/

/* ---------------------------------------------SECTION-UNDER-TEST--*/     aStopWATCH_MAT_REC.start();
   forall  (row, col)              in mS.domain {
         mS[row, col]  = mat_REC_POW( mA, n_power )[row,col]
                       + mat_REC_POW( mB, n_power )[row,col];
   }
/* ---------------------------------------------SECTION-UNDER-TEST--*/     aStopWATCH_MAT_REC.stop();

/* |||||||||||||||||||||||||||||||||||||||||||||||||||||||||| PERF--*/

writeln( ".fillRandom() took ",          aStopWATCH_RND_GEN.elapsed( Time.TimeUnits.microseconds ), " [us] for A[,], B[,] having ", 2 * ( L_size * L_size ), dtype, " elements in total." );
writeln(
        "\n <SECTION-UNDER-TEST> took ", aStopWATCH_LIN_ALG.elapsed( Time.TimeUnits.microseconds ), " [us] in [LIN_ALG] mode ( A^n + B^n ) for [", L_size, ",", L_size, "] on <TiO>-IDE",
        "\n <SECTION-UNDER-TEST> took ", aStopWATCH_MAT_REC.elapsed( Time.TimeUnits.microseconds ), " [us] in [MAT_REC] mode ( A^n + B^n ) for [", L_size, ",", L_size, "] on <TiO>-IDE",
        "\n <SECTION-UNDER-TEST> took ", aStopWATCH_ARR_REC.elapsed( Time.TimeUnits.microseconds ), " [us] in [ARR_REC] mode ( A^n + B^n ) for [", L_size, ",", L_size, "] on <TiO>-IDE"
         );
/* ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| INF--*/

writeln(                     "<TiO>-IDE-LocaleSpace is: ", LocaleSpace, " massive. Code is executing [here], being Locale ", here.id  );
for                                                i in    LocaleSpace do
    writeln(                 "          Locale #", i, "'s ID is: ", Locales[i].id );
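
A note on the recursion that "still needs some investigation": both arr_REC_POW() and mat_REC_POW() hand back the matrix itself once n < 1, so, if I read the listing correctly, a call with n_power = 5 performs five multiplications and yields the 6th power, whereas LinearAlgebra.matPow( mA, 5 ) is documented to return the 5th. The sketch below mirrors that base case in plain Python (not Chapel, so that it runs anywhere). The names mat_mul, rec_pow and iter_pow are illustrative only; they are neither part of the post nor of any Chapel module.

```python
def mat_mul(a, b):
    """Plain triple-loop product of two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[r][i] * b[i][c] for i in range(n)) for c in range(n)]
            for r in range(n)]


def rec_pow(m, n):
    """Mirror of arr_REC_POW: the base case (n < 1) returns m itself."""
    if n < 1:
        return m
    return mat_mul(m, rec_pow(m, n - 1))


def iter_pow(m, n):
    """Reference m**n by repeated multiplication, starting from the identity."""
    size = len(m)
    result = [[1 if r == c else 0 for c in range(size)] for r in range(size)]
    for _ in range(n):
        result = mat_mul(result, m)
    return result


# For this matrix the k-th power is [[1, k], [0, 1]], so the exponent is easy to read off.
M = [[1, 1],
     [0, 1]]

print(rec_pow(M, 5))    # same as iter_pow(M, 6): one factor of M more than asked for
print(iter_pow(M, 5))
```

If the three timed sections are meant to produce the same result, the base case would have to return an identity matrix for n < 1 (or the matrix itself for n <= 1). Note also that the forall loops in the listing call the recursive helper once per (row, col) element, so the whole power gets recomputed for every element of S and mS.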

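BLAS+ATLAS protested on an attempt to compile/link >>> @ <TiO>-IDE, although the admin had confirmed that both modules were installed and in place ([OK]: resolved by the <TiO>-IDE site admin and Brad, both of whom deserve many thanks for this)

/usr/bin/ld: cannot find -lblas
/usr/bin/ld: cannot find -latlas

Advice from the <TiO>-IDE admin + Brad helped to make it work

Process performance on a single locale (threaded ATLAS version):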

.fillRandom()         took  582125 [us] for A[,], B[,] having 2000000-real(64) elements in total.    
 <SECTION-UNDER-TEST> took 2702530 [us] in [LIN_ALG] mode ( A^n + B^n ) for [1000,1000] on <TiO>-IDE

The --print-commands compiler switch reports:

gcc    -I/opt/chapel//lib/chapel/1.16/third-party/qthread/install/linux64-gnu-native-flat/include
       -I/opt/chapel//lib/chapel/1.16/third-party/hwloc/install/linux64-gnu-native-flat/include
       -DCHPL_TASKS_MODEL_H=\"tasks-qthreads.h\"
       -DCHPL_THREADS_MODEL_H=\"threads-none.h\"
       -DCHPL_WIDE_POINTER_STRUCT
       -DCHPL_JEMALLOC_PREFIX=chpl_je_
       -DCHPL_HAS_GMP
       -Wno-unused
       -Wno-uninitialized
       -Wno-pointer-sign
       -Wno-tautological-compare
       -Wno-stringop-overflow
       -Wno-strict-overflow
       -c
       -o /tmp/chpl-runner-15040.deleteme/.bin.tio.tmp.o
       -I/opt/chapel//lib/chapel/1.16/third-party/qthread/install/linux64-gnu-native-flat/include
       -I.
       -I/opt/chapel//lib/chapel/1.16/runtime/include/localeModels/flat
       -I/opt/chapel//lib/chapel/1.16/runtime/include/localeModels
       -I/opt/chapel//lib/chapel/1.16/runtime/include/comm/none
       -I/opt/chapel//lib/chapel/1.16/runtime/include/comm
       -I/opt/chapel//lib/chapel/1.16/runtime/include/tasks/qthreads
       -I/opt/chapel//lib/chapel/1.16/runtime/include/threads/none
       -I/opt/chapel//lib/chapel/1.16/runtime/include
       -I/opt/chapel//lib/chapel/1.16/runtime/include/qio
       -I/opt/chapel//lib/chapel/1.16/runtime/include/atomics/intrinsics
       -I/opt/chapel//lib/chapel/1.16/runtime/include/mem/jemalloc
       -I/opt/chapel//lib/chapel/1.16/third-party/utf8-decoder
       -I/opt/chapel/share/chapel/1.16/runtime//../build/runtime/linux64/gnu/arch-native/loc-flat/comm-none/tasks-qthreads/tmr-generic/unwind-none/mem-jemalloc/atomics-intrinsics/gmp/hwloc/re2/wide-struct/fs-none/include
       -I/opt/chapel//lib/chapel/1.16/third-party/jemalloc/install/linux64-gnu-native/include
       -I/opt/chapel//lib/chapel/1.16/third-party/gmp/install/linux64-gnu-native/include
       -I/opt/chapel//lib/chapel/1.16/third-party/hwloc/install/linux64-gnu-native-flat/include /tmp/chpl-runner-15040.deleteme/_main.c

g++    -L/opt/chapel//lib/chapel/1.16/third-party/qthread/install/linux64-gnu-native-flat/lib
       -Wl,-rpath,/opt/chapel//lib/chapel/1.16/third-party/qthread/install/linux64-gnu-native-flat/lib
       -L/opt/chapel//lib/chapel/1.16/third-party/jemalloc/install/linux64-gnu-native/lib
       -L/opt/chapel//lib/chapel/1.16/third-party/gmp/install/linux64-gnu-native/lib
       -Wl,-rpath,/opt/chapel//lib/chapel/1.16/third-party/gmp/install/linux64-gnu-native/lib
       -L/opt/chapel//lib/chapel/1.16/third-party/hwloc/install/linux64-gnu-native-flat/lib
       -Wl,-rpath,/opt/chapel//lib/chapel/1.16/third-party/hwloc/install/linux64-gnu-native-flat/lib
       -L/opt/chapel//lib/chapel/1.16/third-party/re2/install/linux64-gnu-native/lib
       -Wl,-rpath,/opt/chapel//lib/chapel/1.16/third-party/re2/install/linux64-gnu-native/lib
       -o /tmp/chpl-runner-15040.deleteme/.bin.tio.tmp
       -L/opt/chapel//lib/chapel/1.16/runtime/lib/linux64/gnu/arch-native/loc-flat/comm-none/tasks-qthreads/tmr-generic/unwind-none/mem-jemalloc/atomics-intrinsics/gmp/hwloc/re2/wide-struct/fs-none
       /tmp/chpl-runner-15040.deleteme/.bin.tio.tmp.o
       /opt/chapel//lib/chapel/1.16/runtime/lib/linux64/gnu/arch-native/loc-flat/comm-none/tasks-qthreads/tmr-generic/unwind-none/mem-jemalloc/atomics-intrinsics/gmp/hwloc/re2/wide-struct/fs-none/main.o
       -lchpl
       -lm
       -lblas -L/usr/lib64/atlas
       -ltatlas
       -lgmp
       -lchpl
       -lqthread -L/opt/chapel//lib/chapel/1.16/third-party/hwloc/install/linux64-gnu-native-flat/lib
       -L/opt/chapel//lib/chapel/1.16/third-party/jemalloc/install/linux64-gnu-native/lib
       -ljemalloc
       -lhwloc
       -lm
       -lre2
       -lpthread

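Last but not least, let me share a few final remarks on the setup overheads seen in the performance data and on the boundaries that this series of experiments could explore

While the full [PAR] power is beyond what can be tested here (for obvious administrative reasons it is not available on the publicly sponsored <TiO>-IDE infrastructure) and may be investigated further on more realistic computing devices, such as the in-house resources available at Cray and used by Cray's Chapel initiative, what one gains from the expressiveness of the language and from the current state of its implementation is impressive.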
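
The "overhead-aware Amdahl Law formulation" mentioned in the header comment of the listing further down can be sketched as follows. This is one common way to write it, not necessarily the exact formulation the remark has in mind: the add-on setup cost is simply added to the denominator as a fraction of the serial run time. The function name speedup and its parameters are assumptions made for this illustration (Python, so that it runs anywhere).

```python
def speedup(p, n, overhead=0.0):
    """Amdahl-style speedup with an additive setup overhead.

    p        -- fraction of the serial run time that can be parallelised (0..1)
    n        -- number of processing units working on that fraction
    overhead -- add-on setup cost, as a fraction of the serial run time
    """
    return 1.0 / ((1.0 - p) + p / n + overhead)


# Classic Amdahl (no overhead): 4 units on a 90 % parallel workload.
print(round(speedup(0.90, 4), 3))

# One processing unit, as the <TiO>-IDE locale reports, plus an overhead of
# about 0.5 s on a run of about 2.2 s (figures rounded from the results in this answer).
print(round(speedup(0.90, 1, overhead=0.5 / 2.2), 3))
```

With a single processing unit the parallel fraction cannot pay for anything, so any non-zero overhead pushes the speedup below 1, which is in line with the threaded runs being slower than the serial ones on this platform.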

Some further questions to investigate might be:


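Acknowledgements

Thanks once again to Dennis @ <TiO>-IDE support and to Brad @ Cray + team. All the best in pushing and extending this great software project, may it keep getting better and better.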


writeln(// "______________________________________ChplCode.<-lsatlas> implementation___________________________________ SERIAL-MODE [ATLAS] SUPPORT FOR [LinearAlgebra] MODULE" );
           "______________________________________ChplCode.<-ltatlas> implementation_________________________________ THREADED-MODE [ATLAS] SUPPORT FOR [LinearAlgebra] MODULE" );
        /* 
           As the experimentally collected performance data below show,
           the THREADED-MODE [ATLAS] SUPPORT FOR [LinearAlgebra] MODULE
           carries a roughly constant,
           matrix-scale-invariant,
           additional overhead of ~ +440 ~ +500 [ms],
           believed to be associated with setting up a thread-pool
           and with other pre-processing arrangements.
           This overhead ought to be accounted for in
           an overhead-aware Amdahl's Law formulation when pre-validating the choice of
           whether a [PAR], using -ltatlas
           or      a [SEQ], using -lsatlas support for the [LinearAlgebra] module implementation
           will yield faster processing times.
           */

                    use Time;

var aStopWATCH_RND_GEN: Time.Timer;
var aStopWATCH_LIN_ALG: Time.Timer;
var aStopWATCH_MAT_REC: Time.Timer;
var aStopWATCH_ARR_REC: Time.Timer;

config const n_power =         5;
config const L_size  =      2600;
       const indices = 1..L_size;
       const aDomain = {indices, indices};

       var   A: [aDomain] real(64);
       var   B: [aDomain] real(64);
       const dtype =    "-real(64)";
       var   S: [aDomain] real(64);

use Random;
/* ---------------------------------------------SECTION-UNDER-TEST--*/     aStopWATCH_RND_GEN.start();
    Random.fillRandom(  A );
    Random.fillRandom(  B );
/* ---------------------------------------------SECTION-UNDER-TEST--*/     aStopWATCH_RND_GEN.stop();
writeln( ".fillRandom()        took ",                                     aStopWATCH_RND_GEN.elapsed( Time.TimeUnits.microseconds ),
         " [us] for A[,], B[,] having ", 2 * ( L_size * L_size ), dtype, " elements in total." );

     use LinearAlgebra;
var mA = LinearAlgebra.Matrix( A );
var mB = LinearAlgebra.Matrix( B );
var mS = LinearAlgebra.Matrix( S );

/* ---------------------------------------------SECTION-UNDER-TEST--*/     aStopWATCH_LIN_ALG.start();
    mS = LinearAlgebra.matPlus( LinearAlgebra.matPow( mA, n_power ),
                                LinearAlgebra.matPow( mB, n_power )
                                );
/* ---------------------------------------------SECTION-UNDER-TEST--*/     aStopWATCH_LIN_ALG.stop();
/* |||||||||||||||||||||||||||||||||||||||||||||||||||||||||| PERF--*/

writeln( ".fillRandom()        took ", aStopWATCH_RND_GEN.elapsed( Time.TimeUnits.microseconds ), " [us] for A[,], B[,] having ", 2 * ( L_size * L_size ), dtype, " elements in total." );
writeln(
       "\n<SECTION-UNDER-TEST> took ", aStopWATCH_LIN_ALG.elapsed( Time.TimeUnits.microseconds ), " [us] in [LIN_ALG] mode ( A^n + B^n ) for [", L_size, ",", L_size, "] on <TiO>-IDE"
   // ,"\n<SECTION-UNDER-TEST> took ", aStopWATCH_MAT_REC.elapsed( Time.TimeUnits.microseconds ), " [us] in [MAT_REC] mode ( A^n + B^n ) for [", L_size, ",", L_size, "] on <TiO>-IDE"
   // ,"\n<SECTION-UNDER-TEST> took ", aStopWATCH_ARR_REC.elapsed( Time.TimeUnits.microseconds ), " [us] in [ARR_REC] mode ( A^n + B^n ) for [", L_size, ",", L_size, "] on <TiO>-IDE"
        );
/* ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| INF--*/

writeln(                     "<TiO>-IDE-LocaleSpace is: ", LocaleSpace, " massive. Code is executing [here], being Locale ", here.id  );
for                                                i in    LocaleSpace do
    writeln(                 "          Locale #", i, "'s ID is: ", Locales[i].id,                                            "\n                                having a name of <_",
                                                                    Locales[i].name,                                        "_>\n                                having { REAL:"              ,
                                                                 // Locales[i].numPUs( logical = false, accessible =  true ),   " | VIRT:"                       ,
                                                                    Locales[i].numPUs(           false,               true ),   " | VIRT:"                       ,
                                                                 // Locales[i].numPUs( logical =  true, accessible =  true ),   " | TEOR:"                       ,
                                                                    Locales[i].numPUs(            true,               true ),   " | TEOR:"                       ,
                                                                 // Locales[i].numPUs( logical =  true, accessible = false ),   " } PUnits"                      ,
                                                                    Locales[i].numPUs(            true,              false ),   " } PUnits\n                                having max ",
                                                                    Locales[i].maxTaskPar,                                      " 'just'-[CONCURRENT]-tasks\n                                having max ",
                                                                    Locales[i].callStackSize,                                   "-callStackSIZE."
                                                                    );

/* ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| RES:

.fillRandom()        took      560773 [us] for A[,], B[,] having  2000000-real(64) elements in total. <BEST-CASE>s IN SERIAL-MODE [ATLAS] SUPPORT FOR [LinearAlgebra] MODULE
.fillRandom()        took     2521920 [us] for A[,], B[,] having  8000000-real(64) elements in total.
.fillRandom()        took     2717450 [us] for A[,], B[,] having  9680000-real(64) elements in total.
.fillRandom()        took     3630820 [us] for A[,], B[,] having 11520000-real(64) elements in total.

.fillRandom()        took     4429820 [us] for A[,], B[,] having 13520000-real(64) elements in total.
.fillRandom()        took     4048440 [us] for A[,], B[,] having 13520000-real(64) elements in total. ( IN THREADED-MODE ) was faster, but not systematically

.fillRandom()        took     4793110 [us] for A[,], B[,] having 15680000-real(64) elements in total. 
.fillRandom()        took     5055060 [us] for A[,], B[,] having 15680000-real(64) elements in total. ( IN THREADED-MODE )

.fillRandom()        took     5630540 [us] for A[,], B[,] having 18000000-real(64) elements in total.



<TiO>-IDE-LocaleSpace is: {0..0} massive. Code is executing [here], being Locale 0
          Locale #0's ID is: 0
                                having a name of <_tio2_>
                                having { REAL:1 | VIRT:1 | TEOR:1 } PUnits
                                having max 4 'just'-[CONCURRENT]-tasks
                                having max 8388608-callStackSIZE.

<SECTION-UNDER-TEST> took    15110000 [us] in [LIN_ALG] mode ( A^n + B^n ) for [2000,2000] on <TiO>-IDE <BEST-CASE>s IN SERIAL-MODE
<SECTION-UNDER-TEST> took    17880300 [us] in [LIN_ALG] mode ( A^n + B^n ) for [2200,2200] on <TiO>-IDE
<SECTION-UNDER-TEST> took    25094100 [us] in [LIN_ALG] mode ( A^n + B^n ) for [2400,2400] on <TiO>-IDE
<SECTION-UNDER-TEST> took    31550900 [us] in [LIN_ALG] mode ( A^n + B^n ) for [2600,2600] on <TiO>-IDE
<SECTION-UNDER-TEST> took    32996500 [us] in [LIN_ALG] mode ( A^n + B^n ) for [2600,2600] on <TiO>-IDE
<SECTION-UNDER-TEST> took    34390400 [us] in [LIN_ALG] mode ( A^n + B^n ) for [2800,2800] on <TiO>-IDE
<SECTION-UNDER-TEST> KILL-ed                                               for [3000,3000] on <TiO>-IDE, having 18,000,000-real(64) elements .fillRandom()-ed in ~ 5.6 [s] time.

______________________________________ChplCode.<-lsatlas> implementation___________________________________ SERIAL-MODE [ATLAS] SUPPORT FOR [LinearAlgebra] MODULE
                                                  ^________________________________________________________ SERIAL-MODE [ATLAS] SUPPORT FOR [LinearAlgebra] MODULE
.fillRandom()        took 5.60773e+05 [us] for A[,], B[,] having 2000000-real(64) elements in total.
.fillRandom()        took 5.62970e+05 [us] for A[,], B[,] having 2000000-real(64) elements in total.
.fillRandom()        took 5.64366e+05 [us] for A[,], B[,] having 2000000-real(64) elements in total.
.fillRandom()        took 5.70291e+05 [us] for A[,], B[,] having 2000000-real(64) elements in total.
.fillRandom()        took 5.75086e+05 [us] for A[,], B[,] having 2000000-real(64) elements in total.
.fillRandom()        took 5.85121e+05 [us] for A[,], B[,] having 2000000-real(64) elements in total.
.fillRandom()        took 6.25645e+05 [us] for A[,], B[,] having 2000000-real(64) elements in total.
.fillRandom()        took 6.77903e+05 [us] for A[,], B[,] having 2000000-real(64) elements in total.
.fillRandom()        took 6.96932e+05 [us] for A[,], B[,] having 2000000-real(64) elements in total.
.fillRandom()        took 6.98700e+05 [us] for A[,], B[,] having 2000000-real(64) elements in total.

<SECTION-UNDER-TEST> took 2.06538e+06 [us] in [LIN_ALG] mode ( A^n + B^n ) for [1000,1000] on <TiO>-IDE
<SECTION-UNDER-TEST> took 2.07902e+06 [us] in [LIN_ALG] mode ( A^n + B^n ) for [1000,1000] on <TiO>-IDE
<SECTION-UNDER-TEST> took 2.08725e+06 [us] in [LIN_ALG] mode ( A^n + B^n ) for [1000,1000] on <TiO>-IDE
<SECTION-UNDER-TEST> took 2.12497e+06 [us] in [LIN_ALG] mode ( A^n + B^n ) for [1000,1000] on <TiO>-IDE
<SECTION-UNDER-TEST> took 2.13071e+06 [us] in [LIN_ALG] mode ( A^n + B^n ) for [1000,1000] on <TiO>-IDE
<SECTION-UNDER-TEST> took 2.22075e+06 [us] in [LIN_ALG] mode ( A^n + B^n ) for [1000,1000] on <TiO>-IDE
<SECTION-UNDER-TEST> took 2.28035e+06 [us] in [LIN_ALG] mode ( A^n + B^n ) for [1000,1000] on <TiO>-IDE
<SECTION-UNDER-TEST> took 2.32674e+06 [us] in [LIN_ALG] mode ( A^n + B^n ) for [1000,1000] on <TiO>-IDE
<SECTION-UNDER-TEST> took 2.33844e+06 [us] in [LIN_ALG] mode ( A^n + B^n ) for [1000,1000] on <TiO>-IDE
<SECTION-UNDER-TEST> took 2.35908e+06 [us] in [LIN_ALG] mode ( A^n + B^n ) for [1000,1000] on <TiO>-IDE


______________________________________ChplCode.<-ltatlas> implementation_________________________________ THREADED-MODE [ATLAS] SUPPORT FOR [LinearAlgebra] MODULE
                                                  ^______________________________________________________ THREADED-MODE [ATLAS] SUPPORT FOR [LinearAlgebra] MODULE
.fillRandom()        took 5.68652e+05 [us] for A[,], B[,] having 2000000-real(64) elements in total.
.fillRandom()        took 5.73797e+05 [us] for A[,], B[,] having 2000000-real(64) elements in total.
.fillRandom()        took 5.74911e+05 [us] for A[,], B[,] having 2000000-real(64) elements in total.
.fillRandom()        took 5.81389e+05 [us] for A[,], B[,] having 2000000-real(64) elements in total.
.fillRandom()        took 5.87079e+05 [us] for A[,], B[,] having 2000000-real(64) elements in total.
.fillRandom()        took 5.92182e+05 [us] for A[,], B[,] having 2000000-real(64) elements in total.
.fillRandom()        took 6.20989e+05 [us] for A[,], B[,] having 2000000-real(64) elements in total.
.fillRandom()        took 6.62606e+05 [us] for A[,], B[,] having 2000000-real(64) elements in total.
.fillRandom()        took 6.69875e+05 [us] for A[,], B[,] having 2000000-real(64) elements in total.
.fillRandom()        took 6.71270e+05 [us] for A[,], B[,] having 2000000-real(64) elements in total.

<SECTION-UNDER-TEST> took 2.53459e+06 [us] in [LIN_ALG] mode ( A^n + B^n ) for [1000,1000] on <TiO>-IDE
<SECTION-UNDER-TEST> took 2.57695e+06 [us] in [LIN_ALG] mode ( A^n + B^n ) for [1000,1000] on <TiO>-IDE
<SECTION-UNDER-TEST> took 2.59966e+06 [us] in [LIN_ALG] mode ( A^n + B^n ) for [1000,1000] on <TiO>-IDE
<SECTION-UNDER-TEST> took 2.61859e+06 [us] in [LIN_ALG] mode ( A^n + B^n ) for [1000,1000] on <TiO>-IDE
<SECTION-UNDER-TEST> took 2.70356e+06 [us] in [LIN_ALG] mode ( A^n + B^n ) for [1000,1000] on <TiO>-IDE
<SECTION-UNDER-TEST> took 2.76325e+06 [us] in [LIN_ALG] mode ( A^n + B^n ) for [1000,1000] on <TiO>-IDE
<SECTION-UNDER-TEST> took 2.85588e+06 [us] in [LIN_ALG] mode ( A^n + B^n ) for [1000,1000] on <TiO>-IDE
<SECTION-UNDER-TEST> took 2.92058e+06 [us] in [LIN_ALG] mode ( A^n + B^n ) for [1000,1000] on <TiO>-IDE
<SECTION-UNDER-TEST> took 2.92204e+06 [us] in [LIN_ALG] mode ( A^n + B^n ) for [1000,1000] on <TiO>-IDE
<SECTION-UNDER-TEST> took 2.97887e+06 [us] in [LIN_ALG] mode ( A^n + B^n ) for [1000,1000] on <TiO>-IDE


<TiO>-IDE-LocaleSpace is: {0..0} massive.             +--------------------------------<_tio2_>
Code is executing [here], being Locale 0              V
          Locale #0's ID is: 0, having a name of <_tio2_>
                                having { REAL:1 | VIRT:1 | TEOR:1 } PUnits,
                                having max 4 'just'-[CONCURRENT]-tasks,
                                having 8388608-callStackSIZE.

                                                      +--------------------------------<_tio3_>
                                                      V
...                             having a name of <_tio3_>
                                having { REAL:1 | VIRT:1 | TEOR:1 } PUnits
                                having max 4 'just'-[CONCURRENT]-tasks
                                having max 8388608-callStackSIZE.


*/
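
To put a number on the serial vs. threaded gap, the [LIN_ALG] timings for [1000,1000] listed above can be compared directly. The sketch below is in Python (not Chapel, so that it runs anywhere); the helper names to_ms and summarize are illustrative only, and the sample values are copied from the two result blocks.

```python
from statistics import mean, median

# [LIN_ALG] run times in microseconds for [1000,1000], copied from the results above.
SERIAL_US = [2.06538e6, 2.07902e6, 2.08725e6, 2.12497e6, 2.13071e6,
             2.22075e6, 2.28035e6, 2.32674e6, 2.33844e6, 2.35908e6]
THREADED_US = [2.53459e6, 2.57695e6, 2.59966e6, 2.61859e6, 2.70356e6,
               2.76325e6, 2.85588e6, 2.92058e6, 2.92204e6, 2.97887e6]


def to_ms(us):
    """Convert microseconds to milliseconds."""
    return us / 1000.0


def summarize(label, samples_us):
    """Print min, median and mean of one series in milliseconds."""
    print(f"{label:9s} min {to_ms(min(samples_us)):8.1f} ms"
          f"  median {to_ms(median(samples_us)):8.1f} ms"
          f"  mean {to_ms(mean(samples_us)):8.1f} ms")


summarize("serial", SERIAL_US)
summarize("threaded", THREADED_US)

# Gap between the two modes, taken on the best runs and on the medians.
best_case_gap_ms = to_ms(min(THREADED_US) - min(SERIAL_US))
median_gap_ms = to_ms(median(THREADED_US) - median(SERIAL_US))
print(f"best-case gap {best_case_gap_ms:.1f} ms, median gap {median_gap_ms:.1f} ms")
```

On these ten runs per mode the best-case gap falls inside the ~ +440 ~ +500 [ms] range quoted in the header comment, while the median gap is somewhat larger. Whether the overhead is really independent of the matrix size cannot be read from this data alone, since the threaded series covers only one size.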
answered 2017-08-19T15:48:15.253