erlang - 从头开始实施 pmap。为什么我的实施缓慢？

Question

我是 Erlang 的新手，所以为了培训我尝试从头开始实现标准功能。我试图从列表模块创建 map/2函数的并行实现。但是我的实现工作非常缓慢。如果我在实施过程中犯了任何主要错误，您能否指出我：

在此处输入图像描述

-module( my_pmap ).
-export([ pmap/2 ]).
-export([ map/4, collect/3 ]).

map( F, Value, Indx, SenderPid ) ->
        SenderPid ! { Indx, F( Value ) }.

pmap( F, List ) ->
        CollectorPid = spawn_link( my_pmap, collect, [ length( List ), [], self() ] ),
        lists:foldl(
                fun( X, Indx ) ->
                        spawn_link( my_pmap, map, [ F, X, Indx, CollectorPid ] ),
                        Indx + 1
                end,
                1,
                List ),
        Mapped =
                receive
                        { collected, M } ->
                                M
                end,
        Sorted = lists:sort(
                        fun( { Indx1, _ }, { Indx2, _ } ) ->
                                Indx1 < Indx2
                        end,
                        Mapped ),
        [ Val || { _Indx, Val } <- Sorted ].

collect( 0, List, SenderPid ) ->
        SenderPid ! { collected, List };
collect( N, List, SenderPid ) when N > 0 ->
        receive
                Mapped ->
                        collect( N - 1, [ Mapped | List ], SenderPid )
        end.

这是测试结果：

1> c(my_pmap).
{ok,my_pmap}
2> timer:tc( my_pmap, pmap, [ fun(X) -> X*X*X*X end, lists:seq( 1, 10000 ) ] ).
{137804,
 [1,16,81,256,625,1296,2401,4096,6561,10000,14641,20736,
  28561,38416,50625,65536,83521,104976,130321,160000,194481,
  234256,279841,331776,390625,456976,531441|...]}
3> timer:tc( lists, map, [ fun(X) -> X*X*X*X end, lists:seq( 1, 10000 ) ] ).   
{44136,
 [1,16,81,256,625,1296,2401,4096,6561,10000,14641,20736,
  28561,38416,50625,65536,83521,104976,130321,160000,194481,
  234256,279841,331776,390625,456976,531441|...]}

正如您可能已经看到的 0,137804 秒。与0,044136 秒相比。

谢谢

score 5 · Accepted Answer

评论是正确的。问题是产卵过程很便宜，但它确实有成本。将一个数字乘以三倍非常快，并且产生新进程的开销会影响您的性能。

将列表划分为片段并在单独的进程中处理每个片段可能会更快。如果您知道您有 8 个核心，则可以尝试将其拆分为 8 个片段。pmap 之类的东西可以在 Erlang 中实现，但它不是 Erlang 的强项。像 Haskell GHC 运行时这样的系统有sparks，这是一个更好的工具来实现像这样的细粒度并行。此外，像这样的乘法显然是 SSE 或 GPU 中的 SIMD 指令的候选者。Erlang 对此也没有解决方案，但同样，GHC 有accelerate处理repa这种情况的库。

另一方面，您可以通过简单地使用进程来处理提示的几个片段来在 Erlang 中获得良好的加速。另请注意，由于通信开销，并行计算通常在低 N（如 10000）时表现不佳。您需要更大的问题才能获得收益。

erlang - 从头开始实施 pmap。为什么我的实施缓慢？

1 回答 1

Related

Reference