multithreading - Haskell 中的半显式并行

Question

我正在阅读 Haskell 中的半显式并行性，并感到有些困惑。

      标准杆 :: a -> b -> b

人们说这种方法允许我们通过并行评估 Haskell 程序的每个子表达式来自动进行并行化。但是这种方法有以下缺点：

1）它创建了太多的小项目，无法有效地安排。据我了解，如果你对 Haskell 程序的每一行都使用 par 函数，它会创建太多线程，而且根本不实用。那正确吗？

2）使用这种方法，并行性受到源程序中数据依赖性的限制。如果我理解正确，这意味着每个子表达式都必须是独立的。就像，在 par 函数中，a 和 b 必须是独立的。

3) Haskell 运行时系统不一定创建线程来计算表达式 a 的值。相反，它会创建一个火花，它有可能在与父线程不同的线程上执行。

所以，我的问题是：最后运行时系统会创建一个线程来计算 a 吗？或者如果需要表达式a来计算表达式b，系统会创建一个新线程来计算a？否则，它不会。这是真的？

我是 Haskell 的新手，所以也许我的问题对你们所有人来说仍然是基本的。感谢您的回答。

score 8 · Accepted Answer

The par combinator you mention is part of the Glasgow parallel Haskell (GpH) which implements semi-explicit parallelism, which however means it is not fully implicit and hence does not provide automatic parallelisation. The programmer still needs to identify subexperssions deemed worthwhile executing in parallel to avoid the issue you mention in 1).

Moreover, the annotation is not prescriptive (as e.g. pthread_create in C or forkIO in Haskell) but advisory, that means the runtime system finally decides whether to evaluate the subexpressions in parallel or not. This provides additional flexibility and a way for dynamic granularity control. Additionally, so-called Evaluation Strategies have been designed to abstract over par and pseq and to separate specification of coordination from computation. For instance, a strategy parListChunk allows to chunk a list and force it to weak head normal form (this is a case where some strictness is needed).

2) Parallelism is limited by data dependencies in the sense that the computation defines the way the graph is reduced and which computation is demanded at which point. It is not true that every sub-expression must be independent. For instance E1 par E2 returns the result of E2, it means to be useful, some part of E1 needs to be used in E2 and hence E2 depends on E1.

3) The picture is slightly confused here because of the GHC-specific terminology. There are Capabilities (or Haskell Execution Contexts) which implement parallel graph reduction and maintain a spark pool and a thread pool each. Usually there is one Capability per core (can be thought of as OS threads). On the other end of the continuum there are sparks, which are basically pointers to parts of the graph that have not been evaluated yet (thunks). And there are threads (actually sort of tasks or work units), so that to be evaluated in parallel a spark needs to be turned into a thread (which has a so called thread state object that contains the necessary execution environment and allows a thunk to be evaluated in parallel). A thunk may depend on results of other thunks and blocks until these results arrive. These threads are much more lightweight than OS threads and are being multiplexed onto the available Capablities.

So, in summary, a runtime will not even neccesarily create a lightweight thread to evaluate a sub-expression. By the way, random work-stealing is used for load-balancing.

This is a very high-level approach to parallelism and avoids race conditions and deadlocks by design. The synchronisation is implicitly mediated through graph reduction. A nice position statement discusses further Why Parallel Functional Programming Matters. For more information on the abstract machine behind the scenes have a look at Stackless Tagless G-Machine and checkout the notes on Haskell Execution Model on GHC wiki (usually the most up-to-date documentation alongside the source code).

score 3 · Accepted Answer

是的，你是对的。通过为要计算的每个表达式创建火花，您不会获得任何收益。你会得到方法，太多的火花。试图管理这就是Data Parallel Haskell的意义所在。DPH 是一种将嵌套计算分解为大小合适的块的方法，然后可以并行计算。请记住，这仍然是一项研究工作，可能还没有为主流消费做好准备。
再一次，你是对的。如果 a 依赖于 b，则您必须计算 b 所需的 a 才能开始计算 b。
是的。与某些替代方案相比，线程实际上具有相当高的开销。火花有点像 thunk，只是它们可以独立于时间计算。

不，RTS 不会创建线程来计算 a。您可以决定 RTS 应该运行多少个线程（+RTS -N6六个线程），并且它们将在程序运行期间保持活动状态。

par只会产生火花。火花不是线程。火花占用一个工作池，调度程序执行工作窃取——即当一个线程空闲时，它从池中获取一个火花并计算它。

multithreading - Haskell 中的半显式并行

2 回答 2

Related

Reference