Instead of having to manually add the PLINQ extension call AsParallel(), couldn't the compiler just take care of this for us automatically? Are there any examples where you specifically would not want parallelization, if the code supports it?
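For context, this is what the manual opt-in being asked about looks like; a minimal PLINQ sketch (the collection and the query itself are invented purely for illustration):

using System;
using System.Linq;

var numbers = Enumerable.Range(1, 1_000_000);

// Ordinary sequential LINQ query
long sequentialSum = numbers.Where(n => n % 2 == 0).Sum(n => (long)n);

// The same query, explicitly opted in to parallel execution with PLINQ
long parallelSum = numbers.AsParallel().Where(n => n % 2 == 0).Sum(n => (long)n);

Console.WriteLine($"{sequentialSum} == {parallelSum}");

The question is whether the compiler could insert that AsParallel() call (or its equivalent) on its own.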
3 Answers
The good news is that compiler researchers say we are really close to having auto-parallelizing compilers. The bad news is that they have been saying that for fifty years.
The problem with impure languages such as C# is usually that there is not enough parallelism. In an impure language it is very hard for a human, and pretty much impossible for a program, to figure out whether and how two pieces of code interact with each other or with the environment.
In a pure, referentially transparent language you have the opposite problem: everything is parallelizable, but it usually isn't worth it, because scheduling a thread takes far longer than just executing the damn thing.
For example, in a pure, referentially transparent, functional language, if I have something like this:
if a <= b + c
    foo
else
    bar
end
I could fire off five threads and compute a, b, c, foo and bar in parallel, then compute the +, then the <=, and finally the if, which simply means throwing away the result of either foo or bar, both of which have already been computed. (Note that this depends on the language being functional: in an impure language you cannot simply compute both branches of an if and then discard one. What if both of them print something? How would you "un-print" that?)

But if a, b, c, foo and bar are really cheap, the overhead of those five threads will be far greater than the computations themselves.
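To see the side-effect problem in an impure language, here is a hedged C# sketch (Foo and Bar are invented stand-ins for the foo and bar above):

using System;

int a = 1, b = 2, c = 3;

static int Foo() { Console.WriteLine("charging the customer"); return 1; }
static int Bar() { Console.WriteLine("refunding the customer"); return 2; }

// In this sequential form only one branch ever runs. A speculating compiler
// would want to start Foo() and Bar() in parallel with the comparison and
// discard the loser, but by then both messages would already have been
// printed (and both actions performed); there is no way to "un-print" the
// branch that should never have run.
int result = a <= b + c ? Foo() : Bar();
Console.WriteLine(result);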
I do research in the area of automatic parallelization. This is a very tricky topic that is the subject of many Ph.D. dissertations. Static code analysis and automatic parallelization have been applied with great success to languages such as Fortran, but there are limits to the compiler's ability to analyze dependencies across regions of code. Most people would not be willing to sacrifice the guarantee of code correctness in exchange for potential parallel performance gains, so the compiler has to be quite conservative about where it inserts parallel markers.
Bottom line: yes, a compiler can parallelize code. But a human can often parallelize it better, and having the compiler figure out where to put the markers can be very, very, very tricky. There are dozens of research papers available on the subject, such as the background work for the Mitosis parallelizing compiler or the D-MPI work.
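To make the "conservative" part concrete, here is a small C# sketch of the kind of dependence question the compiler has to answer (the loops are invented for illustration):

// Loop-carried dependency: each iteration reads what the previous one wrote,
// so a conservative auto-parallelizer must leave this loop sequential.
var a = new int[1000];
a[0] = 1;
for (int i = 1; i < a.Length; i++)
{
    a[i] = a[i - 1] + i;
}

// No cross-iteration dependency: every iteration is independent, so these
// iterations could in principle be distributed across threads safely.
var b = new int[1000];
for (int i = 0; i < b.Length; i++)
{
    b[i] = i * i;
}

Proving the second loop safe is easy here; doing the same analysis across method and assembly boundaries, in the presence of aliasing and side effects, is where the research papers come in.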
Auto parallelization is trickier than it might initially appear. It's that "if the code supports it" part that gets you. Consider something like
counter = 0
f(i)
{
    counter = counter + 1
    return i + counter
}
result = map(f, {1,2,3,4})
If the compiler just decides to parallelize map here, you could get different results on each run of the program. Granted, it is obvious that f doesn't actually support being used in this manner because it mutates a global variable. However, if f lives in a different assembly, the compiler can't know that it is unsafe to parallelize. It could introspect the assembly f is in at runtime and decide then, but then it becomes a question of "is the introspection fast enough, and the dynamic parallelization fast enough, to not negate the benefits of doing this?" And sometimes introspection may not even be possible: f could be a P/Invoked function that is actually perfectly suited to parallelization, but since the runtime/compiler has no way of knowing that, it has to assume it can't be. This is just the tip of the iceberg.
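A hedged C# rendering of the pseudocode above, using PLINQ so the non-determinism is easy to observe (the local function F and the literal input are just stand-ins):

using System;
using System.Linq;

int counter = 0;

int F(int i)
{
    counter = counter + 1;   // shared mutable state, not thread-safe
    return i + counter;
}

// Sequential: always yields 2, 4, 6, 8
var sequential = new[] { 1, 2, 3, 4 }.Select(F).ToArray();
Console.WriteLine(string.Join(", ", sequential));

// Parallel: the increments race and ordering is not guaranteed,
// so both the values and their order can change from run to run.
var parallel = new[] { 1, 2, 3, 4 }.AsParallel().Select(F).ToArray();
Console.WriteLine(string.Join(", ", parallel));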
In short, it is possible, but hard, and the trade-off between the cost of implementing the magic and the benefit the magic delivers was likely judged to be skewed too far in the wrong direction.