java - Empirical data on the effects of immutability?

Question

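In class today, my professor was discussing how to structure a class. The course primarily uses Java, and I have more Java experience than the instructor (he comes from a C++ background), so I mentioned that in Java one should favor immutability. My professor asked me to justify my answer, and I gave the reasons I've heard from the Java community: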

Safety (especially with threads)
Reducing the number of objects
Allowing certain optimizations (especially in the garbage collector)

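The professor challenged my claims, saying he would like to see some statistical measurements of these benefits. I cited a wealth of anecdotal evidence, but even as I did so, I realized he was right: as far as I know, there have been no empirical studies of whether immutability actually delivers the benefits it promises in real-world code. I know it does from my own experience, but other people's experience may differ.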

So, my question is, have there been any statistical studies done on the effects of immutability in real-world code?

score 5 · Accepted Answer

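I'd point to Item 15 in Effective Java. The value of immutability lies in design (it isn't always appropriate; it's just a good first approximation), and design preferences are rarely argued from a statistical standpoint. But we have seen mutable objects (Calendar, Date) turn out very badly, and their serious replacements (JodaTime, JSR-310) chose immutability.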
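A minimal sketch of the kind of problem `java.util.Date` causes: a class that hands out its internal `Date` lets any caller change its state behind its back, whereas the JSR-310 `java.time.LocalDate` can only produce new values. The `MutableEvent` and `ImmutableEvent` classes are hypothetical, made up for illustration.

```java
import java.time.LocalDate;
import java.util.Date;

public class DateMutabilityDemo {
    // Hypothetical class that stores a mutable java.util.Date and exposes it directly.
    static final class MutableEvent {
        private final Date start; // "final" only fixes the reference, not the Date's contents
        MutableEvent(Date start) { this.start = start; }
        Date getStart() { return start; }
    }

    // Hypothetical equivalent built on java.time.LocalDate, which is immutable.
    static final class ImmutableEvent {
        private final LocalDate start;
        ImmutableEvent(LocalDate start) { this.start = start; }
        LocalDate getStart() { return start; }
    }

    public static void main(String[] args) {
        MutableEvent m = new MutableEvent(new Date(0L));
        m.getStart().setTime(86_400_000L);          // any caller can move the event by one day
        System.out.println(m.getStart().getTime()); // 86400000

        ImmutableEvent e = new ImmutableEvent(LocalDate.of(1970, 1, 1));
        LocalDate next = e.getStart().plusDays(1);  // returns a new LocalDate
        System.out.println(e.getStart());           // 1970-01-01
        System.out.println(next);                   // 1970-01-02
    }
}
```

The usual workaround with `Date` is a defensive copy in the constructor and getter (`new Date(start.getTime())`), which is exactly the bookkeeping an immutable type makes unnecessary.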

score 2 · Accepted Answer

So, my question is, have there been any statistical studies done on the effects of immutability in real-world code?

I'd argue that your professor is just being obtuse (not necessarily intentionally, and not necessarily a bad thing). It's just that the question is too vague. There are two real problems with the question:

"Statistical studies on the effect of [x]" doesn't really mean anything if you don't specify what kind of measurements you're looking for.
"Real-world code" doesn't really mean anything unless you state a specific domain. Real-world code includes scientific computing, game development, blog engines, automated proof generators, stored procedures, operating system kernels, etc.

For what it's worth, the ability of the compiler to optimize immutable objects is well documented. Off the top of my head:

The Haskell compiler performs deforestation (also called short-cut fusion), where it will transform the expression map f . map g into map (f . g). Since Haskell functions are pure, these expressions are guaranteed to produce equivalent output, but the second one traverses the list only once and doesn't need to create an intermediate list, so it can run considerably faster.
Common subexpression elimination where we could convert x = foo(12); y = foo(12) to temp = foo(12); x = temp; y = temp; is only possible if the compiler can guarantee foo is a pure function. To my knowledge, the D compiler can perform substitutions like this using the pure and immutable keywords. If I remember correctly, some C and C++ compilers will aggressively optimize calls to these functions marked "pure" (or whatever the equivalent keyword is).
So long as we don't have mutable state, a sufficiently smart compiler can execute linear blocks of code on multiple threads with a guarantee that we won't corrupt the state of variables in another thread.
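Neither rewrite is specific to Haskell or D. The Java sketch below (Java has no `pure` keyword, so purity here is only a convention) shows why both are safe for pure functions: `map f . map g` and `map (f . g)` give the same list, and reusing one call of a pure function is indistinguishable from calling it twice, while the same "optimization" changes the result of an impure function. `map`, `pureFoo`, and `impureFoo` are hypothetical helpers, and the fusion is done by hand here, not by the JVM.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

public class PurityRewrites {
    // Applies fn to every element, building a new list (one traversal, one allocation).
    static <A, B> List<B> map(Function<A, B> fn, List<A> xs) {
        List<B> out = new ArrayList<>(xs.size());
        for (A x : xs) out.add(fn.apply(x));
        return out;
    }

    // "map f . map g": two traversals and an intermediate list.
    static <A, B, C> List<C> twoPasses(Function<B, C> f, Function<A, B> g, List<A> xs) {
        return map(f, map(g, xs));
    }

    // "map (f . g)": one traversal, no intermediate list.
    static <A, B, C> List<C> fused(Function<B, C> f, Function<A, B> g, List<A> xs) {
        return map(f.compose(g), xs);
    }

    // Pure: same input, same output, no side effects.
    static int pureFoo(int n) { return n * n; }

    // Impure: the result depends on hidden mutable state.
    static int calls = 0;
    static int impureFoo(int n) { calls++; return n + calls; }

    public static void main(String[] args) {
        List<Integer> xs = List.of(1, 2, 3);
        Function<Integer, Integer> g = x -> x + 1;
        Function<Integer, Integer> f = x -> x * 10;
        System.out.println(twoPasses(f, g, xs)); // [20, 30, 40]
        System.out.println(fused(f, g, xs));     // [20, 30, 40]

        // Common subexpression elimination is safe for the pure function...
        int x = pureFoo(12), y = pureFoo(12);
        int temp = pureFoo(12);
        System.out.println(x == temp && y == temp); // true

        // ...but changes behavior for the impure one.
        int a = impureFoo(12), b = impureFoo(12); // 13 and 14
        calls = 0;
        int t = impureFoo(12);                    // 13, reused for both variables
        System.out.println(a + " " + b + " vs " + t + " " + t); // 13 14 vs 13 13
    }
}
```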

Regarding concurrency, the pitfalls of concurrency using mutable state are well-documented and don't need to be restated.

Sure, this is all anecdotal evidence, but that's pretty much the best you'll get. The immutable vs mutable debate is largely a pissing match, and you are not going to find a paper making a sweeping generalization like "functional programming is superior to imperative programming".

At most, you'll probably find that you can summarize the benefits of immutable vs mutable in a set of best practices rather than as codified studies and statistics. For example, mutable state is the enemy of multithreaded programming; on the other hand, mutable queues and arrays are often easier to write and more efficient in practice than their immutable variants.

It takes practice, but eventually you learn to use the right tool for the job, rather than shoehorning your favorite pet paradigm into a project.

score 2 · Accepted Answer

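In my opinion, the biggest advantage of immutability in Java is simplicity. If an object's state can't change, reasoning about that state becomes much simpler. This is of course even more important in a multithreaded environment, but even in a simple, linear, single-threaded program it can make things easier to understand.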

For more examples, see this page.

score 1 · Accepted Answer

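I think your professor is being overly stubborn (possibly deliberately, to push you toward a fuller understanding). Really, the benefit of immutability lies not so much in what the compiler can do with optimizations as in the fact that it is easier for us humans to read and understand. A variable that is guaranteed to be set when the object is created, and guaranteed never to change afterwards, is easier to understand and reason about than one that holds this value now but might be set to something else later.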

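This is especially true with threads, because when the language guarantees that such modification can't happen, you don't need to worry about processor caches, monitors, and all the boilerplate that comes with avoiding concurrent modification.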
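A minimal sketch of what this looks like in Java, assuming Java 10+ for `List.copyOf`: an object whose fields are all `final` and whose contents can't change is handed to several threads, which read it without any locking. `Settings` and `totalFromThreads` are hypothetical names. The visibility guarantee for such objects comes from the `final`-field semantics of the Java Memory Model, and it holds only if `this` doesn't escape the constructor.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class SharedImmutableDemo {
    // Immutable: final class, final field, no setters, unmodifiable copy of the input list.
    static final class Settings {
        private final List<Integer> weights;
        Settings(List<Integer> weights) { this.weights = List.copyOf(weights); }
        List<Integer> weights() { return weights; }
    }

    // Every task reads the same Settings instance; no locks or volatile fields are involved.
    static int totalFromThreads(Settings s, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            Callable<Integer> task = () -> s.weights().stream().mapToInt(Integer::intValue).sum();
            List<Future<Integer>> results = new ArrayList<>();
            for (int i = 0; i < threads; i++) {
                results.add(pool.submit(task));
            }
            int total = 0;
            for (Future<Integer> r : results) {
                total += r.get();
            }
            return total;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // restore the interrupt flag
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        Settings s = new Settings(List.of(1, 2, 3));
        System.out.println(totalFromThreads(s, 4)); // 24: four threads each sum 1 + 2 + 3
    }
}
```

Any attempt to modify the shared list, such as `s.weights().add(4)`, throws `UnsupportedOperationException` instead of racing with the readers.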

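Once you frame the benefit of immutability as "the code is easier to follow", asking for empirical measurements of productivity gains from "easier to follow" feels a bit silly.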

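On the other hand, the compiler and HotSpot can probably perform certain optimizations based on knowing that a value will never change. Like you, I have a feeling that this happens and is a good thing, but I'm not sure of the details. You are more likely to find empirical data on the kinds of optimizations that can occur and on how much faster the resulting code is.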

score 1 · Accepted Answer

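Don't argue with the professor. You have nothing to gain.
These are open questions, like dynamic vs. static typing. For various reasons we sometimes consider functional techniques involving immutable data to be better, but so far it is mostly a matter of style.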

score 1 · Accepted Answer

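What would you measure objectively? GC activity and object counts could be measured with mutable and immutable versions of the same program (although the choice of program would be subjective, so it's a very weak argument). I can't imagine how you would measure the elimination of threading bugs, other than by pointing to a real-world production application that was plagued by intermittent problems that were fixed by adding immutability.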
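A sketch of the kind of measurement this answer has in mind, using hypothetical `MutableCounter` and `ImmutableCounter` classes: the same loop written both ways, counting how many counter objects get constructed. It counts constructor calls, not heap bytes; the JIT's escape analysis may eliminate some of these allocations entirely, so real GC figures would need a profiler.

```java
public class AllocationCountDemo {
    static long created = 0; // counts constructor calls for both counter types

    // Mutable counter: one object, updated in place.
    static final class MutableCounter {
        private long value;
        MutableCounter() { created++; }
        void increment() { value++; }
        long value() { return value; }
    }

    // Immutable counter: every increment constructs a new object.
    static final class ImmutableCounter {
        private final long value;
        ImmutableCounter(long value) { this.value = value; created++; }
        ImmutableCounter increment() { return new ImmutableCounter(value + 1); }
        long value() { return value; }
    }

    static long objectsForMutable(int n) {
        created = 0;
        MutableCounter c = new MutableCounter();
        for (int i = 0; i < n; i++) c.increment();
        return created;
    }

    static long objectsForImmutable(int n) {
        created = 0;
        ImmutableCounter c = new ImmutableCounter(0);
        for (int i = 0; i < n; i++) c = c.increment();
        return created;
    }

    public static void main(String[] args) {
        System.out.println(objectsForMutable(1000));   // 1
        System.out.println(objectsForImmutable(1000)); // 1001
    }
}
```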

score 0 · Accepted Answer

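Immutable objects allow code to share an object's value by sharing a reference to it. Mutable objects, however, have identity, and code that wants to share an object's identity does so by sharing a reference. Both kinds of sharing are essential in most applications. If no immutable objects are available, values can be shared by copying them into new objects, or into objects supplied by the intended recipients of those values. Getting by without mutable objects is much harder. One can sort of "fake" mutable objects by writing stateOfUniverse = stateOfUniverse.withSomeChange(...), but that requires that nothing else modify stateOfUniverse while its withSomeChange method is running [precluding any sort of multithreading]. Furthermore, if one were trying to keep track of a fleet of trucks, and part of the code was interested in one particular truck, that code would have to look the truck up in the truck table any time it might have changed.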

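A better approach is to subdivide the universe into entities and values. Entities have mutable characteristics but an immutable identity, so a storage location of type Truck, for example, can continue to identify the same truck even as the truck itself changes position, loads and unloads cargo, etc. Values generally don't have any particular identity, but have immutable characteristics. A Truck might store its location as type WorldCoordinate. A WorldCoordinate representing 45.6789012N 98.7654321W will continue to exist as long as any reference to it exists; if the truck at that location moves slightly north, it will create a new WorldCoordinate to represent 45.6789013N 98.7654321W, abandon the old one, and store a reference to the new one.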
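A minimal Java sketch of the entity/value split, assuming latitude and longitude are stored as plain `double` degrees (west as negative longitude) and using a hypothetical `movedNorth` helper: `WorldCoordinate` is a value with content-based equality, while `Truck` is an entity whose identity (the reference, plus a fixed `id`) stays the same while its location changes.

```java
import java.util.Objects;

public class FleetDemo {
    // Value: immutable characteristics, equality by content rather than identity.
    static final class WorldCoordinate {
        private final double lat; // degrees north
        private final double lon; // degrees east; west is negative
        WorldCoordinate(double lat, double lon) { this.lat = lat; this.lon = lon; }
        double lat() { return lat; }
        double lon() { return lon; }
        WorldCoordinate movedNorth(double degrees) { return new WorldCoordinate(lat + degrees, lon); }
        @Override public boolean equals(Object o) {
            if (!(o instanceof WorldCoordinate)) return false;
            WorldCoordinate c = (WorldCoordinate) o;
            return Double.compare(lat, c.lat) == 0 && Double.compare(lon, c.lon) == 0;
        }
        @Override public int hashCode() { return Objects.hash(lat, lon); }
    }

    // Entity: fixed identity, mutable characteristics.
    static final class Truck {
        private final int id;
        private WorldCoordinate location;
        Truck(int id, WorldCoordinate location) { this.id = id; this.location = location; }
        int id() { return id; }
        WorldCoordinate location() { return location; }
        void setLocation(WorldCoordinate newLocation) { this.location = newLocation; }
    }

    public static void main(String[] args) {
        Truck myTruck = new Truck(7, new WorldCoordinate(45.6789012, -98.7654321));
        Truck sameTruck = myTruck; // another part of the code tracking the same truck
        WorldCoordinate old = myTruck.location();
        myTruck.setLocation(old.movedNorth(0.0000001)); // new value; the old one is simply dropped
        System.out.println(sameTruck.location() == myTruck.location()); // true: no lookup needed
        System.out.println(old.lat() == 45.6789012);                     // true: old value unchanged
    }
}
```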

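Code is generally easiest to reason about when everything encapsulates either an immutable value or an immutable identity, and when the things that are supposed to have an immutable identity are mutable. If one didn't want to use any mutable objects other than the variable stateOfUniverse, updating a truck's location would require something like: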

ImmutableMapping<Integer, Truck> trucks = stateOfUniverse.getTrucks();
Truck myTruck = trucks.get(myTruckId);
myTruck = myTruck.withLocation(newLocation);
trucks = trucks.withItem(myTruckId, myTruck);
stateOfUniverse = stateOfUniverse.withTrucks(trucks);

But reasoning about that code would be harder than reasoning about:

myTruck.setLocation(newLocation);
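For comparison, a runnable version of the fully immutable update. `ImmutableMapping` isn't a standard Java type, so this sketch assumes an unmodifiable `java.util.Map` (Java 10+ `Map.copyOf`) in its place, with hypothetical `ImmutableTruck` and `Universe` classes and locations kept as plain strings for brevity.

```java
import java.util.HashMap;
import java.util.Map;

public class UniverseDemo {
    // Hypothetical immutable truck: "changing" it means building a new one.
    static final class ImmutableTruck {
        final int id;
        final String location;
        ImmutableTruck(int id, String location) { this.id = id; this.location = location; }
        ImmutableTruck withLocation(String newLocation) { return new ImmutableTruck(id, newLocation); }
    }

    // Hypothetical immutable universe holding an unmodifiable map of trucks.
    static final class Universe {
        final Map<Integer, ImmutableTruck> trucks;
        Universe(Map<Integer, ImmutableTruck> trucks) { this.trucks = Map.copyOf(trucks); }
        Universe withTrucks(Map<Integer, ImmutableTruck> newTrucks) { return new Universe(newTrucks); }
    }

    // Every level from the changed truck up to the root has to be rebuilt.
    static Universe moveTruck(Universe u, int truckId, String newLocation) {
        ImmutableTruck moved = u.trucks.get(truckId).withLocation(newLocation);
        Map<Integer, ImmutableTruck> trucks = new HashMap<>(u.trucks); // copy, then replace one entry
        trucks.put(truckId, moved);
        return u.withTrucks(trucks);
    }

    public static void main(String[] args) {
        Universe before = new Universe(Map.of(1, new ImmutableTruck(1, "depot")));
        Universe after = moveTruck(before, 1, "north");
        System.out.println(before.trucks.get(1).location); // depot
        System.out.println(after.trucks.get(1).location);  // north
    }
}
```

Copying the whole map on every move costs O(n); persistent data structures reduce that by sharing structure between versions, but the update still has to rebuild the path from the truck up to the root, which is the extra ceremony the single `setLocation` call avoids.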

score 0 · Accepted Answer

Immutability is a good thing for value objects. But what about other things? Imagine an object that gathers statistics:

Stats s = new Stats();

for (String line : lines) {   // some loop over the input
    s.count();
}

s.end();
s.print();

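This should print "Processed 536.21 lines/sec". How are you going to implement count() immutably? Even if you use an immutable value object for the counter itself, s can't be immutable, because it has to replace the counter object inside itself. The only way out is: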

    s = s.count();

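This means copying the state of s on every iteration of the loop. While that can be done, it is certainly less efficient than incrementing an internal counter.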

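Also, most people won't use this API correctly, because they expect count() to modify the object's state rather than return a new one. So in this case immutability produces more bugs, not fewer.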
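A stripped-down sketch of both versions that keeps only the counting part (the hypothetical `MutableStats` and `ImmutableStats` classes leave out `end()` and the rate calculation). It shows the extra object per iteration and the misuse the answer predicts: calling `count()` on the immutable version without reassigning compiles fine and silently does nothing.

```java
public class StatsDemo {
    // Mutable version: count() updates internal state.
    static final class MutableStats {
        private long lines;
        void count() { lines++; }
        long lines() { return lines; }
    }

    // Hypothetical immutable version: count() returns a new Stats object.
    static final class ImmutableStats {
        private final long lines;
        ImmutableStats(long lines) { this.lines = lines; }
        ImmutableStats count() { return new ImmutableStats(lines + 1); }
        long lines() { return lines; }
    }

    static long mutableRun(int n) {
        MutableStats s = new MutableStats();
        for (int i = 0; i < n; i++) s.count();
        return s.lines();
    }

    static long immutableRun(int n) {
        ImmutableStats s = new ImmutableStats(0);
        for (int i = 0; i < n; i++) s = s.count(); // must reassign on every iteration
        return s.lines();
    }

    // The predicted mistake: the returned object is discarded.
    static long immutableRunMisused(int n) {
        ImmutableStats s = new ImmutableStats(0);
        for (int i = 0; i < n; i++) s.count(); // compiles, but has no effect
        return s.lines();
    }

    public static void main(String[] args) {
        System.out.println(mutableRun(5));          // 5
        System.out.println(immutableRun(5));        // 5
        System.out.println(immutableRunMisused(5)); // 0
    }
}
```

Static analysis can catch some of these mistakes; for example, annotating `count()` with Error Prone's `@CheckReturnValue` lets that tool flag calls whose result is ignored.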

score 0 · Accepted Answer

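As other comments have said, it would be very, very hard to gather statistics on the merits of immutable objects, because it's virtually impossible to find control cases: pairs of software applications that are alike in every respect except that one uses immutable objects and the other doesn't. (In nearly every case, I would argue, one version of the software was written after the other and learned many lessons from the first, so any performance improvement would have many causes.) Any experienced programmer who thinks about this for a moment should realize it. I think your professor is trying to deflect your suggestion.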

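At the same time, it's easy to make a compelling argument in favor of immutability, at least in Java and probably in C# and other OO languages. As Yishai says, Effective Java makes the argument well. So does Java Concurrency in Practice, which is on my bookshelf.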
