multithreading - How does a Java virtual machine implement the "happens-before" memory model?

Question

Java's memory model is based on the "happens-before" relationship, which enforces rules but also leaves room for optimization in the virtual machine's implementation, for example in how caches are invalidated.

For example in the following case:

// thread A
private void method() {
   //code before lock
   synchronized (lockA) {
       //code inside
   }
}

// thread B
private void method2() {
   //code before lock
   synchronized (lockA) {
       //code inside
   }
}

// thread B
private void method3() {
   //code before lock
   synchronized (lockB) {
       //code inside
   }
}

If thread A calls method() and thread B then acquires lockA inside method2(), synchronization on lockA requires that thread B observe all changes thread A made to its variables before releasing the lock, including variables that were changed in the "code before lock" section.

On the other hand, method3() uses a different lock and does not establish a happens-before relationship with method(). This creates an opportunity for optimization.
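A minimal runnable sketch of the same-lock scenario, assuming Java 9+ (for `Thread.onSpinWait`). The class name `SameLockHandoff`, the fields `data` and `ready`, and the method `run` are hypothetical; only `lockA` comes from the question. Thread A writes `data` outside the lock, then sets `ready` inside `synchronized (lockA)`. Once thread B sees `ready == true` under the same lock, it is guaranteed to also see `data == 42`. If B synchronized on a different lock (like `lockB` in method3()), that guarantee would not hold.

```java
public class SameLockHandoff {
    private static final Object lockA = new Object();
    private static int data;       // written in thread A's "code before lock"
    private static boolean ready;  // only accessed while holding lockA

    static int run() {
        data = 0;
        ready = false;
        int[] seen = new int[1];
        Thread a = new Thread(() -> {
            data = 42;               // plain write, outside any lock
            synchronized (lockA) {   // releasing lockA also publishes the write above
                ready = true;
            }
        });
        Thread b = new Thread(() -> {
            while (true) {
                synchronized (lockA) {   // same lock as thread A
                    if (ready) {
                        break;           // A's release happened before this acquire
                    }
                }
                Thread.onSpinWait();
            }
            seen[0] = data;              // guaranteed to read 42
        });
        b.start();
        a.start();
        try {
            a.join();
            b.join();                    // join makes seen[0] visible to this thread
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return seen[0];
    }

    public static void main(String[] args) {
        System.out.println(run()); // prints 42
    }
}
```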

My question is: how does the virtual machine implement these complex semantics? Does it avoid a full cache flush when one is not needed?

How does it track which variables were changed by which thread, and at what point, so that it loads only the cache lines it needs from memory?

score 4 · Accepted Answer

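You are expecting too high-level thinking from the JVM. The memory model intentionally describes only what has to be guaranteed, not how it has to be implemented. Some architectures have coherent caches that never need to be flushed at all. Still, some action may be required when it comes to forbidding the reordering of reads and/or writes beyond a certain point.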

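But in all cases, these effects are global, because the guarantees are made for all reads and writes and do not depend on the particular construct that establishes the happens-before relationship. Recall that all writes that happen before releasing a particular lock happen before all reads that follow acquiring the same lock.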

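The JVM doesn't deal with happens-before relationships at all. It processes code, either by interpreting (executing) it or by generating native code for it. While doing so, it has to obey the memory model by inserting barriers or flushes and by not reordering read or write instructions across those barriers. At this point, it usually considers the code in isolation, without looking at what other threads are doing. The effect of these flushes or barriers is always global.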
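Since Java 9, `java.lang.invoke.VarHandle` exposes release/acquire access modes that express, at source level, the kind of "don't reorder across this point" constraint described above. The sketch below is an analogy, not how HotSpot actually implements monitors; the class and method names (`ReleaseAcquireDemo`, `publish`, `awaitAndRead`, `run`) are hypothetical. The release store keeps the earlier write to `payload` from moving after it, and the acquire load keeps the later read of `payload` from moving before it.

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;

public class ReleaseAcquireDemo {
    private int payload;     // plain field
    private boolean ready;   // accessed only through the READY handle

    private static final VarHandle READY;
    static {
        try {
            READY = MethodHandles.lookup()
                    .findVarHandle(ReleaseAcquireDemo.class, "ready", boolean.class);
        } catch (ReflectiveOperationException e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    void publish(int value) {
        payload = value;               // must not be reordered after the release store
        READY.setRelease(this, true);  // release: earlier writes become visible first
    }

    int awaitAndRead() {
        while (!((boolean) READY.getAcquire(this))) { // acquire: later reads stay after it
            Thread.onSpinWait();
        }
        return payload;                // sees the value written before setRelease
    }

    static int run() {
        ReleaseAcquireDemo demo = new ReleaseAcquireDemo();
        int[] seen = new int[1];
        Thread reader = new Thread(() -> seen[0] = demo.awaitAndRead());
        reader.start();
        demo.publish(7);
        try {
            reader.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return seen[0];
    }

    public static void main(String[] args) {
        System.out.println(run()); // prints 7
    }
}
```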

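However, having a global effect is not sufficient to establish a happens-before relationship. The relationship exists only when one thread is guaranteed to commit all of its writes before the other thread is guaranteed to (re-)read the values. No such ordering exists when two threads synchronize on different objects or acquire and release different locks.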

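In the case of volatile variables, you can evaluate the variable's value to find out whether the other thread has written the expected value and has therefore committed its writes. In the case of synchronized blocks, mutual exclusion enforces the ordering. So within a synchronized block, a thread can examine all variables guarded by the monitor to evaluate the state, which should be the result of a previous update made within a synchronized block using the same monitor.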
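A minimal sketch of the volatile case, assuming Java 9+ for `Thread.onSpinWait`. The names `VolatileHandoff`, `result`, `done`, and `run` are hypothetical. The reader checks the volatile flag's value; once it sees `true`, it knows the writer's earlier plain write to `result` has been published as well.

```java
public class VolatileHandoff {
    private static int result;             // plain field
    private static volatile boolean done;  // volatile flag

    static int run() {
        result = 0;
        done = false;
        Thread writer = new Thread(() -> {
            result = 99;   // plain write...
            done = true;   // ...published by the volatile write that follows it
        });
        writer.start();
        while (!done) {    // volatile read: the loop exits once it sees true
            Thread.onSpinWait();
        }
        return result;     // ordered after result = 99, so this reads 99
    }

    public static void main(String[] args) {
        System.out.println(run()); // prints 99
    }
}
```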

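Because these effects are global, some developers have been misled into thinking that synchronizing on different locks is fine as long as their assumptions about the temporal order are "reasonable". Such code must be considered broken, because it relies on a particular implementation, and in particular on that implementation's simplicity.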

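One thing recent JVMs do is recognize that purely local objects, i.e. objects never seen by any other thread, cannot establish a happens-before relationship when synchronized on. Therefore, the effects of synchronization can be elided in these cases. We can expect more optimizations in the future…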
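A small illustration of a lock that can never matter to another thread. The class and method names are hypothetical. Because `localLock` never escapes `sumWithLocalLock`, no other thread can synchronize on it, so a JIT compiler that performs escape analysis may remove the locking entirely. Whether this actually happens depends on the JVM, its version, and its settings; the program's result is the same either way.

```java
public class LocalLockElision {
    static long sumWithLocalLock(int n) {
        Object localLock = new Object(); // never published to another thread
        long sum = 0;
        for (int i = 1; i <= n; i++) {
            synchronized (localLock) {   // cannot create a happens-before edge with any other thread
                sum += i;
            }
        }
        return sum;
    }

    public static void main(String[] args) {
        System.out.println(sumWithLocalLock(100)); // prints 5050
    }
}
```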

score 1 · Accepted Answer

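How does it track which variables were changed by which thread, and at what point, so that it loads only the cache lines it needs from memory?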

It doesn't. That's not how modern CPUs work.

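On every platform where you are likely to see multithreaded Java code complex enough to run into these issues, cache coherency is implemented in hardware. Cache lines can be transferred directly from one cache to another without going through main memory. In fact, it would be terrible if data had to pass through slow main memory every time it was written on one core and picked up on another. So the caches communicate with each other directly.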

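When code modifies a memory address, that core's cache acquires exclusive ownership of that address. If another core wants to read the address, the caches typically share it by communicating directly. If either core then wants to modify the shared data, it must invalidate the copy in the other core's cache.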
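The ownership, sharing, and invalidation steps above loosely follow the MESI protocol. The toy model below tracks a single cache line across a few cores; it is a simplified sketch, not a description of any real CPU, and all names (`ToyCoherence`, `Line`, `read`, `write`, `stateOf`) are hypothetical. Real protocols have more states (MOESI, MESIF) and different write-back rules.

```java
import java.util.ArrayList;
import java.util.List;

public class ToyCoherence {
    enum State { MODIFIED, EXCLUSIVE, SHARED, INVALID }

    static final class Line {
        State state = State.INVALID;
        int value;
    }

    private final List<Line> caches = new ArrayList<>();
    private int memory; // main-memory copy of the single modeled line

    ToyCoherence(int cores) {
        for (int i = 0; i < cores; i++) caches.add(new Line());
    }

    int read(int core) {
        Line mine = caches.get(core);
        if (mine.state != State.INVALID) return mine.value; // cache hit
        for (Line other : caches) {
            if (other != mine && other.state != State.INVALID) {
                // Cache-to-cache transfer: another cache supplies the line directly.
                if (other.state == State.MODIFIED) memory = other.value; // MESI write-back
                other.state = State.SHARED;
                mine.value = other.value;
                mine.state = State.SHARED;
            }
        }
        if (mine.state == State.INVALID) { // no other cache had it: fetch from memory
            mine.value = memory;
            mine.state = State.EXCLUSIVE;
        }
        return mine.value;
    }

    void write(int core, int value) {
        Line mine = caches.get(core);
        for (Line other : caches) {
            if (other != mine) other.state = State.INVALID; // invalidate all other copies
        }
        mine.value = value;
        mine.state = State.MODIFIED; // exclusive ownership, dirty
    }

    State stateOf(int core) {
        return caches.get(core).state;
    }

    public static void main(String[] args) {
        ToyCoherence sys = new ToyCoherence(2);
        sys.write(0, 5);                    // core 0 owns the line (MODIFIED)
        System.out.println(sys.read(1));    // 5, supplied by core 0's cache
        System.out.println(sys.stateOf(0)); // SHARED
        sys.write(1, 6);                    // core 1 invalidates core 0's copy
        System.out.println(sys.stateOf(0)); // INVALID
        System.out.println(sys.read(0));    // 6
    }
}
```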

So these caches are managed by hardware and are effectively invisible at the software level.

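However, CPUs do sometimes have prefetched reads or posted writes (not yet in the cache). These simply require the use of memory barrier instructions. A memory barrier operates entirely inside the CPU to prevent memory operations from being reordered, delayed, or performed early across the barrier. The CPU knows which memory operations have been delayed or performed early, so the code doesn't have to keep track of them.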
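To make "posted writes" concrete, the toy model below gives one core a private store buffer. This is a simplified sketch, not a model of any specific CPU; the names `ToyStoreBuffer`, `store`, `loadOwn`, `loadFromOtherCore`, and `fence` are hypothetical, and a `HashMap` stands in for the coherent caches. A store is buffered, so other cores don't see it yet, while the writing core sees its own value through store-to-load forwarding. The `fence` method plays the role of a barrier by draining the buffer in order. The buffer itself records what is pending, which is why software never has to track it.

```java
import java.util.ArrayDeque;
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;

public class ToyStoreBuffer {
    // Values visible to every other core (stands in for the coherent caches).
    private final Map<String, Integer> coherent = new HashMap<>();
    // This core's posted writes, oldest first, not yet visible to other cores.
    private final ArrayDeque<Map.Entry<String, Integer>> buffer = new ArrayDeque<>();

    void store(String addr, int value) {
        buffer.addLast(Map.entry(addr, value)); // posted: the core moves on without waiting
    }

    int loadOwn(String addr) {
        // Store-to-load forwarding: this core sees its own newest buffered write.
        Iterator<Map.Entry<String, Integer>> it = buffer.descendingIterator();
        while (it.hasNext()) {
            Map.Entry<String, Integer> e = it.next();
            if (e.getKey().equals(addr)) return e.getValue();
        }
        return coherent.getOrDefault(addr, 0);
    }

    int loadFromOtherCore(String addr) {
        return coherent.getOrDefault(addr, 0); // other cores never see the buffer
    }

    void fence() {
        // Barrier: drain every buffered write, in order, before continuing.
        while (!buffer.isEmpty()) {
            Map.Entry<String, Integer> e = buffer.pollFirst();
            coherent.put(e.getKey(), e.getValue());
        }
    }

    public static void main(String[] args) {
        ToyStoreBuffer core = new ToyStoreBuffer();
        core.store("x", 1);
        System.out.println(core.loadOwn("x"));           // 1
        System.out.println(core.loadFromOtherCore("x")); // 0: write still buffered
        core.fence();
        System.out.println(core.loadFromOtherCore("x")); // 1
    }
}
```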
