Is there any performance difference between this:
synchronized void x() {
    y();
}
synchronized void y() {
}
and this:
synchronized void x() {
    y();
}
void y() {
}
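Yes, there is an additional performance cost, unless and until the JVM inlines the call to y(), which a modern JIT compiler will do in fairly short order. First, consider the case you've presented, in which y() is visible outside the class. In this case, the JVM must check on entering y() to ensure that it can enter the monitor on the object; this check will always succeed when the call comes from x(), but it can't be skipped, because the call could be coming from a client outside the class. This additional check incurs a small cost.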
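To make the "client outside the class" case concrete, here is a minimal sketch. One thread sits inside x() holding the monitor while a second thread calls y() directly and has to wait. The class name ExternalCallerDemo, the latches and the 5-second timeout are assumptions made up for this illustration, not part of the question.

```java
import java.util.concurrent.CountDownLatch;

final class ExternalCallerDemo {
    private final CountDownLatch insideX = new CountDownLatch(1);
    private final CountDownLatch release = new CountDownLatch(1);

    synchronized void x() {
        y();                  // re-entrant call: this thread already owns the monitor
        insideX.countDown();  // signal that the monitor is currently held
        try {
            release.await();  // keep holding the monitor until told to let go
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    synchronized void y() { }

    /** Returns true if a direct caller of y() was seen BLOCKED while x() held the monitor. */
    static boolean outsideCallerBlocks() {
        ExternalCallerDemo demo = new ExternalCallerDemo();
        Thread owner = new Thread(demo::x);
        Thread outsider = new Thread(demo::y);
        boolean blocked = false;
        try {
            owner.start();
            demo.insideX.await();          // wait until the owner is inside x()
            outsider.start();
            long deadline = System.nanoTime() + 5_000_000_000L; // assumed timeout
            while (!blocked && System.nanoTime() < deadline) {
                blocked = outsider.getState() == Thread.State.BLOCKED;
                if (!blocked) {
                    Thread.sleep(1);
                }
            }
            demo.release.countDown();      // let the owner leave x()
            owner.join();
            outsider.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } finally {
            demo.release.countDown();      // no-op if already released
        }
        return blocked;
    }
}
```

Calling ExternalCallerDemo.outsideCallerBlocks() should return true: the call to y() from inside x() goes straight through, while the direct call from the other thread cannot proceed until x() returns.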
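Additionally, consider the case where y() is private. In this case, the compiler still does not optimize away the synchronization; see the following disassembly of an empty y():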
private synchronized void y();
  flags: ACC_PRIVATE, ACC_SYNCHRONIZED
  Code:
    stack=0, locals=1, args_size=1
       0: return
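According to the spec's definition of synchronized, each entrance into a synchronized block or method performs a lock action on the object, and leaving it performs an unlock action. No other thread can acquire that object's monitor until the lock counter goes down to zero. Presumably some sort of static analysis could demonstrate that a private synchronized method is only ever called from other synchronized methods, but Java's multi-source-file support would make that fragile at best, even ignoring reflection. This means that the JVM must still increment the counter on entering y():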
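"Monitor entry on invocation of a synchronized method, and monitor exit on its return, are handled implicitly by the Java Virtual Machine's method invocation and return instructions, as if monitorenter and monitorexit were used."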
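The counter behaviour can be made visible with java.util.concurrent.locks.ReentrantLock, which is documented as having the same basic behaviour as the implicit monitor lock and which exposes its hold count. This is only an analogy, not what the JVM literally does for synchronized methods, and the class name HoldCountDemo is made up for the sketch.

```java
import java.util.concurrent.locks.ReentrantLock;

final class HoldCountDemo {
    private final ReentrantLock lock = new ReentrantLock();

    /** Stands in for "synchronized void x()": locks, then calls y(). */
    int x() {
        lock.lock();
        try {
            return y();
        } finally {
            lock.unlock();
        }
    }

    /** Stands in for "synchronized void y()": returns the hold count seen inside it. */
    int y() {
        lock.lock();
        try {
            return lock.getHoldCount();
        } finally {
            lock.unlock();
        }
    }

    /** Hold count of the calling thread once no method is executing. */
    int heldAfter() {
        return lock.getHoldCount();
    }
}
```

With this sketch, x() reports a hold count of 2 (one acquire in x(), one in y()), a direct call to y() reports 1, and heldAfter() reports 0 once both have returned.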
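@AmolSonawane correctly notes that the JVM may optimize this code at runtime by performing lock coarsening, essentially inlining the y() method. In this case, once the JVM has decided to perform the JIT optimization, calls from x() to y() will not incur any additional performance overhead, but calls directly to y() from any other location will of course still need to acquire the monitor separately.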
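Conceptually, once y() has been inlined into x() and the nested acquire on the same object has been folded into the outer one, the compiled x() behaves roughly like the second class below. This is only a source-level picture of the effect: the JIT works on compiled code, and whether it applies the optimization depends on the JVM and its heuristics. The class names and the calls counter are made up for the illustration.

```java
final class BeforeJit {
    private int calls;

    synchronized void x() { y(); }        // acquires the monitor, then re-enters it in y()
    synchronized void y() { calls++; }
    synchronized int calls() { return calls; }
}

final class AfterJitConceptually {
    private int calls;

    synchronized void x() {
        // Body of y() inlined here; the monitor is already held,
        // so no second acquire/release pair is needed on this path.
        calls++;
    }

    synchronized void y() { calls++; }    // direct callers still lock on their own
    synchronized int calls() { return calls; }
}
```

Both classes give the same observable result for any sequence of x() and y() calls; only the number of lock operations on the x() path differs.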
Benchmark                      Mean   Mean error    Units
c.a.p.SO18996783.syncOnce    21.003        0.091  nsec/op
c.a.p.SO18996783.syncTwice   20.937        0.108  nsec/op
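=> no statistical difference.
Looking at the generated assembly shows that lock coarsening has been performed and that y_sync has been inlined into x_sync even though it is synchronized.
Full results: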
Benchmarks:
# Running: com.assylias.performance.SO18996783.syncOnce
Iteration 1 (5000ms in 1 thread): 21.049 nsec/op
Iteration 2 (5000ms in 1 thread): 21.052 nsec/op
Iteration 3 (5000ms in 1 thread): 20.959 nsec/op
Iteration 4 (5000ms in 1 thread): 20.977 nsec/op
Iteration 5 (5000ms in 1 thread): 20.977 nsec/op
Run result "syncOnce": 21.003 ±(95%) 0.055 ±(99%) 0.091 nsec/op
Run statistics "syncOnce": min = 20.959, avg = 21.003, max = 21.052, stdev = 0.044
Run confidence intervals "syncOnce": 95% [20.948, 21.058], 99% [20.912, 21.094]
Benchmarks:
com.assylias.performance.SO18996783.syncTwice
Iteration 1 (5000ms in 1 thread): 21.006 nsec/op
Iteration 2 (5000ms in 1 thread): 20.954 nsec/op
Iteration 3 (5000ms in 1 thread): 20.953 nsec/op
Iteration 4 (5000ms in 1 thread): 20.869 nsec/op
Iteration 5 (5000ms in 1 thread): 20.903 nsec/op
Run result "syncTwice": 20.937 ±(95%) 0.065 ±(99%) 0.108 nsec/op
Run statistics "syncTwice": min = 20.869, avg = 20.937, max = 21.006, stdev = 0.052
Run confidence intervals "syncTwice": 95% [20.872, 21.002], 99% [20.829, 21.045]
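Why not test it!? I ran a quick benchmark. The benchMark() method is called in a loop for warm-up. This may not be very accurate, but it does show a consistent, interesting pattern.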
public class Test {
    public static void main(String[] args) {
        for (int i = 0; i < 100; i++) {
            System.out.println("+++++++++");
            benchMark();
        }
    }
    static void benchMark() {
        Test t = new Test();
        long start = System.nanoTime();
        for (int i = 0; i < 100; i++) {
            t.x();
        }
        System.out.println("Double sync:" + (System.nanoTime() - start) / 1e6);
        start = System.nanoTime();
        for (int i = 0; i < 100; i++) {
            t.x1();
        }
        System.out.println("Single sync:" + (System.nanoTime() - start) / 1e6);
    }
    synchronized void x() {
        y();
    }
    synchronized void y() {
    }
    synchronized void x1() {
        y1();
    }
    void y1() {
    }
}
Results (last 10):
+++++++++
Double sync:0.021686
Single sync:0.017861
+++++++++
Double sync:0.021447
Single sync:0.017929
+++++++++
Double sync:0.021608
Single sync:0.016563
+++++++++
Double sync:0.022007
Single sync:0.017681
+++++++++
Double sync:0.021454
Single sync:0.017684
+++++++++
Double sync:0.020821
Single sync:0.017776
+++++++++
Double sync:0.021107
Single sync:0.017662
+++++++++
Double sync:0.020832
Single sync:0.017982
+++++++++
Double sync:0.021001
Single sync:0.017615
+++++++++
Double sync:0.042347
Single sync:0.023859
It looks like the second variation is indeed slightly faster.
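Each timed loop above covers only 100 calls, so measurement noise is large compared with what is being measured. A slightly more careful hand-rolled variant might look like the sketch below: many more calls per timing, and a counter so the method bodies have an observable effect. The class name SyncTiming, the counter and the iteration counts are assumptions, and a hand-rolled timer like this can still mislead; a harness such as JMH is the safer choice.

```java
final class SyncTiming {
    private long counter; // incremented so the calls are not trivially empty

    synchronized void x()  { y(); }
    synchronized void y()  { counter++; }
    synchronized void x1() { y1(); }
    void y1()              { counter++; }

    synchronized long counter() { return counter; }

    /** Average nanoseconds per call of the given action over the given number of calls. */
    static double nanosPerCall(Runnable action, int calls) {
        long start = System.nanoTime();
        for (int i = 0; i < calls; i++) {
            action.run();
        }
        return (System.nanoTime() - start) / (double) calls;
    }

    /** Prints one line per round; earlier rounds double as warm-up. */
    static void report(int rounds, int calls) {
        SyncTiming t = new SyncTiming();
        for (int round = 0; round < rounds; round++) {
            double twice = nanosPerCall(t::x, calls);
            double once = nanosPerCall(t::x1, calls);
            System.out.printf("Double sync: %.2f ns/call, single sync: %.2f ns/call%n",
                    twice, once);
        }
    }
}
```

Something like SyncTiming.report(10, 1_000_000) gives per-call figures that are easier to compare than the raw millisecond totals above; the absolute numbers will vary by machine and JVM.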
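The test can be found below (you will have to guess what some of the methods do, but nothing complicated):
It tests each variant with 100 threads and starts counting the averages once 70% of them have finished (the rest serve as warm-up).
It prints the result once at the end.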
// Needs: import java.util.concurrent.atomic.AtomicInteger;
// Threads and MovingAverage are the author's own helper classes (not shown).
public static final class Test {
    final int iterations = 100;
    final int jiterations = 1000000;
    final int count = (int) (0.7 * iterations);
    final AtomicInteger finishedSingle = new AtomicInteger(iterations);
    final AtomicInteger finishedZynced = new AtomicInteger(iterations);
    final MovingAverage.Cumulative singleCum = new MovingAverage.Cumulative();
    final MovingAverage.Cumulative zyncedCum = new MovingAverage.Cumulative();
    final MovingAverage singleConv = new MovingAverage.Converging(0.5);
    final MovingAverage zyncedConv = new MovingAverage.Converging(0.5);
    // -----------------------------------------------------------
    public static void main(String[] args) {
        final Test test = new Test();
        for (int i = 0; i < test.iterations; i++) {
            test.benchmark(i);
        }
        Threads.sleep(1000000);
    }
    // -----------------------------------------------------------
    void benchmark(int i) {
        Threads.async(()->{
            long start = System.nanoTime();
            for (int j = 0; j < jiterations; j++) {
                a();
            }
            long elapsed = System.nanoTime() - start;
            int v = this.finishedSingle.decrementAndGet();
            if ( v <= count ) {
                singleCum.add (elapsed);
                singleConv.add(elapsed);
            }
            if ( v == 0 ) {
                System.out.println(elapsed);
                System.out.println("Single Cum:\t\t" + singleCum.val());
                System.out.println("Single Conv:\t" + singleConv.val());
                System.out.println();
            }
        });
        Threads.async(()->{
            long start = System.nanoTime();
            for (int j = 0; j < jiterations; j++) {
                az();
            }
            long elapsed = System.nanoTime() - start;
            int v = this.finishedZynced.decrementAndGet();
            if ( v <= count ) {
                zyncedCum.add(elapsed);
                zyncedConv.add(elapsed);
            }
            if ( v == 0 ) {
                // Just to keep this output from overlapping with the one above
                Threads.sleep(500);
                System.out.println();
                System.out.println("Zynced Cum: \t" + zyncedCum.val());
                System.out.println("Zynced Conv:\t" + zyncedConv.val());
                System.out.println();
            }
        });
    }
    // Only a() is synchronized; the rest of the chain is plain calls.
    synchronized void a() { b(); }
    void b() { c(); }
    void c() { d(); }
    void d() { e(); }
    void e() { f(); }
    void f() { g(); }
    void g() { h(); }
    void h() { i(); }
    void i() { }
    // Every method in this chain is synchronized.
    synchronized void az() { bz(); }
    synchronized void bz() { cz(); }
    synchronized void cz() { dz(); }
    synchronized void dz() { ez(); }
    synchronized void ez() { fz(); }
    synchronized void fz() { gz(); }
    synchronized void gz() { hz(); }
    synchronized void hz() { iz(); }
    synchronized void iz() {}
}
MovingAverage.Cumulative add is basically (performed atomically): average = (average * (n) + number) / (++n);
MovingAverage.Converging is something you can look up; it uses a different formula.
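Since the helpers are not shown, here is a guess at what they might look like, so the test can be followed more easily. Everything in this sketch is an assumption: the method signatures are inferred only from how the test calls them, the cumulative average uses the formula quoted above, and MovingAverage.Converging is deliberately left out because its formula is not given.

```java
/** Guess at the averaging helper; only the cumulative variant is sketched. */
interface MovingAverage {
    void add(double number);
    double val();

    final class Cumulative implements MovingAverage {
        private double average;
        private long n;

        @Override
        public synchronized void add(double number) {
            // Same formula as quoted above, written out in two steps.
            average = (average * n + number) / (n + 1);
            n++;
        }

        @Override
        public synchronized double val() {
            return average;
        }
    }
}

/** Guess at the threading helper used by the test. */
final class Threads {
    private Threads() { }

    /** Runs the task on a new thread and returns that thread. */
    static Thread async(Runnable task) {
        Thread t = new Thread(task);
        t.start();
        return t;
    }

    /** Sleeps for the given number of milliseconds, restoring the interrupt flag if interrupted. */
    static void sleep(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```

For example, adding 1, 2 and 3 to a MovingAverage.Cumulative leaves val() at 2.0. The real helpers may well differ, for instance by using an executor instead of raw threads.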
Results after a 50-second warm-up:
With: jiterations -> 1000000
Zynced Cum: 3.2017985649516254E11
Zynced Conv: 8.11945143126507E10
Single Cum: 4.747368153507841E11
Single Conv: 8.277793176290959E10
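Those are averages in nanoseconds. The difference is really nothing, and if anything it shows the zynced variant taking less time.
With: jiterations -> original * 10 (takes much longer)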
Zynced Cum: 7.462005651190714E11
Zynced Conv: 9.03751742946726E11
Single Cum: 9.088230941676143E11
Single Conv: 9.09877020004914E11
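As you can see, the results show that it really doesn't make much of a difference. The zynced variant actually has the lower average time over the last 30% of completions.
With one thread each (iterations = 1) and jiterations = original * 100: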
Zynced Cum: 6.9167088486E10
Zynced Conv: 6.9167088486E10
Single Cum: 6.9814404337E10
Single Conv: 6.9814404337E10
In a same-thread environment (with the Threads.async calls removed)
With: jiterations -> original * 10
Single Cum: 2.940499529542545E8
Single Conv: 5.0342450600964054E7
Zynced Cum: 1.1930525617915475E9
Zynced Conv: 6.672312498662484E8
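Here the zynced variant seems to be slower, by a factor of roughly 4 on the cumulative average and roughly 13 on the converging one. The reason could be that the zynced loop always runs second, who knows. I had no energy to try the reverse order.
Last test run: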
// Needs: import java.util.concurrent.atomic.AtomicInteger;
// Threads and MovingAverage are the author's own helper classes (not shown).
public static final class Test {
    final int iterations = 100;
    final int jiterations = 10000000;
    final int count = (int) (0.7 * iterations);
    final AtomicInteger finishedSingle = new AtomicInteger(iterations);
    final AtomicInteger finishedZynced = new AtomicInteger(iterations);
    final MovingAverage.Cumulative singleCum = new MovingAverage.Cumulative();
    final MovingAverage.Cumulative zyncedCum = new MovingAverage.Cumulative();
    final MovingAverage singleConv = new MovingAverage.Converging(0.5);
    final MovingAverage zyncedConv = new MovingAverage.Converging(0.5);
    // -----------------------------------------------------------
    public static void main(String[] args) {
        final Test test = new Test();
        for (int i = 0; i < test.iterations; i++) {
            test.benchmark(i);
        }
        Threads.sleep(1000000);
    }
    // -----------------------------------------------------------
    void benchmark(int i) {
        // Single-synchronized chain first, on the calling thread.
        long start = System.nanoTime();
        for (int j = 0; j < jiterations; j++) {
            a();
        }
        long elapsed = System.nanoTime() - start;
        int s = this.finishedSingle.decrementAndGet();
        if ( s <= count ) {
            singleCum.add (elapsed);
            singleConv.add(elapsed);
        }
        if ( s == 0 ) {
            System.out.println(elapsed);
            System.out.println("Single Cum:\t\t" + singleCum.val());
            System.out.println("Single Conv:\t" + singleConv.val());
            System.out.println();
        }
        // Fully synchronized chain second, on the same thread.
        long zstart = System.nanoTime();
        for (int j = 0; j < jiterations; j++) {
            az();
        }
        long elapzed = System.nanoTime() - zstart;
        int z = this.finishedZynced.decrementAndGet();
        if ( z <= count ) {
            zyncedCum.add(elapzed);
            zyncedConv.add(elapzed);
        }
        if ( z == 0 ) {
            // Just to keep this output from overlapping with the one above
            Threads.sleep(500);
            System.out.println();
            System.out.println("Zynced Cum: \t" + zyncedCum.val());
            System.out.println("Zynced Conv:\t" + zyncedConv.val());
            System.out.println();
        }
    }
    synchronized void a() { b(); }
    void b() { c(); }
    void c() { d(); }
    void d() { e(); }
    void e() { f(); }
    void f() { g(); }
    void g() { h(); }
    void h() { i(); }
    void i() { }
    synchronized void az() { bz(); }
    synchronized void bz() { cz(); }
    synchronized void cz() { dz(); }
    synchronized void dz() { ez(); }
    synchronized void ez() { fz(); }
    synchronized void fz() { gz(); }
    synchronized void gz() { hz(); }
    synchronized void hz() { iz(); }
    synchronized void iz() {}
}
Conclusion: there really is no difference.
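There will be no difference, since threads only contend for the lock at x(). The thread that acquired the lock at x() can acquire the lock at y() without any contention (when y() is reached through x(), that is the only thread that can be at that point at any given time). So putting synchronized there has no effect.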
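When both methods are synchronized, you lock the monitor twice, so the first approach carries the overhead of the extra lock. But your JVM can reduce the cost of locking through lock coarsening, and it may inline the call to y().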