Looking at the actual code is way better than some microbenchmark (those are less than useless in Java or any other GC runtime), and I am not surprised it isn't "significantly faster". It is basically doing an implicit synchronized section.
/**
 * Atomically sets to the given value and returns the previous value.
 *
 * @param newValue the new value
 * @return the previous value
 */
public final boolean getAndSet(boolean newValue) {
    for (;;) {
        boolean current = get();
        if (compareAndSet(current, newValue))
            return current;
    }
}
/**
 * Atomically sets the value to the given updated value
 * if the current value {@code ==} the expected value.
 *
 * @param expect the expected value
 * @param update the new value
 * @return true if successful. False return indicates that
 * the actual value was not equal to the expected value.
 */
public final boolean compareAndSet(boolean expect, boolean update) {
    int e = expect ? 1 : 0;
    int u = update ? 1 : 0;
    return unsafe.compareAndSwapInt(this, valueOffset, e, u);
}
And then there is this, from sun.misc.Unsafe:
/**
 * Atomically update Java variable to <tt>x</tt> if it is currently
 * holding <tt>expected</tt>.
 * @return <tt>true</tt> if successful
 */
public final native boolean compareAndSwapInt(Object o, long offset,
                                               int expected,
                                               int x);
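
To make the "implicit synchronized section" point concrete, here is a rough lock-based sketch of the same two operations. The class and its field are made up for comparison, not taken from the JDK; under contention the CAS loop above spins and retries, while this version parks waiting threads, but either way only one writer gets through at a time.

public class LockedBoolean {
    private boolean value;

    // Same contract as AtomicBoolean.getAndSet, but with an explicit monitor.
    public synchronized boolean getAndSet(boolean newValue) {
        boolean previous = value;
        value = newValue;
        return previous;
    }

    // Same contract as AtomicBoolean.compareAndSet.
    public synchronized boolean compareAndSet(boolean expect, boolean update) {
        if (value == expect) {
            value = update;
            return true;
        }
        return false;
    }
}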
There is no magic in this; resource contention is a bitch and genuinely complex. That is why final variables and immutable data are so prevalent in real concurrent languages like Erlang: all of that CPU-eating complexity is bypassed, or at least shifted somewhere less complex.
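
As a rough illustration of that immutable style in Java terms (the class here is invented for the example): a final, immutable object can be handed to any number of threads and read with no CAS and no locking, because nothing it exposes can ever change; an "update" just builds a new object and publishes the new reference.

public final class Settings {
    private final boolean featureEnabled;
    private final int retryCount;

    public Settings(boolean featureEnabled, int retryCount) {
        this.featureEnabled = featureEnabled;
        this.retryCount = retryCount;
    }

    public boolean isFeatureEnabled() { return featureEnabled; }

    public int getRetryCount() { return retryCount; }

    // "Updating" returns a fresh immutable object; threads still holding
    // the old one keep seeing a consistent snapshot.
    public Settings withRetryCount(int newRetryCount) {
        return new Settings(featureEnabled, newRetryCount);
    }
}

Publishing the current reference (for example through a volatile field or an AtomicReference) then becomes the only point where threads have to agree at all.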