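I am new to Java caching and I am trying to understand the difference between store by value and store by reference.
I have quoted the paragraph below from the java.cache documentation: "The purpose of copying entries as they are stored in a cache and again when they are returned from a cache is to allow applications to continue mutating the state of the keys and values without causing side-effects to entries held by a cache."
What are the "side-effects" mentioned above? And how do we choose the storage method in practice?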
The question is great, since the answer isn't an easy one. The real semantics vary slightly across cache implementations.
store by reference:
The cache stores and returns the identical object references.
Object key = ...
Object value = ...
cache.put(key, value);
assert cache.get(key) == value;
assert cache.iterator().next().getKey() == key;
If you mutate the key after storing the value, you have an ambiguous situation. The effect is the same when using a HashMap or ConcurrentHashMap.
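To make the "ambiguous situation" concrete, here is a minimal sketch using a plain HashMap rather than a cache. The class and method names (MutatedKeyDemo, lookupSucceedsAfterKeyMutation) are made up for this illustration. It shows that an entry becomes unreachable under its own key once that key object has been mutated.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical demo class, not part of any cache API.
class MutatedKeyDemo {

    // Returns true if the entry can still be found after its key object was mutated.
    static boolean lookupSucceedsAfterKeyMutation() {
        Map<List<String>, String> map = new HashMap<>();
        List<String> key = new ArrayList<>();
        key.add("a");
        map.put(key, "value");

        // Mutating the key changes its hashCode(), but the map still holds
        // the entry under the hash code computed at put() time.
        key.add("b");

        // The lookup now uses the new hash code and misses the entry.
        return map.containsKey(key);
    }

    public static void main(String[] args) {
        System.out.println("found after mutation: " + lookupSucceedsAfterKeyMutation());
    }
}
```

The entry is still in the map (its size stays 1), it just cannot be looked up with the mutated key. A cache that stores keys by reference is exposed to the same problem.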
Use store by reference, to:
store by value:
Although it seems obvious, it is not so clear what store by value really means. According to the spec leads of JCache: Brian Oliver said it's protection against cache data corruption; Greg Luck said it's everything except store by reference.
For that reason I analyzed different compliant (meaning they pass the TCK) JCache implementations. Key and value objects are copied when passed to the cache, but you cannot rely on an object in the cache being copied when it is returned to the application.
So this assumption isn't true for all JCache implementations:
assert cache.get(key) != cache.get(key);
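As a rough illustration of that behavior, the following toy class copies keys and values on put but hands out the stored instance on get. It is a hypothetical sketch (CopyOnPutCache and its copy helper are invented here, and copying via Java serialization is just one assumed strategy); it is not how any particular JCache implementation works.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.HashMap;
import java.util.Map;

// Hypothetical toy cache, not a JCache implementation.
class CopyOnPutCache<K extends Serializable, V extends Serializable> {
    private final Map<K, V> store = new HashMap<>();

    // Key and value are copied on the way in, so later changes to the
    // caller's objects cannot reach the stored entry.
    void put(K key, V value) {
        store.put(copy(key), copy(value));
    }

    // No copy on the way out: every caller receives the same stored instance.
    V get(K key) {
        return store.get(key);
    }

    // Deep copy through Java serialization (an assumption made for this sketch).
    @SuppressWarnings("unchecked")
    static <T extends Serializable> T copy(T original) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(original);
            }
            try (ObjectInputStream in =
                     new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                return (T) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException("copy failed", e);
        }
    }
}
```

With this sketch, cache.get(key) == cache.get(key) holds, and a caller that mutates the returned value changes what every later caller sees, even though the cache copied the value when it was stored.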
JCache implementations may vary even more once you get into the details. An example:
Map map = cache.getAll(...);
assert map.get(key) != map.get(key);
Here is a contradiction in the expected semantics. We would expect the map contents to be stable; OTOH, the cache would need to return a copy of the value on every access. The JCache spec doesn't enforce concrete semantics for this. The devil is in the details.
Since the key is copied upon storage by every cache implementation, you get additional safety that the cache's internal data structures stay sane, but applications can still break because of shared value references.
My personal conclusion (I am open for discussion):
Since store by reference is an optional JCache feature, requesting it would limit the number of cache implementations your application works with. Always use store by value if you don't rely on store by reference semantics.
However, don't make your application depend on the semantics you think you might get with store by value. Never mutate any object after handing its reference to the cache or after retrieving its reference from the cache.
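One way to follow the "never mutate" rule is to cache immutable values and replace the entry instead of changing the object. The sketch below is only an illustration under assumptions: Profile and ImmutableValueDemo are hypothetical names, and a ConcurrentHashMap stands in for the cache so the example runs without a JCache provider.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical immutable value type: all fields are final and there are no setters.
final class Profile {
    private final String name;
    private final int visits;

    Profile(String name, int visits) {
        this.name = name;
        this.visits = visits;
    }

    String name() { return name; }
    int visits() { return visits; }

    // "Modification" creates a new object and leaves this one untouched.
    Profile withVisits(int newVisits) {
        return new Profile(name, newVisits);
    }
}

class ImmutableValueDemo {

    // Replaces the cached entry with a new object instead of mutating the old one.
    // Returns the new value, or null if the key is not present.
    static Profile recordVisit(Map<String, Profile> cache, String key) {
        return cache.computeIfPresent(key, (k, old) -> old.withVisits(old.visits() + 1));
    }

    public static void main(String[] args) {
        Map<String, Profile> cache = new ConcurrentHashMap<>(); // stand-in for a real cache
        cache.put("ann", new Profile("ann", 1));
        System.out.println(recordVisit(cache, "ann").visits());
    }
}
```

With values like this, it no longer matters whether the cache copies on put, on get, or not at all: a reference that was handed out can never change underneath its holder.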
If there is still doubt, ask your cache vendor. IMHO it's good practice to document implementation details. A good example (since I spent much thought on it...) is the JCache chapter in the cache2k user guide.
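It's to protect against concurrent modification of mutable objects. The side effect is other threads doing something with the object at the same time.
An example would be a bank program with multiple threads that share a cache of Integer objects representing bank account numbers. Let's say thread one retrieves a number from the cache and then starts performing an operation on it. While thread one is operating on the object, thread two retrieves the same object and starts operating on it as well. Since they are both operating on the same object at once in an uncoordinated way, the results are unpredictable. The object itself can even become corrupted.
Store by value eliminates this common problem in concurrent programming, provided it simply stores a copy of the object when it is saved to the cache and hands out a copy of the object when it is retrieved from the cache.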
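Here is a minimal sketch of that copy-in, copy-out idea. Everything in it is hypothetical: Account is a made-up mutable value type, CopyingCache is a toy class rather than a real cache API, and the copy strategy is supplied by the caller. Keys are assumed to be immutable (for example String), so only values are copied.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.UnaryOperator;

// Hypothetical mutable value type used only for this illustration.
class Account {
    long balance;

    Account(long balance) { this.balance = balance; }

    // Copy constructor, used as the copy strategy below.
    Account(Account other) { this.balance = other.balance; }
}

// Hypothetical toy cache that copies values on the way in and on the way out.
class CopyingCache<K, V> {
    private final Map<K, V> store = new ConcurrentHashMap<>();
    private final UnaryOperator<V> copier;

    CopyingCache(UnaryOperator<V> copier) {
        this.copier = copier;
    }

    // The cache keeps its own private copy of the value.
    void put(K key, V value) {
        store.put(key, copier.apply(value));
    }

    // Each caller gets a fresh copy, so no two threads share an instance.
    V get(K key) {
        V stored = store.get(key);
        return stored == null ? null : copier.apply(stored);
    }
}
```

Usage could look like this: create the cache with new CopyingCache<String, Account>(Account::new), put an account, and let each thread call get. Every thread then works on its own Account instance, and changes only reach the cache when a thread explicitly calls put again. Note that this removes the shared-object problem only; two threads that both read, modify, and put back can still overwrite each other's updates.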