I have an application that currently relies on an Infinispan replicated cache to share a work queue across all nodes. The queue itself is fairly standard: the head, tail, and size pointers are kept as entries in an Infinispan map.
We have upgraded from Infinispan 7.2.5 to 9.4.16 and locking performance is now much worse than before. I managed to capture thread dumps from 2 nodes while they were both trying to initialize the queue at the same time. On Infinispan 7.2.5 locking and synchronization worked fine with no issues; now we are seeing lock timeouts and many more failures.
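For context, the queue bootstrap does roughly what the minimal sketch below shows (the key names, the class name and the seed-if-absent step are illustrative assumptions, not the actual com.siperian.mrm.match.InfinispanQueue code): inside a transaction it calls AdvancedCache.lock() on the pointer keys, which under pessimistic locking sends a cluster-wide LockControlCommand, and then writes the initial values.

import javax.transaction.Status;
import javax.transaction.TransactionManager;

import org.infinispan.AdvancedCache;

// Minimal sketch of the queue bootstrap, NOT the real InfinispanQueue implementation.
// Key names and the "seed if absent" step are assumptions for illustration only.
public class QueueInitSketch {

    private static final String HEAD = "QUEUE_HEAD";
    private static final String TAIL = "QUEUE_TAIL";
    private static final String SIZE = "QUEUE_SIZE";

    private final AdvancedCache<String, Long> cache;

    public QueueInitSketch(AdvancedCache<String, Long> cache) {
        this.cache = cache;
    }

    public void initialize() throws Exception {
        TransactionManager tm = cache.getTransactionManager();
        tm.begin();
        try {
            // With <transaction locking="PESSIMISTIC"> this issues a LockControlCommand to the
            // cluster and blocks until the locks are granted or acquire-timeout (60 s) expires.
            cache.lock(HEAD, TAIL, SIZE);
            cache.putIfAbsent(HEAD, 0L);
            cache.putIfAbsent(TAIL, 0L);
            cache.putIfAbsent(SIZE, 0L);
            tm.commit();
        } catch (Exception e) {
            // Roll back only if a transaction is still associated with this thread.
            if (tm.getStatus() != Status.STATUS_NO_TRANSACTION) {
                tm.rollback();
            }
            throw e;
        }
    }
}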
Partial stack trace from node #1, from a thread dump taken 2021-04-20 13:45:13:
"default task-2" #600 prio=5 os_prio=0 tid=0x000000000c559000 nid=0x1f8a waiting on condition [0x00007f4df3f72000]
java.lang.Thread.State: TIMED_WAITING (parking)
at sun.misc.Unsafe.park(Native Method)
- parking to wait for <0x00000006e1f4fec0> (a java.util.concurrent.CompletableFuture$Signaller)
at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)
at java.util.concurrent.CompletableFuture$Signaller.block(CompletableFuture.java:1695)
at java.util.concurrent.ForkJoinPool.managedBlock(ForkJoinPool.java:3323)
at java.util.concurrent.CompletableFuture.timedGet(CompletableFuture.java:1775)
at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1915)
at org.infinispan.util.concurrent.CompletableFutures.await(CompletableFutures.java:105)
at org.infinispan.interceptors.impl.SimpleAsyncInvocationStage.get(SimpleAsyncInvocationStage.java:38)
at org.infinispan.interceptors.impl.AsyncInterceptorChainImpl.invoke(AsyncInterceptorChainImpl.java:250)
at org.infinispan.cache.impl.CacheImpl.lock(CacheImpl.java:1077)
at org.infinispan.cache.impl.CacheImpl.lock(CacheImpl.java:1057)
at org.infinispan.cache.impl.AbstractDelegatingAdvancedCache.lock(AbstractDelegatingAdvancedCache.java:286)
at org.infinispan.cache.impl.EncoderCache.lock(EncoderCache.java:318)
at com.siperian.mrm.match.InfinispanQueue.initialize(InfinispanQueue.java:88)
Partial stack trace from node #2, from a thread dump taken 2021-04-20 13:45:04:
"default task-2" #684 prio=5 os_prio=0 tid=0x0000000011f26000 nid=0x3c60 waiting on condition [0x00007f55107e4000]
java.lang.Thread.State: TIMED_WAITING (parking)
at sun.misc.Unsafe.park(Native Method)
- parking to wait for <0x0000000746bd36d8> (a java.util.concurrent.CompletableFuture$Signaller)
at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)
at java.util.concurrent.CompletableFuture$Signaller.block(CompletableFuture.java:1695)
at java.util.concurrent.ForkJoinPool.managedBlock(ForkJoinPool.java:3323)
at java.util.concurrent.CompletableFuture.timedGet(CompletableFuture.java:1775)
at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1915)
at org.infinispan.util.concurrent.CompletableFutures.await(CompletableFutures.java:105)
at org.infinispan.interceptors.impl.SimpleAsyncInvocationStage.get(SimpleAsyncInvocationStage.java:38)
at org.infinispan.interceptors.impl.AsyncInterceptorChainImpl.invoke(AsyncInterceptorChainImpl.java:250)
at org.infinispan.cache.impl.CacheImpl.lock(CacheImpl.java:1077)
at org.infinispan.cache.impl.CacheImpl.lock(CacheImpl.java:1057)
at org.infinispan.cache.impl.AbstractDelegatingAdvancedCache.lock(AbstractDelegatingAdvancedCache.java:286)
at org.infinispan.cache.impl.EncoderCache.lock(EncoderCache.java:318)
at com.siperian.mrm.match.InfinispanQueue.initialize(InfinispanQueue.java:88)
Client-side error that shows up on the console of the machine running node #1:
2021-04-20 13:45:49,069 ERROR [org.infinispan.interceptors.impl.InvocationContextInterceptor] (jgroups-15,infinispan-cleanse-cluster_192.168.0.24_cmx_system105,N1618938080334-63633(machine-id=M1618938080334)) ISPN000136: Error executing command LockControlCommand on Cache 'orclmdm-MDM_SAMPLE105/FUZZY_MATCH', writing keys []: org.infinispan.util.concurrent.TimeoutException: ISPN000299: Unable to acquire lock after 60 seconds for key QUEUE_TAIL_C_PARTY and requestor GlobalTx:N1618938080334-63633(machine-id=M1618938080334):429. Lock is held by GlobalTx:N1618938062946-60114(machine-id=M1618938062946):420
at org.infinispan.util.concurrent.locks.impl.DefaultLockManager$KeyAwareExtendedLockPromise.get(DefaultLockManager.java:288)
at org.infinispan.util.concurrent.locks.impl.DefaultLockManager$KeyAwareExtendedLockPromise.lock(DefaultLockManager.java:261)
at org.infinispan.util.concurrent.locks.impl.DefaultLockManager$CompositeLockPromise.lock(DefaultLockManager.java:348)
at org.infinispan.interceptors.locking.PessimisticLockingInterceptor.localLockCommandWork(PessimisticLockingInterceptor.java:208)
at org.infinispan.interceptors.locking.PessimisticLockingInterceptor.lambda$new$0(PessimisticLockingInterceptor.java:46)
at org.infinispan.interceptors.InvocationSuccessFunction.apply(InvocationSuccessFunction.java:25)
at org.infinispan.interceptors.impl.QueueAsyncInvocationStage.invokeQueuedHandlers(QueueAsyncInvocationStage.java:118)
at org.infinispan.interceptors.impl.QueueAsyncInvocationStage.accept(QueueAsyncInvocationStage.java:81)
at org.infinispan.interceptors.impl.QueueAsyncInvocationStage.accept(QueueAsyncInvocationStage.java:30)
at java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:760)
at java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:736)
at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:474)
at java.util.concurrent.CompletableFuture.complete(CompletableFuture.java:1962)
at org.infinispan.remoting.transport.AbstractRequest.complete(AbstractRequest.java:67)
at org.infinispan.remoting.transport.impl.MultiTargetRequest.onResponse(MultiTargetRequest.java:102)
at org.infinispan.remoting.transport.impl.RequestRepository.addResponse(RequestRepository.java:52)
at org.infinispan.remoting.transport.jgroups.JGroupsTransport.processResponse(JGroupsTransport.java:1369)
at org.infinispan.remoting.transport.jgroups.JGroupsTransport.processMessage(JGroupsTransport.java:1272)
at org.infinispan.remoting.transport.jgroups.JGroupsTransport.access$300(JGroupsTransport.java:126)
at org.infinispan.remoting.transport.jgroups.JGroupsTransport$ChannelCallbacks.up(JGroupsTransport.java:1417)
at org.jgroups.JChannel.up(JChannel.java:816)
at org.jgroups.stack.ProtocolStack.up(ProtocolStack.java:900)
at org.jgroups.protocols.pbcast.STATE_TRANSFER.up(STATE_TRANSFER.java:128)
at org.jgroups.protocols.RSVP.up(RSVP.java:163)
at org.jgroups.protocols.FRAG2.up(FRAG2.java:177)
at org.jgroups.protocols.FlowControl.up(FlowControl.java:339)
at org.jgroups.protocols.FlowControl.up(FlowControl.java:339)
at org.jgroups.protocols.pbcast.GMS.up(GMS.java:872)
at org.jgroups.protocols.pbcast.STABLE.up(STABLE.java:240)
at org.jgroups.protocols.UNICAST3.deliverMessage(UNICAST3.java:1008)
at org.jgroups.protocols.UNICAST3.handleDataReceived(UNICAST3.java:734)
at org.jgroups.protocols.UNICAST3.up(UNICAST3.java:389)
at org.jgroups.protocols.pbcast.NAKACK2.up(NAKACK2.java:590)
at org.jgroups.protocols.BARRIER.up(BARRIER.java:171)
at org.jgroups.protocols.VERIFY_SUSPECT.up(VERIFY_SUSPECT.java:131)
at org.jgroups.protocols.FD_ALL.up(FD_ALL.java:203)
at org.jgroups.protocols.FD_SOCK.up(FD_SOCK.java:253)
at org.jgroups.protocols.MERGE3.up(MERGE3.java:280)
at org.jgroups.protocols.Discovery.up(Discovery.java:295)
at org.jgroups.protocols.TP.passMessageUp(TP.java:1250)
at org.jgroups.util.SubmitToThreadPool$SingleMessageHandler.run(SubmitToThreadPool.java:87)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Infinispan configuration:
<?xml version="1.0" encoding="UTF-8"?>
<infinispan
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="urn:infinispan:config:9.4 http://www.infinispan.org/schemas/infinispan-config-9.4.xsd"
    xmlns="urn:infinispan:config:9.4">
    <jgroups>
        <stack-file name="mdmudp" path="$cmx.home$/jgroups-udp.xml" />
        <stack-file name="mdmtcp" path="$cmx.home$/jgroups-tcp.xml" />
    </jgroups>
    <cache-container name="MDMCacheManager" statistics="true" shutdown-hook="DEFAULT">
        <transport stack="mdmudp" cluster="infinispan-cluster"
                   node-name="$node$" machine="$machine$" />
        <jmx domain="org.infinispan.mdm.hub"/>
        <replicated-cache name="FUZZY_MATCH" statistics="true" unreliable-return-values="false">
            <locking isolation="READ_COMMITTED" acquire-timeout="60000"
                     concurrency-level="5000" striping="false" />
            <transaction
                transaction-manager-lookup="org.infinispan.transaction.lookup.GenericTransactionManagerLookup"
                stop-timeout="30000" auto-commit="true" locking="PESSIMISTIC"
                mode="NON_XA" notifications="true" />
        </replicated-cache>
    </cache-container>
</infinispan>
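For readability, here is a rough ConfigurationBuilder equivalent of the FUZZY_MATCH cache definition above (just a restatement to make the effective settings explicit; our deployment uses the XML, and the mapping of mode="NON_XA" to useSynchronization(true) is my understanding of the 9.4 API):

import org.infinispan.configuration.cache.CacheMode;
import org.infinispan.configuration.cache.Configuration;
import org.infinispan.configuration.cache.ConfigurationBuilder;
import org.infinispan.transaction.LockingMode;
import org.infinispan.transaction.TransactionMode;
import org.infinispan.transaction.lookup.GenericTransactionManagerLookup;
import org.infinispan.util.concurrent.IsolationLevel;

public class FuzzyMatchCacheConfig {

    // Programmatic restatement of the <replicated-cache name="FUZZY_MATCH"> element above.
    public static Configuration build() {
        ConfigurationBuilder builder = new ConfigurationBuilder();
        builder.jmxStatistics().enable();                          // statistics="true"
        builder.unsafe().unreliableReturnValues(false);            // unreliable-return-values="false"
        builder.clustering().cacheMode(CacheMode.REPL_SYNC);       // replicated-cache (sync)
        builder.locking()
               .isolationLevel(IsolationLevel.READ_COMMITTED)      // isolation="READ_COMMITTED"
               .lockAcquisitionTimeout(60_000)                     // acquire-timeout="60000"
               .concurrencyLevel(5000)                             // concurrency-level="5000"
               .useLockStriping(false);                            // striping="false"
        builder.transaction()
               .transactionMode(TransactionMode.TRANSACTIONAL)
               .lockingMode(LockingMode.PESSIMISTIC)               // locking="PESSIMISTIC"
               .autoCommit(true)                                   // auto-commit="true"
               .cacheStopTimeout(30_000)                           // stop-timeout="30000"
               .notifications(true)                                // notifications="true"
               .useSynchronization(true)                           // mode="NON_XA"
               .transactionManagerLookup(new GenericTransactionManagerLookup());
        return builder.build();
    }
}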
We use UDP multicast by default; here is the UDP configuration:
<!--
Default stack using IP multicasting. It is similar to the "udp"
stack in stacks.xml, but doesn't use streaming state transfer and flushing
author: Bela Ban
-->
<config xmlns="urn:org:jgroups"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="urn:org:jgroups http://www.jgroups.org/schema/jgroups.xsd">
<UDP
mcast_port="${jgroups.udp.mcast_port:46688}"
ip_ttl="4"
tos="8"
ucast_recv_buf_size="5M"
ucast_send_buf_size="5M"
mcast_recv_buf_size="5M"
mcast_send_buf_size="5M"
max_bundle_size="64K"
enable_diagnostics="true"
thread_naming_pattern="cl"
thread_pool.enabled="true"
thread_pool.min_threads="2"
thread_pool.max_threads="8"
thread_pool.keep_alive_time="5000"/>
<PING />
<MERGE3 max_interval="30000"
min_interval="10000"/>
<FD_SOCK/>
<FD_ALL/>
<VERIFY_SUSPECT timeout="1500" />
<BARRIER />
<pbcast.NAKACK2 xmit_interval="500"
xmit_table_num_rows="100"
xmit_table_msgs_per_row="2000"
xmit_table_max_compaction_time="30000"
use_mcast_xmit="false"
discard_delivered_msgs="true"/>
<UNICAST3 xmit_interval="500"
xmit_table_num_rows="100"
xmit_table_msgs_per_row="2000"
xmit_table_max_compaction_time="60000"
conn_expiry_timeout="0"/>
<pbcast.STABLE stability_delay="1000" desired_avg_gossip="50000"
max_bytes="4M"/>
<pbcast.GMS print_local_addr="true" join_timeout="2000"
view_bundling="true"/>
<UFC max_credits="2M"
min_threshold="0.4"/>
<MFC max_credits="2M"
min_threshold="0.4"/>
<FRAG2 frag_size="60K" />
<RSVP resend_interval="2000" timeout="10000"/>
<pbcast.STATE_TRANSFER />
<!-- pbcast.FLUSH /-->
</config>
Any ideas about the configuration would be greatly appreciated. What ends up happening is that both nodes time out and the queue is never initialized correctly (empty keys). Thanks in advance. By the way, up to 24 threads on each node (48 in total) can access the shared queue; the harness sketched below shows what that contention looks like at startup.
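Illustrative only (not our production wiring; the config file name and the QueueInitSketch class are the hypothetical ones from the earlier sketch): each node starts up to 24 worker threads that all call initialize(), so across two nodes up to 48 transactions queue up behind the same replicated pointer-key locks before any real work happens.

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

import org.infinispan.AdvancedCache;
import org.infinispan.manager.DefaultCacheManager;

// Hypothetical startup harness: 24 worker threads per node hit the shared queue.
public class WorkerStartupSketch {
    public static void main(String[] args) throws Exception {
        DefaultCacheManager manager = new DefaultCacheManager("infinispan-config.xml"); // assumed file name
        AdvancedCache<String, Long> cache =
                manager.<String, Long>getCache("FUZZY_MATCH").getAdvancedCache();
        QueueInitSketch queue = new QueueInitSketch(cache);

        ExecutorService pool = Executors.newFixedThreadPool(24);
        for (int i = 0; i < 24; i++) {
            pool.submit(() -> {
                try {
                    queue.initialize();   // every call locks the same pointer keys cluster-wide
                } catch (Exception e) {
                    e.printStackTrace();  // under load this surfaces as ISPN000299 lock timeouts
                }
            });
        }
        pool.shutdown();
        pool.awaitTermination(5, TimeUnit.MINUTES);
        manager.stop();
    }
}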