
I'm trying to use Hibernate Search so that all writes to the Lucene index from jgroupsSlave nodes are sent to the jgroupsMaster node, and the Lucene index is then shared back to the slaves via Infinispan. Everything works locally, but when the nodes discover each other on EC2, they don't appear to be communicating.

They are both sending are-you-alive messages to each other.

# master output sample
86522 [LockBreakingService,localCache,archlinux-37498] DEBUG org.infinispan.transaction.TransactionTable  - About to cleanup completed transaction. Initial size is 0
86523 [LockBreakingService,LuceneIndexesLocking,archlinux-37498] DEBUG org.infinispan.transaction.TransactionTable  - About to cleanup completed transaction. Initial size is 0
87449 [Timer-4,luceneCluster,archlinux-37498] DEBUG org.jgroups.protocols.FD  - sending are-you-alive msg to archlinux-57950 (own address=archlinux-37498)
87522 [LockBreakingService,localCache,archlinux-37498] DEBUG org.infinispan.transaction.TransactionTable  - About to cleanup completed transaction. Initial size is 0
87523 [LockBreakingService,LuceneIndexesLocking,archlinux-37498] DEBUG org.infinispan.transaction.TransactionTable  - About to cleanup completed transaction. Initial size is 0

# slave output sample
85499 [LockBreakingService,localCache,archlinux-57950] DEBUG org.infinispan.transaction.TransactionTable  - About to cleanup completed transaction. Initial size is 0
85503 [LockBreakingService,LuceneIndexesLocking,archlinux-57950] DEBUG org.infinispan.transaction.TransactionTable  - About to cleanup completed transaction. Initial size is 0
86190 [Timer-3,luceneCluster,archlinux-57950] DEBUG org.jgroups.protocols.FD  - sending are-you-alive msg to archlinux-37498 (own address=archlinux-57950)
86499 [LockBreakingService,localCache,archlinux-57950] DEBUG org.infinispan.transaction.TransactionTable  - About to cleanup completed transaction. Initial size is 0
86503 [LockBreakingService,LuceneIndexesLocking,archlinux-57950] DEBUG org.infinispan.transaction.TransactionTable  - About to cleanup completed transaction. Initial size is 0

Security groups

I have two jars, one for the master and one for the slave, which I run on their own EC2 instances. I can ping each instance from the other, and both are in the same security group, which defines the following rules for communication between any machines in the group.

All ports for ICMP
0-65535 for TCP
0-65535 for UDP

So I don't think this is a security group configuration issue.

hibernate.properties

# there is also a corresponding jgroupsSlave
hibernate.search.default.worker.backend=jgroupsMaster
hibernate.search.default.directory_provider = infinispan
hibernate.search.infinispan.configuration_resourcename=infinispan.xml
hibernate.search.default.data_cachename=localCache
hibernate.search.default.metadata_cachename=localCache
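
For reference, here is a sketch of the corresponding slave hibernate.properties implied by the comment above, assuming it differs only in the worker backend value:

# slave node (sketch; assumed identical apart from the backend)
hibernate.search.default.worker.backend=jgroupsSlave
hibernate.search.default.directory_provider = infinispan
hibernate.search.infinispan.configuration_resourcename=infinispan.xml
hibernate.search.default.data_cachename=localCache
hibernate.search.default.metadata_cachename=localCache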

infinispan.xml

<infinispan xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
            xsi:schemaLocation="urn:infinispan:config:5.1 http://www.infinispan.org/schemas/infinispan-config-5.1.xsd"
            xmlns="urn:infinispan:config:5.1">
    <global>
        <transport clusterName="luceneCluster" transportClass="org.infinispan.remoting.transport.jgroups.JGroupsTransport">
            <properties>
                <property name="configurationFile" value="jgroups-ec2.xml" />
            </properties>
        </transport>
    </global>

    <default>
        <invocationBatching enabled="true" />
        <clustering mode="repl" />
    </default>

    <!-- this is just so that each machine doesn't have to store the index
         in memory -->
    <namedCache name="localCache">
        <loaders passivation="false" preload="true" shared="false">
            <loader class="org.infinispan.loaders.file.FileCacheStore" fetchPersistentState="true" ignoreModifications="false" purgeOnStartup="false">
                <properties>
                    <property name="location" value="/tmp/infinspan/master" />
                    <!-- there is a corresponding /tmp/infinispan/slave in
                    the slave config -->
                </properties>
            </loader>
        </loaders>
    </namedCache>
</infinispan>

jgroups-ec2.xml

<config xmlns="urn:org:jgroups" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
        xsi:schemaLocation="urn:org:jgroups http://www.jgroups.org/schema/JGroups-3.2.xsd">
    <TCP
            bind_addr="${jgroups.tcp.address:127.0.0.1}"
            bind_port="${jgroups.tcp.port:7800}"
            loopback="true"
            port_range="30"
            recv_buf_size="20000000"
            send_buf_size="640000"
            max_bundle_size="64000"
            max_bundle_timeout="30"
            enable_bundling="true"
            use_send_queues="true"
            sock_conn_timeout="300"
            enable_diagnostics="false"

            bundler_type="old"

            thread_pool.enabled="true"
            thread_pool.min_threads="2"
            thread_pool.max_threads="30"
            thread_pool.keep_alive_time="60000"
            thread_pool.queue_enabled="false"
            thread_pool.queue_max_size="100"
            thread_pool.rejection_policy="Discard"

            oob_thread_pool.enabled="true"
            oob_thread_pool.min_threads="2"
            oob_thread_pool.max_threads="30"
            oob_thread_pool.keep_alive_time="60000"
            oob_thread_pool.queue_enabled="false"
            oob_thread_pool.queue_max_size="100"
            oob_thread_pool.rejection_policy="Discard"
            />
    <S3_PING secret_access_key="removed_for_stackoverflow" access_key="removed_for_stackoverflow" location="jgroups_ping" />

    <MERGE2 max_interval="30000"
            min_interval="10000"/>
    <FD_SOCK/>
    <FD timeout="3000" max_tries="3"/>
    <VERIFY_SUSPECT timeout="1500"/>
    <pbcast.NAKACK2
            use_mcast_xmit="false"
            xmit_interval="1000"
            xmit_table_num_rows="100"
            xmit_table_msgs_per_row="10000"
            xmit_table_max_compaction_time="10000"
            max_msg_batch_size="100"
            become_server_queue_size="0"/>
    <UNICAST2
            max_bytes="20M"
            xmit_table_num_rows="20"
            xmit_table_msgs_per_row="10000"
            xmit_table_max_compaction_time="10000"
            max_msg_batch_size="100"/>
    <RSVP />
    <pbcast.STABLE stability_delay="1000" desired_avg_gossip="50000"
                   max_bytes="400000"/>
    <pbcast.GMS print_local_addr="false" join_timeout="7000" view_bundling="true"/>
    <UFC max_credits="2000000" min_threshold="0.10"/>
    <MFC max_credits="2000000" min_threshold="0.10"/>
    <FRAG2 frag_size="60000"/>
</config>

I copied this directly from the jgroups-ec2.xml in the latest infinispan-core distribution (5.2.0.Beta3, and I believe I also tried 5.1.4). The only thing I changed was replacing their S3_PING credentials with mine, and again, I can see the nodes writing to S3 and finding each other, so I don't think that's the problem. I also start the master/slave with jgroups.tcp.address set to each node's private IP address. I also tried some greatly simplified configurations, without any success.
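
For example, I launch along these lines (the jar name and IP here are placeholders; jgroups.tcp.address is the system property that bind_addr in the TCP stanza above falls back on):

java -Djgroups.tcp.address=10.0.0.12 -jar master.jar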

Any ideas what the problem might be? I've spent a few days playing with it and it's driving me crazy. I think it has to be something in the jgroups configuration, since it works locally but the nodes can't talk to each other on EC2.

Is there any other information that would help track this down?


1 Answer


You have two JGroups channels being started, so you need to specify two JGroups configurations: one for Infinispan and one for the backend worker communication.

Both Infinispan and the jgroupsMaster backend will fall back to their default configuration unless you specify one, and the defaults use multicast, which doesn't work on EC2.

It looks like you have the right configuration in place for the Infinispan index, but you have to reconfigure the jgroupsMaster worker to use S3_PING or JDBC_PING as well; it probably worked for you locally because the default configuration is able to auto-discover peers using multicast.
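
As a sketch of the fix, assuming a Hibernate Search 4.x-style setup where the worker's JGroups channel can be pointed at a configuration file through a property (verify the exact property name against your version's documentation), you would reuse the same EC2-friendly stack:

# hypothetical addition to hibernate.properties; check the property
# name for your Hibernate Search version
hibernate.search.services.jgroups.configurationFile = jgroups-ec2.xml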

This duplication will be resolved by HSEARCH-882, which I expect to simplify the configuration significantly.

answered 2012-11-10T10:47:18.940