
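I'm sure the answer is out there somewhere, but after several attempts I have not been able to find it or fix the problem. Here is the use case:

1.> I have two EC2 instances in the same VPC but in different security groups

2.> Both security groups allow ports 22 and 80 (public), plus all traffic on all ports from the CIDR block 10.20.0.0/16

3.> The internal IPs of the EC2 instances are 10.20.0.51 (server-1) and 10.20.0.202 (server-2)

4.> I am running two dockerized consul servers on them with the following commands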

server-1 : docker run -it -p 8400:8400 -p 8500:8500 -p 8600:53/udp -p 8301:8301 -p 8300:8300 -h node1 progrium/consul -server -advertise 10.20.0.51  -bootstrap-expect 2

server-2 : docker run -it -p 8400:8400 -p 8500:8500 -p 8600:53/udp -p 8301:8301 -p 8300:8300 --name node2 -h node2 progrium/consul -server -advertise 10.20.0.202 -join 10.20.0.51

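5.> Both start, and a second later they discover each other, an election takes place, and the first node is elected leader. Shortly afterwards, however, server-2 starts logging "memberlist: Suspect node1 has failed, no acks received", and server-1 likewise logs "memberlist: Suspect node2 has failed, no acks received"

This is what the log on server-1 looks like: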

    2016/01/04 19:18:35 [INFO] serf: EventMemberJoin: node2 10.20.0.202
    2016/01/04 19:18:35 [INFO] consul: adding server node2 (Addr: 10.20.0.202:8300) (DC: dc1)
    2016/01/04 19:18:35 [INFO] consul: Attempting bootstrap with nodes: [10.20.0.51:8300 10.20.0.202:8300]
    2016/01/04 19:18:35 [WARN] raft: Heartbeat timeout reached, starting election
    2016/01/04 19:18:35 [INFO] raft: Node at 10.20.0.51:8300 [Candidate] entering Candidate state
    2016/01/04 19:18:35 [WARN] raft: Remote peer 10.20.0.202:8300 does not have local node 10.20.0.51:8300 as a peer
    2016/01/04 19:18:35 [INFO] raft: Election won. Tally: 2
    2016/01/04 19:18:35 [INFO] raft: Node at 10.20.0.51:8300 [Leader] entering Leader state
    2016/01/04 19:18:35 [INFO] consul: cluster leadership acquired
    2016/01/04 19:18:35 [INFO] consul: New leader elected: node1
    2016/01/04 19:18:35 [INFO] raft: pipelining replication to peer 10.20.0.202:8300
    2016/01/04 19:18:35 [INFO] consul: member 'node1' joined, marking health alive
    2016/01/04 19:18:35 [INFO] consul: member 'node2' joined, marking health alive
    2016/01/04 19:18:37 [INFO] memberlist: Suspect node2 has failed, no acks received
    2016/01/04 19:18:37 [INFO] agent: Synced service 'consul'
    2016/01/04 19:18:39 [INFO] memberlist: Suspect node2 has failed, no acks received
    2016/01/04 19:18:41 [INFO] memberlist: Suspect node2 has failed, no acks received
    2016/01/04 19:18:42 [INFO] memberlist: Marking node2 as failed, suspect timeout reached
    2016/01/04 19:18:42 [INFO] serf: EventMemberFailed: node2 10.20.0.202
    2016/01/04 19:18:42 [INFO] consul: removing server node2 (Addr: 10.20.0.202:8300) (DC: dc1)

And for server-2:

    2016/01/04 19:18:10 [INFO] serf: EventMemberJoin: node2 10.20.0.202
    2016/01/04 19:18:10 [INFO] serf: EventMemberJoin: node2.dc1 10.20.0.202
    2016/01/04 19:18:10 [INFO] raft: Node at 10.20.0.202:8300 [Follower] entering Follower state
    2016/01/04 19:18:10 [INFO] agent: (LAN) joining: [10.20.0.51]
    2016/01/04 19:18:10 [INFO] consul: adding server node2 (Addr: 10.20.0.202:8300) (DC: dc1)
    2016/01/04 19:18:10 [INFO] consul: adding server node2.dc1 (Addr: 10.20.0.202:8300) (DC: dc1)
    2016/01/04 19:18:10 [INFO] serf: EventMemberJoin: node1 10.20.0.51
    2016/01/04 19:18:10 [INFO] agent: (LAN) joined: 1 Err: <nil>
    2016/01/04 19:18:10 [ERR] agent: failed to sync remote state: No cluster leader
    2016/01/04 19:18:10 [INFO] consul: adding server node1 (Addr: 10.20.0.51:8300) (DC: dc1)
    2016/01/04 19:18:12 [INFO] memberlist: Suspect node1 has failed, no acks received
    2016/01/04 19:18:14 [INFO] memberlist: Suspect node1 has failed, no acks received
    2016/01/04 19:18:16 [INFO] memberlist: Suspect node1 has failed, no acks received
    2016/01/04 19:18:17 [INFO] memberlist: Marking node1 as failed, suspect timeout reached
    2016/01/04 19:18:17 [INFO] serf: EventMemberFailed: node1 10.20.0.51
    2016/01/04 19:18:17 [INFO] memberlist: Suspect node1 has failed, no acks received
    2016/01/04 19:18:17 [INFO] consul: removing server node1 (Addr: 10.20.0.51:8300) (DC: dc1)
    2016/01/04 19:18:19 [INFO] serf: EventMemberJoin: node1 10.20.0.51
    2016/01/04 19:18:19 [INFO] consul: adding server node1 (Addr: 10.20.0.51:8300) (DC: dc1)
    2016/01/04 19:18:19 [INFO] consul: New leader elected: node1
    2016/01/04 19:18:21 [INFO] memberlist: Suspect node1 has failed, no acks received
    2016/01/04 19:18:22 [INFO] agent: Synced service 'consul'
    2016/01/04 19:18:23 [INFO] memberlist: Suspect node1 has failed, no acks received
    2016/01/04 19:18:25 [INFO] memberlist: Suspect node1 has failed, no acks received
    2016/01/04 19:18:26 [INFO] memberlist: Marking node1 as failed, suspect timeout reached
    2016/01/04 19:18:26 [INFO] serf: EventMemberFailed: node1 10.20.0.51
    2016/01/04 19:18:26 [INFO] consul: removing server node1 (Addr: 10.20.0.51:8300) (DC: dc1)
    2016/01/04 19:18:26 [INFO] memberlist: Suspect node1 has failed, no acks received
    2016/01/04 19:18:40 [INFO] serf: attempting reconnect to node1 10.20.0.51:8301
    2016/01/04 19:18:40 [INFO] serf: EventMemberJoin: node1 10.20.0.51

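What exactly am I doing wrong? All I want is to run two consul Docker containers on two EC2 instances and have them talk to each other, without explicitly opening individual ports in the security groups (when I do open them explicitly, it works, of course!).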
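One detail in the commands from step 4 that may be worth checking: `-p 8301:8301` publishes TCP only, because Docker defaults to TCP when no protocol suffix is given, while Consul's Serf LAN gossip on port 8301 uses both TCP and UDP. The "no acks received" messages would be consistent with UDP probes not reaching the container, but that is an assumption and is not confirmed by anything above. The sketch below only prints a variant of the server-1 command with 8301 published for both protocols so it can be reviewed before running; the helper name `publish_flags` is made up for illustration.

```shell
#!/bin/sh
# Sketch under an unconfirmed assumption: the Serf LAN port (8301) needs to be
# published for UDP as well as TCP. Nothing here starts a container; the
# command is only printed for review.

# publish_flags PORT
# Hypothetical helper: emits docker -p flags that publish PORT for TCP and UDP.
publish_flags() {
  printf -- '-p %s:%s -p %s:%s/udp' "$1" "$1" "$1" "$1"
}

SERF_LAN_FLAGS=$(publish_flags 8301)

# Same as the server-1 command from step 4, except for the 8301 mapping.
echo "docker run -it -p 8400:8400 -p 8500:8500 -p 8600:53/udp $SERF_LAN_FLAGS -p 8300:8300 -h node1 progrium/consul -server -advertise 10.20.0.51 -bootstrap-expect 2"
```

The same change would apply to the server-2 command. If the failures persist with UDP published, the cause lies elsewhere, for example in how the security group rules match traffic between the two instances.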

Could someone please help?

Thanks
