mysql - InnoDB Cluster addInstance: Plugin group_replication reported: 'read failed'

Question

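I'm running an InnoDB Cluster on 5.7.25 (planning to migrate to 8.0 soon). Two of my instances left the cluster because of network issues, leaving one healthy node.

I'm following the procedure below to add a node back to the cluster, but it fails with the error shown below.

What am I doing wrong?

Note: host1 is the healthy node that stayed in the cluster. host2 is the node that is joining.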

Procedure on host1:

1. Set super_read_only = ON
2. Copy the last GTID using: select @@global.gtid_executed;
3. Set super_read_only = OFF (just before step 3 on host2)

Procedure on host2:

1. Stop mysql
2. rsync the mysql data directory from host1, using: rsync -Parvz --exclude="auto.cnf" --exclude="<host1>*" --exclude="binlog.*" <user>@<host1>:/mysql-data/* .
3. Start mysql
4. Clear the replication logs and set the GTID using:

reset master;
reset slave;
set SQL_LOG_BIN=0;
set @@GLOBAL.GTID_PURGED='<gtid from step 2 on host1>';
set SQL_LOG_BIN=1;

5. Connect to MySQL Shell and add the new node (host2) to the cluster: cluster.addInstance('root@host2:3306', {ipWhitelist: 'host1, host2'})
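Before running addInstance, it may be worth ruling out plain network reachability, since Group Replication talks to the seed member on its own port (the log below shows group_replication_group_seeds set to "host1:33061"), separate from 3306. The following is only a minimal sketch: the function name can_connect is made up for illustration, and a successful TCP connection does not prove that the IP whitelist or the group configuration is correct.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection resolves the name and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers DNS failures, refused connections and timeouts.
        return False


# Example usage on host2 (host name and port taken from the log, adjust as needed):
#   print(can_connect("host1", 33061))
```

If this returns False from host2, a firewall or routing problem between the nodes is a likely candidate to look at first; if it returns True, the cause is probably elsewhere.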

Logs from the new instance (host2) that fails to join:

2020-03-09T15:19:33.328996Z 38 [Note] 'CHANGE MASTER TO FOR CHANNEL 'group_replication_recovery' executed'. Previous state master_host='<NULL>', master_port= 0, master_log_file='', master_log_pos= 4, master_bind=''. New state master_host='<NULL>', master_port= 0, master_log_file='', master_log_pos= 4, master_bind=''.
2020-03-09T15:19:33.514003Z 38 [Note] Plugin group_replication reported: 'Group communication SSL configuration: group_replication_ssl_mode: "DISABLED"'
2020-03-09T15:19:33.514154Z 38 [Warning] Plugin group_replication reported: '[GCS] Automatically adding IPv4 localhost address to the whitelist. It is mandatory that it is added.'
2020-03-09T15:19:33.514181Z 38 [Note] Plugin group_replication reported: '[GCS] SSL was not enabled'
2020-03-09T15:19:33.514193Z 38 [Note] Plugin group_replication reported: 'Initialized group communication with configuration: group_replication_group_name: "<uuid1>"; group_replication_local_address: "host2:33061"; group_replication_group_seeds: "host1:33061"; group_replication_bootstrap_group: false; group_replication_poll_spin_loops: 100; group_replication_compression_threshold: 1000; group_replication_ip_whitelist: "host1ip, host2ip"'
2020-03-09T15:19:33.514223Z 38 [Note] Plugin group_replication reported: '[GCS] Configured number of attempts to join: 0'
2020-03-09T15:19:33.514227Z 38 [Note] Plugin group_replication reported: '[GCS] Configured time between attempts to join: 5 seconds'
2020-03-09T15:19:33.514239Z 38 [Note] Plugin group_replication reported: 'Member configuration: member_id: 139923628; member_uuid: "<uuid2>"; single-primary mode: "true"; group_replication_auto_increment_increment: 7; '
2020-03-09T15:19:33.514576Z 40 [Note] 'CHANGE MASTER TO FOR CHANNEL 'group_replication_applier' executed'. Previous state master_host='<NULL>', master_port= 0, master_log_file='', master_log_pos= 4, master_bind=''. New state master_host='<NULL>', master_port= 0, master_log_file='', master_log_pos= 4, master_bind=''.
2020-03-09T15:19:33.613296Z 43 [Note] Slave SQL thread for channel 'group_replication_applier' initialized, starting replication in log 'FIRST' at position 0, relay log './scynbm96-relay-bin-group_replication_applier.000001' position: 4
2020-03-09T15:19:33.613383Z 38 [Note] Plugin group_replication reported: 'Group Replication applier module successfully initialized!'
2020-03-09T15:19:33.613811Z 0 [Note] Plugin group_replication reported: 'XCom protocol version: 3'
2020-03-09T15:19:33.613858Z 0 [Note] Plugin group_replication reported: 'XCom initialized and ready to accept incoming connections on port 33061'
2020-03-09T15:19:33.667118Z 0 [Warning] Plugin group_replication reported: 'read failed'
2020-03-09T15:19:33.685025Z 0 [ERROR] Plugin group_replication reported: '[GCS] The member was unable to join the group. Local port: 33061'
2020-03-09T15:19:34.732938Z 48 [Note] Got an error reading communication packets
2020-03-09T15:20:04.733653Z 52 [Note] Got an error reading communication packets
2020-03-09T15:20:33.613595Z 38 [ERROR] Plugin group_replication reported: 'Timeout on wait for view after joining group'
2020-03-09T15:20:33.613655Z 38 [Note] Plugin group_replication reported: 'Requesting to leave the group despite of not being a member'
2020-03-09T15:20:33.613697Z 38 [ERROR] Plugin group_replication reported: '[GCS] The member is leaving a group without being on one.'
2020-03-09T15:20:33.614136Z 43 [Note] Error reading relay log event for channel 'group_replication_applier': slave SQL thread was killed
2020-03-09T15:20:33.614325Z 43 [Note] Slave SQL thread for channel 'group_replication_applier' exiting, replication stopped in log 'FIRST' at position 0
2020-03-09T15:20:33.614966Z 40 [Note] Plugin group_replication reported: 'The group replication applier thread was killed'
2020-03-09T15:20:34.734155Z 55 [Note] Got an error reading communication packets
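
Most of the lines above are [Note] entries, so the relevant ones are easy to miss. As a rough sketch, assuming the 5.7 error log format shown above, the plugin's Warning and ERROR lines can be pulled out like this (the function name and the log path in the usage comment are hypothetical):

```python
import re

# Matches lines such as:
# 2020-03-09T15:19:33.667118Z 0 [Warning] Plugin group_replication reported: 'read failed'
GR_LINE = re.compile(
    r"^(?P<ts>\S+) (?P<thread>\d+) \[(?P<level>\w+)\] "
    r"Plugin group_replication reported: '(?P<msg>.*)'\s*$"
)


def group_replication_problems(lines):
    """Return (timestamp, level, message) for the plugin's Warning/ERROR lines."""
    found = []
    for line in lines:
        m = GR_LINE.match(line.strip())
        if m and m.group("level") in ("Warning", "ERROR"):
            found.append((m.group("ts"), m.group("level"), m.group("msg")))
    return found


# Example usage (the path is an assumption, use your own error log location):
#   with open("/var/log/mysql/error.log") as f:
#       for ts, level, msg in group_replication_problems(f):
#           print(ts, level, msg)
```

On the log above this would surface, among others, the 'read failed' warning and the "[GCS] The member was unable to join the group" error that follows it.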

score 0 · Accepted Answer

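The following steps finally gave me a healthy 3-node cluster.

1. Set the healthy node to super_read_only
2. Wait a moment for existing transactions to complete
3. Copy the GTID using: select @@global.gtid_executed;
4. On host2 and host3, install mysql from scratch
5. On host2 and host3, stop the mysql server
6. Sync the data to both hosts using: rsync -Parvz --exclude="auto.cnf" --exclude="<host1>*" --exclude="binlog.*" <user>@<host1>:/mysql-data/* .
7. Verify that the GTID on host1 has not changed
8. Start mysql on host2 and host3, and verify that the data is intact by running selects on some tables
9. Using mysql shell, dissolve the cluster
10. Create the cluster again and add host2 and host3 in their existing state.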
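
The "verify the GTID has not changed" check can be done by eye, but long GTID sets get wrapped across lines and are easy to misread. Here is a minimal sketch for comparing two values of @@global.gtid_executed, assuming the 5.7-style uuid:interval[:interval] format; the function names are made up for illustration and this is not part of any MySQL tooling.

```python
def parse_gtid_set(text: str) -> dict:
    """Parse a gtid_executed string into {uuid: sorted list of (start, end)}."""
    result = {}
    # The server may wrap long sets across lines, so drop all whitespace first.
    compact = "".join(text.split())
    if not compact:
        return result
    for part in compact.split(","):
        uuid, *intervals = part.split(":")
        ranges = []
        for interval in intervals:
            start, _, end = interval.partition("-")
            # A single transaction number such as "7" means the interval 7-7.
            ranges.append((int(start), int(end or start)))
        result.setdefault(uuid.lower(), []).extend(ranges)
    for uuid in result:
        result[uuid].sort()
    return result


def gtid_unchanged(before: str, after: str) -> bool:
    """Return True if both strings describe the same GTID set."""
    return parse_gtid_set(before) == parse_gtid_set(after)
```

Paste the value copied before the rsync and the value read afterwards into gtid_unchanged; False means host1 executed transactions in between and the copy should be redone. The comparison can also be done on the server with the built-in GTID_SUBSET() function.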

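Note: after the cluster is dissolved, you need to restart all MySQL Routers.
Note 2: there is some monitoring information here: https://dev.mysql.com/doc/refman/5.7/en/group-replication-monitoring.html (8.x adds further logging and instrumentation)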
