
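I am running an InnoDB cluster on 5.7.25 (planning to migrate to 8.0 soon). Two of my instances left the cluster because of network problems, leaving one healthy node.

I am using the procedure below to add the nodes back to the cluster, but it fails with the errors shown further down.

What am I doing wrong?

Note: host1 is the healthy node that remained in the cluster; host2 is the node being added.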

Procedure on host1:

  1. Set super_read_only = ON
  2. Copy the last GTID with: select @@global.gtid_executed;
  3. Set super_read_only = OFF (right before step 3 on host2)
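
The value copied in step 2 is later pasted into `GTID_PURGED` on host2. When the set covers more than one server UUID, the server typically prints it with a newline after each comma, which is easy to paste incorrectly. A minimal Python sketch, assuming the value was copied as plain text; `gtid_purged_statement` is a hypothetical helper, and its regular expression only checks the basic `uuid:N[-M][:N[-M]...]` shape, not every rule the server enforces:

```python
import re

# One server UUID followed by one or more ":N" or ":N-M" intervals.
_UUID_SET = re.compile(
    r"^[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}"
    r"(:\d+(-\d+)?)+$"
)


def gtid_purged_statement(raw_gtid_executed: str) -> str:
    """Build the SET statement for host2 from a gtid_executed value copied on host1.

    Hypothetical helper: removes all whitespace (including newlines after commas)
    and rejects fragments that do not look like a GTID set.
    """
    compact = "".join(raw_gtid_executed.split())
    if not compact:
        raise ValueError("empty GTID set")
    for uuid_set in compact.split(","):
        if not _UUID_SET.match(uuid_set):
            raise ValueError(f"unexpected GTID set fragment: {uuid_set!r}")
    return f"SET @@GLOBAL.GTID_PURGED='{compact}';"


if __name__ == "__main__":
    # Hypothetical value as it might be copied from the mysql client.
    copied = (
        "3e11fa47-71ca-11e1-9e33-c80aa9429562:1-5:11-18,\n"
        "2174b383-5441-11e8-b90a-c80aa9429562:1-3"
    )
    print(gtid_purged_statement(copied))
```

Failing loudly on a malformed fragment is deliberate: a truncated or mangled set would otherwise be accepted as `GTID_PURGED` and only surface later as missing or duplicate transactions.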

Procedure on host2:

  1. Stop mysql
  2. rsync the MySQL data directory from host1 with: rsync -Parvz --exclude="auto.cnf" --exclude="<host1>*" --exclude="binlog.*" <user>@<host1>:/mysql-data/* .
  3. Start mysql
  4. Clear the replication logs and set the GTID with:
reset master;
reset slave;
set SQL_LOG_BIN=0;
set @@GLOBAL.GTID_PURGED='<gtid from step 2 on host1>';
set SQL_LOG_BIN=1;
  5. Connect to MySQL Shell and add the new node (host2) to the cluster: cluster.addInstance('root@host2:3306', {ipWhitelist: 'host1, host2'})
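
The three `--exclude` patterns in step 2 are meant to keep host1's server UUID file (`auto.cnf`), the files named after host1 (in a default 5.7 setup these include the relay logs, as the `scynbm96-relay-bin-...` name in the log below shows, plus the pid and error files), and the binary logs (`binlog.*` in this setup) out of host2's copy. The sketch below approximates which top-level files survive, using Python's `fnmatch.fnmatchcase` as a stand-in for rsync's matching. The two agree for plain names without `/`, but rsync has extra rules (anchoring, `**`, directory-only patterns). The file listing and the host name `db1` (standing in for `<host1>`) are hypothetical:

```python
from fnmatch import fnmatchcase

# Patterns from the rsync command; "db1" is a hypothetical stand-in for <host1>.
EXCLUDES = ["auto.cnf", "db1*", "binlog.*"]


def kept_files(names, excludes=EXCLUDES):
    """Return the names that no exclude pattern matches (rough rsync approximation).

    rsync matches a pattern without '/' against the last path component,
    so only base names are compared here. fnmatchcase is case-sensitive, like rsync.
    """
    return [n for n in names if not any(fnmatchcase(n, p) for p in excludes)]


if __name__ == "__main__":
    listing = [
        "auto.cnf",
        "binlog.000042",
        "binlog.index",
        "db1-relay-bin.000003",
        "db1.pid",
        "ibdata1",
        "ib_logfile0",
        "mysql",
        "mydb",
    ]
    print(kept_files(listing))
```

A pattern such as `<host1>*` also matches any database directory whose name happens to start with the host name, so it is worth checking the real listing before relying on it.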

Log from the new instance (host2) that fails to join:

2020-03-09T15:19:33.328996Z 38 [Note] 'CHANGE MASTER TO FOR CHANNEL 'group_replication_recovery' executed'. Previous state master_host='<NULL>', master_port= 0, master_log_file='', master_log_pos= 4, master_bind=''. New state master_host='<NULL>', master_port= 0, master_log_file='', master_log_pos= 4, master_bind=''.
2020-03-09T15:19:33.514003Z 38 [Note] Plugin group_replication reported: 'Group communication SSL configuration: group_replication_ssl_mode: "DISABLED"'
2020-03-09T15:19:33.514154Z 38 [Warning] Plugin group_replication reported: '[GCS] Automatically adding IPv4 localhost address to the whitelist. It is mandatory that it is added.'
2020-03-09T15:19:33.514181Z 38 [Note] Plugin group_replication reported: '[GCS] SSL was not enabled'
2020-03-09T15:19:33.514193Z 38 [Note] Plugin group_replication reported: 'Initialized group communication with configuration: group_replication_group_name: "<uuid1>"; group_replication_local_address: "host2:33061"; group_replication_group_seeds: "host1:33061"; group_replication_bootstrap_group: false; group_replication_poll_spin_loops: 100; group_replication_compression_threshold: 1000; group_replication_ip_whitelist: "host1ip, host2ip"'
2020-03-09T15:19:33.514223Z 38 [Note] Plugin group_replication reported: '[GCS] Configured number of attempts to join: 0'
2020-03-09T15:19:33.514227Z 38 [Note] Plugin group_replication reported: '[GCS] Configured time between attempts to join: 5 seconds'
2020-03-09T15:19:33.514239Z 38 [Note] Plugin group_replication reported: 'Member configuration: member_id: 139923628; member_uuid: "<uuid2>"; single-primary mode: "true"; group_replication_auto_increment_increment: 7; '
2020-03-09T15:19:33.514576Z 40 [Note] 'CHANGE MASTER TO FOR CHANNEL 'group_replication_applier' executed'. Previous state master_host='<NULL>', master_port= 0, master_log_file='', master_log_pos= 4, master_bind=''. New state master_host='<NULL>', master_port= 0, master_log_file='', master_log_pos= 4, master_bind=''.
2020-03-09T15:19:33.613296Z 43 [Note] Slave SQL thread for channel 'group_replication_applier' initialized, starting replication in log 'FIRST' at position 0, relay log './scynbm96-relay-bin-group_replication_applier.000001' position: 4
2020-03-09T15:19:33.613383Z 38 [Note] Plugin group_replication reported: 'Group Replication applier module successfully initialized!'
2020-03-09T15:19:33.613811Z 0 [Note] Plugin group_replication reported: 'XCom protocol version: 3'
2020-03-09T15:19:33.613858Z 0 [Note] Plugin group_replication reported: 'XCom initialized and ready to accept incoming connections on port 33061'
2020-03-09T15:19:33.667118Z 0 [Warning] Plugin group_replication reported: 'read failed'
2020-03-09T15:19:33.685025Z 0 [ERROR] Plugin group_replication reported: '[GCS] The member was unable to join the group. Local port: 33061'
2020-03-09T15:19:34.732938Z 48 [Note] Got an error reading communication packets
2020-03-09T15:20:04.733653Z 52 [Note] Got an error reading communication packets
2020-03-09T15:20:33.613595Z 38 [ERROR] Plugin group_replication reported: 'Timeout on wait for view after joining group'
2020-03-09T15:20:33.613655Z 38 [Note] Plugin group_replication reported: 'Requesting to leave the group despite of not being a member'
2020-03-09T15:20:33.613697Z 38 [ERROR] Plugin group_replication reported: '[GCS] The member is leaving a group without being on one.'
2020-03-09T15:20:33.614136Z 43 [Note] Error reading relay log event for channel 'group_replication_applier': slave SQL thread was killed
2020-03-09T15:20:33.614325Z 43 [Note] Slave SQL thread for channel 'group_replication_applier' exiting, replication stopped in log 'FIRST' at position 0
2020-03-09T15:20:33.614966Z 40 [Note] Plugin group_replication reported: 'The group replication applier thread was killed'
2020-03-09T15:20:34.734155Z 55 [Note] Got an error reading communication packets

1 Answer


The following steps finally gave me a healthy 3-node cluster.

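  1. Set the healthy node to super_read_only
  2. Wait a moment for in-flight transactions to finish
  3. Copy the GTID with: select @@global.gtid_executed;
  4. On host2 and host3, install MySQL from scratch
  5. On host2 and host3, stop the MySQL server
  6. Sync the data to both hosts with: rsync -Parvz --exclude="auto.cnf" --exclude="<host1>*" --exclude="binlog.*" <user>@<host1>:/mysql-data/* .
  7. Verify that the GTID on host1 has not changed
  8. Start MySQL on host2 and host3 and check that the data is intact by running SELECTs on a few tables
  9. Using MySQL Shell, dissolve the cluster
  10. Create the cluster again and, once it exists, add host2 and host3 to it.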
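
Step 7 amounts to comparing the GTID set copied in step 3 with the current `@@global.gtid_executed` on host1. A plain string comparison can give false mismatches (upper/lower case UUIDs, newlines, intervals written as `1-5:6-10` versus `1-10`), so the sketch below normalizes both sets before comparing. It is a minimal Python illustration with hypothetical function names and values, not a tool from the answer:

```python
def parse_gtid_set(text):
    """Parse 'uuid:1-5:7,uuid2:1-3' into {uuid: merged (start, end) intervals}."""
    raw = {}
    compact = "".join(text.split())  # drop whitespace, including newlines after commas
    for uuid_set in filter(None, compact.split(",")):
        uuid, *intervals = uuid_set.split(":")
        ranges = raw.setdefault(uuid.lower(), [])
        for interval in intervals:
            start, _, end = interval.partition("-")
            ranges.append((int(start), int(end or start)))

    merged = {}
    for uuid, ranges in raw.items():
        out = []
        for start, end in sorted(ranges):
            # Adjacent or overlapping intervals collapse, so 1-5:6-10 equals 1-10.
            if out and start <= out[-1][1] + 1:
                out[-1] = (out[-1][0], max(out[-1][1], end))
            else:
                out.append((start, end))
        merged[uuid] = out
    return merged


def gtid_sets_equal(a, b):
    """True if both GTID set strings describe the same transactions."""
    return parse_gtid_set(a) == parse_gtid_set(b)


if __name__ == "__main__":
    copied_at_step_3 = "aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee:1-100"  # hypothetical
    current = "aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee:1-100"  # hypothetical
    if gtid_sets_equal(copied_at_step_3, current):
        print("host1 GTID set unchanged; the copied data is consistent")
    else:
        print("host1 executed new transactions; copy the data again")
```

Intervals are merged rather than expanded into individual transaction numbers so that sets with millions of transactions stay cheap to compare. On the server itself, `GTID_SUBSET(a, b) AND GTID_SUBSET(b, a)` gives the same answer without leaving SQL.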

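Note: after dissolving the cluster you need to restart all MySQL Routers.

Note 2: some monitoring information is available here: https://dev.mysql.com/doc/refman/5.7/en/group-replication-monitoring.html (version 8.x adds further logging and instrumentation).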
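
One of the tables described on that page is `performance_schema.replication_group_members`, which lists each member with its `MEMBER_STATE` (for example `ONLINE`, `RECOVERING`, `UNREACHABLE`). A minimal Python sketch of a health check over rows already fetched from that table, assuming a 3-node group; fetching the rows (driver, credentials) is left out, and `group_is_healthy` and `problem_members` are hypothetical names:

```python
# Query to run on any member; each row is (MEMBER_HOST, MEMBER_PORT, MEMBER_STATE).
QUERY = (
    "SELECT MEMBER_HOST, MEMBER_PORT, MEMBER_STATE "
    "FROM performance_schema.replication_group_members"
)


def group_is_healthy(rows, expected_members=3):
    """True if exactly `expected_members` rows are present and all are ONLINE."""
    return len(rows) == expected_members and all(state == "ONLINE" for _, _, state in rows)


def problem_members(rows):
    """Describe every member that is not ONLINE."""
    return [f"{host}:{port} is {state}" for host, port, state in rows if state != "ONLINE"]


if __name__ == "__main__":
    # Hypothetical result of QUERY while host2 is still catching up.
    rows = [("host1", 3306, "ONLINE"), ("host2", 3306, "RECOVERING"), ("host3", 3306, "ONLINE")]
    print(group_is_healthy(rows), problem_members(rows))
```

Checking the member count as well as the states catches the situation in the question, where a node that never joined simply does not appear in the table.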

answered 2020-03-11T14:46:20.217