
Configuration:

  • Server #1: 1 mgm node (#49), 1 data node (#1), 1 SQL node (real IP 192.168.1.128)
  • Server #2: 1 mgm node (#50), 1 data node (#2), 1 SQL node (real IP 192.168.1.130)
  • Virtual IP: 192.168.1.240 (using keepalived, with server #1 as master; a minimal sketch of that setup follows this list)
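
For reference, the keepalived side looks roughly like the sketch below (the interface name eth0 and virtual_router_id 51 are assumptions, not taken from the actual setup):

# /etc/keepalived/keepalived.conf on server #1; server #2 uses state BACKUP
# and a lower priority (e.g. 90)
vrrp_instance VI_MYSQL {
    state MASTER
    interface eth0              # assumed interface name
    virtual_router_id 51        # assumed id; must match on both servers
    priority 100
    advert_int 1
    virtual_ipaddress {
        192.168.1.240
    }
}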

Specs:

  • MySQL Cluster 7.3.6 x86_64
  • Debian 7.6 x86_64

It was deployed using the MySQL Cluster Auto-Installer. Everything works fine.
However, when I shut down one node, the data node on the other server restarts. NDB_MGM shows it as "starting", and it takes a long time to leave the "starting" state.
As far as I have tested, this does not happen when there are four nodes.
Does anyone know the reason for this restart?
Thanks in advance.
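
For context, the logs that should record the restart reason (assuming the default log file naming under the DataDir paths above) are:

# Cluster log written by ndb_mgmd #49 on server #1
tail -n 100 /usr/local/mysql/data/49/ndb_49_cluster.log

# Logs of data node #2 on server #2
tail -n 100 /usr/local/mysql/data/2/ndb_2_out.log
tail -n 100 /usr/local/mysql/data/2/ndb_2_error.log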

Update: configuration files and command-line arguments
1. Configuration file for NDB_MGMD #49

#
# Configuration file for MyCluster NDB_MGMD #49
# /usr/local/mysql/data/49/config.ini

[NDB_MGMD DEFAULT]
Portnumber=1186

[NDB_MGMD]
NodeId=49
HostName=192.168.1.128
DataDir=/usr/local/mysql/data/49/
Portnumber=1186

[NDB_MGMD]
NodeId=50
HostName=192.168.1.130
DataDir=/usr/local/mysql/data/50/
Portnumber=1186

[TCP DEFAULT]
SendBufferMemory=4M
ReceiveBufferMemory=4M

[NDBD DEFAULT]
BackupMaxWriteSize=1M
BackupDataBufferSize=16M
BackupLogBufferSize=4M
BackupMemory=20M
BackupReportFrequency=10
MemReportFrequency=30
LogLevelStartup=15
LogLevelShutdown=15
LogLevelCheckpoint=8
LogLevelNodeRestart=15
DataMemory=1M
IndexMemory=1M
MaxNoOfTables=4096
MaxNoOfTriggers=3500
NoOfReplicas=2
StringMemory=25
DiskPageBufferMemory=64M
SharedGlobalMemory=20M
LongMessageBuffer=32M
MaxNoOfConcurrentTransactions=16384
BatchSizePerLocalScan=512
FragmentLogFileSize=64M
NoOfFragmentLogFiles=16
RedoBuffer=32M
MaxNoOfExecutionThreads=2
StopOnError=false
LockPagesInMainMemory=1
TimeBetweenEpochsTimeout=32000
TimeBetweenWatchdogCheckInitial=60000
TransactionInactiveTimeout=60000
HeartbeatIntervalDbDb=15000
HeartbeatIntervalDbApi=15000

[NDBD]
NodeId=1
HostName=192.168.1.128
DataDir=/usr/local/mysql/data/1/

[NDBD]
NodeId=2
HostName=192.168.1.130
DataDir=/usr/local/mysql/data/2/

[MYSQLD DEFAULT]

[MYSQLD]
NodeId=53
HostName=192.168.1.128

[MYSQLD]
NodeId=54
HostName=192.168.1.130

2. Configuration file for NDB_MGMD #50

#
# Configuration file for MyCluster NDB_MGMD #50
# /usr/local/mysql/data/50/config.ini

[NDB_MGMD DEFAULT]
Portnumber=1186

[NDB_MGMD]
NodeId=49
HostName=192.168.1.128
DataDir=/usr/local/mysql/data/49/
Portnumber=1186

[NDB_MGMD]
NodeId=50
HostName=192.168.1.130
DataDir=/usr/local/mysql/data/50/
Portnumber=1186

[TCP DEFAULT]
SendBufferMemory=4M
ReceiveBufferMemory=4M

[NDBD DEFAULT]
BackupMaxWriteSize=1M
BackupDataBufferSize=16M
BackupLogBufferSize=4M
BackupMemory=20M
BackupReportFrequency=10
MemReportFrequency=30
LogLevelStartup=15
LogLevelShutdown=15
LogLevelCheckpoint=8
LogLevelNodeRestart=15
DataMemory=1M
IndexMemory=1M
MaxNoOfTables=4096
MaxNoOfTriggers=3500
NoOfReplicas=2
StringMemory=25
DiskPageBufferMemory=64M
SharedGlobalMemory=20M
LongMessageBuffer=32M
MaxNoOfConcurrentTransactions=16384
BatchSizePerLocalScan=512
FragmentLogFileSize=64M
NoOfFragmentLogFiles=16
RedoBuffer=32M
MaxNoOfExecutionThreads=2
StopOnError=false
LockPagesInMainMemory=1
TimeBetweenEpochsTimeout=32000
TimeBetweenWatchdogCheckInitial=60000
TransactionInactiveTimeout=60000
HeartbeatIntervalDbDb=15000
HeartbeatIntervalDbApi=15000

[NDBD]
NodeId=1
HostName=192.168.1.128
DataDir=/usr/local/mysql/data/1/

[NDBD]
NodeId=2
HostName=192.168.1.130
DataDir=/usr/local/mysql/data/2/

[MYSQLD DEFAULT]

[MYSQLD]
NodeId=53
HostName=192.168.1.128

[MYSQLD]
NodeId=54
HostName=192.168.1.130

Command-line arguments:
1. Start ndb_mgmd on server #1

/usr/local/mysql/bin/ndb_mgmd --initial --ndb-nodeid=49 \
--config-dir=/usr/local/mysql/data/49/ \
--config-file=/usr/local/mysql/data/49/config.ini

2. Start ndb_mgmd on server #2

/usr/local/mysql/bin/ndb_mgmd --initial --ndb-nodeid=50 \
--config-dir=/usr/local/mysql/data/50/ \
--config-file=/usr/local/mysql/data/50/config.ini

3. Start ndbmtd on server #1

/usr/local/mysql/bin/ndbmtd --ndb-nodeid=1 --bind-address=192.168.1.128 \
--ndb-connectstring=192.168.1.240:1186,

4. Start ndbmtd on server #2

/usr/local/mysql/bin/ndbmtd --ndb-nodeid=2 --bind-address=192.168.1.130 \
--ndb-connectstring=192.168.1.240:1186,
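
For reference, node status can be checked from either server against the same connect string:

/usr/local/mysql/bin/ndb_mgm -c 192.168.1.240:1186 -e show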

1 Answer


There is a problem with your two-node setup. If you have a network problem (a split-brain condition), the two nodes cannot see each other, and each will decide to shut down. They will then restart, but each will have to wait for the other node unless --nowait-nodes is specified.
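
For example, data node #1 could be started without waiting for node #2 (a sketch based on the start commands in the question; note that it lets the node come up with only part of the cluster):

/usr/local/mysql/bin/ndbmtd --ndb-nodeid=1 --bind-address=192.168.1.128 \
--ndb-connectstring=192.168.1.240:1186 --nowait-nodes=2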

With 4 nodes, you would be splitting your cluster 3/1, so the side that still has its network up has enough of a majority to validate its mgm node as arbitrator, and it becomes master.

You should resolve this either by placing the mgm node on a third machine (it is a really lightweight process, so it needs no special resources) or by clustering the mgm service and binding it to the VIP. Otherwise, you will lose service when one of the nodes has a network failure.
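
For example, a third management node could be added to config.ini on a separate host (192.168.1.131 below is only a placeholder address):

[NDB_MGMD]
NodeId=51
HostName=192.168.1.131
DataDir=/usr/local/mysql/data/51/
Portnumber=1186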

For the VIP configuration, the data nodes must be forced to use their real IPs:

--bind-address=name

And ArbitrationTimeout should be set high enough to allow the cluster to migrate the mgm service.
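
For example, in the [NDBD DEFAULT] section (the value below is only an illustration; it should cover the time the VIP needs to fail over):

# config.ini, [NDBD DEFAULT]
ArbitrationTimeout=30000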

For the mgm node, disabling the configuration cache will make configuration changes easier:

--config-cache=FALSE
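
Applied to the start command from the question, that would look roughly like this (with the cache disabled, the cached configuration under --config-dir should no longer be needed):

/usr/local/mysql/bin/ndb_mgmd --ndb-nodeid=49 --config-cache=FALSE \
--config-file=/usr/local/mysql/data/49/config.ini
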
Answered 2014-09-19T07:44:36.590