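I am currently testing MaxScale in read/write split mode with a 3-node Galera Cluster. By default, MaxScale designates one node as the master and the others as slaves (my configuration uses 100% of the slaves).
My goal is to check how MaxScale handles a node shutdown.
The problem is that with the benchmarks (Sysbench, mysqlslap) and with a custom PHP script, the connection to the backend (MariaDB) is lost when I shut down one node of the cluster.
Error log: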
MariaDB Corporation MaxScale /var/log/maxscale/error1.log Thu Oct 29 13:00:11 2015
-----------------------------------------------------------------------
--- Logging is enabled.
2015-10-29 13:00:11 Error: Failed to obtain address for host ::1, Address family for hostname not supported
2015-10-29 13:00:11 Warning: Failed to add user root@::1 for service [RW Split Router]. This user will be unavailable via MaxScale.
2015-10-29 13:00:11 Warning: Duplicate MySQL user found for service [RW Split Router]: cmon@127.0.0.1 for database: (null)
2015-10-29 13:00:11 Warning: Duplicate MySQL user found for service [RW Split Router]: root@127.0.0.1 for database: (null)
2015-10-29 13:00:11 Warning: Duplicate MySQL user found for service [RW Split Router]: root@10.58.224.113 for database: (null)
2015-10-29 13:00:35 Error : Unable to write to backend due to authentication failure.
2015-10-29 13:00:40 Error : Monitor was unable to connect to server 10.58.224.113:3306 : "Can't connect to MySQL server on '10.58.224.113' (111)"
Trace log:
2015-10-29 13:00:33 [4] Route query to slave 10.58.224.113:3306 <
2015-10-29 13:00:33 [4] Servers and router connection counts:
2015-10-29 13:00:33 [4] current operations : 0 in 10.58.224.113:3306 RUNNING SLAVE
2015-10-29 13:00:33 [4] current operations : 0 in 10.26.116.84:3306 RUNNING SLAVE
2015-10-29 13:00:33 [4] current operations : 0 in 10.26.84.103:3306 RUNNING MASTER
2015-10-29 13:00:33 [4] Selected RUNNING SLAVE in 10.58.224.113:3306
2015-10-29 13:00:33 [4] Selected RUNNING SLAVE in 10.26.116.84:3306
2015-10-29 13:00:33 [4] Selected RUNNING MASTER in 10.26.84.103:3306
2015-10-29 13:00:34 [4] > Autocommit: [enabled], trx is [not open], cmd: COM_QUERY, type: QUERY_TYPE_READ, stmt: SELECT COUNT(*) FROM sbtest1
2015-10-29 13:00:34 [4] Route query to slave 10.58.224.113:3306 <
2015-10-29 13:00:36 [4] Stopped RW Split Router client session [4]
2015-10-29 13:00:42 Server changed state: server1[10.58.224.113:3306]: slave_down
PHP test script:
<?php
# Test MaxScale
$db = new PDO('mysql:host=127.0.0.1;dbname=sbtest;charset=utf8;port=4446;', 'root', '***', array(PDO::ATTR_TIMEOUT => "10", PDO::ATTR_ERRMODE => PDO::ERRMODE_EXCEPTION));
for ($i = 0; $i < 5000; $i++)
{
    try {
        $q = $db->query('SELECT COUNT(*) FROM sbtest1', PDO::FETCH_NUM);
        if ($q) {
            $res = $q->fetchAll();
            #var_dump($res);
            echo time()." Result: {$res[0][0]}\n";
            sleep(1);
        }
    }
    catch (PDOException $Exception) {
        echo "PDOException: " . $Exception->getMessage() . "\n";
        die('forced script to stop');
    }
}
Mysqlslap benchmark:
mysqlslap -h127.0.0.1 -uroot -p*** -P4446 --create="CREATE TABLE a (b int);INSERT INTO a VALUES (23)" --query="SELECT * FROM a" --concurrency=50 --iterations=200 --delimiter=";"
Sysbench benchmark:
sysbench --test=/usr/share/doc/sysbench/tests/db/oltp.lua --oltp-table-size=2500 --mysql-user=root --mysql-password=*** --mysql-host=127.0.0.1 --db-ps-mode=disable --mysql-port=4446 prepare
sysbench --num-threads=16 --max-requests=5000 --test=/usr/share/doc/sysbench/tests/db/oltp.lua --oltp-skip-trx=on --oltp-read-only=on --oltp-table-size=250000 --mysql-host=127.0.0.1 --mysql-user=root --mysql-password=*** --mysql-port=4446 run
Errors encountered:
PDOException: SQLSTATE[HY000]: General error: 2003 Authentication with backend failed. Session will be closed.
PDOException: SQLSTATE[HY000]: General error: 2006 MySQL server has gone away
PDOException: SQLSTATE[HY000]: General error: 2013 Lost connection to MySQL server during query
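To tell whether these errors affect only the session that was open when the node went down, or whether new sessions through MaxScale fail as well, the test script could reconnect and retry instead of dying on the first exception. The following is a minimal sketch of that idea, not a fix for MaxScale itself. `isConnectionLoss()` and `withReconnect()` are hypothetical helper names introduced here for illustration, and the sketch assumes the error codes appear in the exception message as in the output above.

```php
<?php
// Sketch only: isConnectionLoss() and withReconnect() are hypothetical helpers,
// not part of PDO or MaxScale.

// True when the message carries one of the client error codes reported above
// (2003, 2006, 2013).
function isConnectionLoss(string $message): bool
{
    return preg_match('/\b(2003|2006|2013)\b/', $message) === 1;
}

// Runs $work($db). On a connection-loss error it opens a new connection
// via $connect and retries, up to $maxRetries times.
function withReconnect(callable $connect, callable $work, int $maxRetries = 3)
{
    $db = $connect();
    for ($attempt = 0; ; $attempt++) {
        try {
            return $work($db);
        } catch (RuntimeException $e) { // PDOException extends RuntimeException
            if ($attempt >= $maxRetries || !isConnectionLoss($e->getMessage())) {
                throw $e; // unrelated error, or retries exhausted
            }
            $db = $connect(); // open a fresh session through MaxScale
        }
    }
}

// Usage against the listener from this setup (needs the running cluster):
// $count = withReconnect(
//     fn() => new PDO('mysql:host=127.0.0.1;port=4446;dbname=sbtest', 'root', '***',
//                     [PDO::ATTR_ERRMODE => PDO::ERRMODE_EXCEPTION]),
//     fn(PDO $db) => $db->query('SELECT COUNT(*) FROM sbtest1')->fetchColumn()
// );
```

Retrying like this is only reasonable for idempotent reads such as the `SELECT COUNT(*)` used in the test; a write that failed mid-query may or may not have been applied.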
MaxScale configuration:
[maxscale]
threads=4
auth_connect_timeout=20
auth_read_timeout=20
auth_write_timeout=20
log_trace=1
[Galera Monitor]
type=monitor
module=galeramon
servers=server1,server2,server3
user=maxmon
passwd=***
monitor_interval=30000
backend_connect_timeout=10
backend_read_timeout=10
backend_write_timeout=10
[RW Split Router]
type=service
router=readwritesplit
servers=server2,server3,server1
user=root
passwd=***
max_slave_connections=100%
enable_root_user=1
router_options=slave_selection_criteria=LEAST_CURRENT_OPERATIONS
[Debug Interface]
type=service
router=debugcli
[CLI]
type=service
router=cli
[RW Split Listener]
type=listener
service=RW Split Router
protocol=MySQLClient
port=4446
[Debug Listener]
type=listener
service=Debug Interface
protocol=telnetd
address=127.0.0.1
port=4442
[CLI Listener]
type=listener
service=CLI
protocol=maxscaled
port=6603
[server1]
type=server
address=10.58.224.113
port=3306
protocol=MySQLBackend
[server2]
type=server
address=10.26.84.103
port=3306
protocol=MySQLBackend
[server3]
type=server
address=10.26.116.84
port=3306
protocol=MySQLBackend
Session monitoring shows that the sessions become invalid, as in the following example:
# maxadmin -pmariadb show sessions
Session 9 (0x7f60a4000b50)
State: Invalid State
Service: RW Split Router (0x342f460)
Client DCB: 0x7f60a40009a0
Client Address: root@127.0.0.1
Connected: Thu Oct 29 13:28:57 2015
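I have also tried different timeout values and monitor_interval settings in MaxScale, as well as the PDO timeout in my PHP test script, but the problem seems to lie in how MaxScale handles MySQL sessions.
I also read about MaxScale's optimistic approach, in which it forwards the fastest response it receives from one of the nodes, but I am not sure whether that is the cause.
Is there a way to make a node shutdown harmless for any SQL request that MaxScale propagates to all the slave nodes of the cluster?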