mysql - Why use GTIDs in MySQL replication?

Question

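When it comes to database replication, what are global transaction identifiers for? Why do we need to prevent concurrency across servers? And how exactly is that prevention achieved?

I tried reading the documentation at http://dev.mysql.com/doc/refman/5.7/en/replication-gtids.html, but I still cannot understand it clearly. This may sound basic, but I would appreciate it if someone could explain these concepts to me.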

score 6 · Accepted Answer

The purpose of the Global Transaction ID (GTID) is to let a MySQL slave know whether it has already applied a given transaction, which keeps the Master and Slave in sync. It is also used when a slave reconnects after a connection goes down, to determine the point from which to resume. Without GTIDs, replication must be controlled by a position within a specific binary log (binlog) file, which is much harder to manage than the GTID method.
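To make the "has this transaction been applied or not" idea concrete, here is a minimal toy model, not MySQL's actual implementation. It assumes a GTID is a string of the form `source_uuid:transaction_number`; the `Slave` class, its fields, and the UUID value are made up for illustration.

```python
# Toy model: a slave that remembers which GTIDs it has already applied.
# The class and field names are hypothetical; real MySQL tracks this internally.

class Slave:
    def __init__(self):
        self.executed = set()  # GTIDs already applied on this slave
        self.data = {}         # stand-in for table contents

    def apply(self, gtid, key, value):
        """Apply a transaction once; return False if it was already applied."""
        if gtid in self.executed:
            return False
        self.data[key] = value
        self.executed.add(gtid)
        return True

MASTER_UUID = "3e11fa47-71ca-11e1-9e33-c80aa9429562"  # example value only

slave = Slave()
first = slave.apply(f"{MASTER_UUID}:1", "balance", 100)
second = slave.apply(f"{MASTER_UUID}:2", "balance", 150)
# The connection drops and transaction 2 is delivered again after reconnecting.
resent = slave.apply(f"{MASTER_UUID}:2", "balance", 150)

print(first, second, resent)  # True True False
```

With file-and-position replication the slave would instead have to remember a binlog file name and byte offset, which only make sense relative to one particular master.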

Typically the master is the only server that is written to; the slaves merely rebuild a copy of the master by applying each transaction in sequence.

It is also important to understand that MySQL replication can run in one of 3 modes:

Statement-based: Each SQL statement is logged to the binlog and replicated as a statement to the slave. In some cases a statement can be ambiguous when it is re-executed on the slave, so the data may not match exactly. (Most of the time it is fine for common uses.)
Row-based: In this mode MySQL replicates the actual data changes to each table, with a "before" and "after" picture of each row, which is fully accurate. This can result in a much larger binlog, because every affected row is logged; for example, a bulk update query like UPDATE t1 SET c1 = 'a' WHERE c2 = 'b' produces one entry per matching row.
Mixed: In this mode, MySQL will use a mix of statement-based and row-based logging in the binlog.
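The difference between the first two modes can be sketched with a toy model. This is not how MySQL encodes its binlog. To show the ambiguity, the example assumes a variant of the bulk update above that sets `c1` to `UUID()`, a function that returns a different value on every execution; the table layout and helper names are made up.

```python
import uuid

def run_statement(table):
    """Stand-in for: UPDATE t1 SET c1 = UUID() WHERE c2 = 'b'"""
    for row in table:
        if row["c2"] == "b":
            row["c1"] = str(uuid.uuid4())

def make_table():
    # Six rows; the odd ids have c2 = 'b' and will match the WHERE clause.
    return [{"id": i, "c1": "", "c2": "b" if i % 2 else "x"} for i in range(6)]

master = make_table()
before = [dict(row) for row in master]
run_statement(master)

# Statement-based: one log entry, re-executed on the slave.
statement_log = ["UPDATE t1 SET c1 = UUID() WHERE c2 = 'b'"]
statement_slave = make_table()
for _ in statement_log:
    run_statement(statement_slave)

# Row-based: one (before, after) pair per changed row, copied to the slave.
row_log = [(old, dict(new)) for old, new in zip(before, master) if old != new]
row_slave = make_table()
for _, after in row_log:
    row_slave[after["id"]] = dict(after)

print(len(statement_log), len(row_log))                # 1 3
print(statement_slave == master, row_slave == master)  # False True
```

The statement log stays at one entry no matter how many rows match, while the row log grows with the number of changed rows; in exchange, only the row-based slave ends up identical to the master here.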

I only mention the modes of replication, because it is mentioned in the doc you referenced that Row-based is the recommended option if you are using GTIDs.

There is another option called Master-Master replication, where you can write to two masters (each acting as a slave for the other), but this requires a special configuration to ensure that the data written to each master is unique. It is much trickier to manage than a typical Master/Slave setup.
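The answer does not say what the special configuration is. One commonly used piece of it, assumed here, is to stagger auto-increment keys with MySQL's `auto_increment_increment` and `auto_increment_offset` system variables so that two masters never generate the same key. The sketch below only simulates the resulting ID sequences; `id_stream` is a hypothetical helper.

```python
from itertools import count, islice

def id_stream(offset, increment):
    """IDs one server would hand out: offset, offset + increment, ..."""
    return count(start=offset, step=increment)

# Assumed settings: both masters use increment 2, with offsets 1 and 2.
ids_a = list(islice(id_stream(offset=1, increment=2), 5))
ids_b = list(islice(id_stream(offset=2, increment=2), 5))

print(ids_a)  # [1, 3, 5, 7, 9]
print(ids_b)  # [2, 4, 6, 8, 10]
print(set(ids_a) & set(ids_b))  # set()
```

This only prevents key collisions on inserts. It does nothing about two masters updating the same existing row at the same time, which is part of why such setups are harder to manage.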

Therefore, preventing writes to a Slave is something that you must ensure from your application for a typical replication process to function correctly. It is fine to read from a Slave, but you should not write to it. Note that the Slave can lag behind the Master, so if you use it for reads, it is best suited to queries that can tolerate slightly stale data (like reports that are not critical up to the second or millisecond). You can ensure no writes to the Slave by making your common application user a read-only user on the Slave server, and a read-write user on the Master.
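On the application side, the read/write split could look roughly like the sketch below. The host names `master-db` and `slave-db` and the `pick_server` helper are hypothetical, and classifying a query by its first keyword is deliberately crude; the read-only database user described above remains the real safeguard.

```python
# Statements treated as safe to send to the slave (an assumption for this sketch).
READ_ONLY_VERBS = {"SELECT", "SHOW"}

def pick_server(sql, master="master-db", slave="slave-db"):
    """Route read-only statements to the slave and everything else to the master."""
    words = sql.split()
    verb = words[0].upper() if words else ""
    return slave if verb in READ_ONLY_VERBS else master

queries = [
    "SELECT SUM(amount) FROM orders",
    "UPDATE t1 SET c1 = 'a' WHERE c2 = 'b'",
    "insert into t1 (c1, c2) values ('a', 'b')",
]
routes = [pick_server(q) for q in queries]
print(routes)  # ['slave-db', 'master-db', 'master-db']
```

Anything the router does not recognize goes to the master, so a misclassified query costs some load rather than correctness.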

score 4 · Accepted Answer

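Why do we need to prevent concurrency across servers?

If I understand the question correctly, you are talking about consistency. If so, the answer is that you need to maintain a consistent state in a distributed system. For example, if my bank account information is replicated across several different servers, then the balance must be exactly the same on all of them. Now imagine that I perform several monetary transactions (deposits/withdrawals) and connect to a different server each time: concurrency problems would cause my account balance to differ on each server, which is unacceptable.

How exactly is that prevention achieved?

By using the master/slave approach. Among these servers, one (the master) is responsible for handling every write operation, meaning that modifications to the database must be handled by that server only. The master's database is replicated to all the other servers (the slaves), which are not allowed to modify the database but can be used to read from it (e.g. SELECT operations). Because only one server is allowed to modify the database, you will not have consistency problems.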
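The bank-account scenario can be worked through with a small simulation. It is a toy model with made-up amounts: in the first half every server accepts writes and nothing is replicated, and in the second half a single master accepts all writes and the slaves replay its log.

```python
transactions = [+100, -30, +50]  # deposits and withdrawals, example values

# Uncoordinated: each transaction lands on a different server.
uncoordinated = [0, 0, 0]
for i, amount in enumerate(transactions):
    uncoordinated[i % 3] += amount

# Master/slave: the master applies every write and records it in a log;
# each slave rebuilds the balance by replaying that log in order.
master_balance = 0
log = []
for amount in transactions:
    master_balance += amount
    log.append(amount)
slave_balances = [sum(log), sum(log)]

print(uncoordinated)                   # [100, -30, 50]
print(master_balance, slave_balances)  # 120 [120, 120]
```

In the first case no two servers agree on the balance; in the second every copy converges on the same value once it has replayed the whole log.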

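What are global transaction identifiers for?

Communication between the servers is asynchronous, and a slave does not need to be connected to the master all the time. So when a slave reconnects to the master, it may find that the master's database was modified in the meantime, and it must then update its own database. The problem now is to know, of all the modifications executed by the master, which ones the slave already applied at an earlier date and which ones it has not yet applied.

GTIDs solve this problem: they uniquely identify every transaction executed by the master. The slave can now identify which of the transactions executed by the master it has not seen before.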
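The catch-up step after a reconnect amounts to a set difference. The sketch below assumes GTID sets written in the `uuid:1-3:7` interval notation and handles only a single master UUID; `expand_gtid_set` is a hypothetical helper and the UUID is an example value.

```python
def expand_gtid_set(text):
    """Expand 'uuid:1-3:7' into individual GTID strings (single UUID only)."""
    if not text:
        return set()
    source_uuid, *intervals = text.split(":")
    gtids = set()
    for interval in intervals:
        start, _, end = interval.partition("-")
        for n in range(int(start), int(end or start) + 1):
            gtids.add(f"{source_uuid}:{n}")
    return gtids

MASTER_UUID = "3e11fa47-71ca-11e1-9e33-c80aa9429562"  # example value only

master_executed = expand_gtid_set(f"{MASTER_UUID}:1-6")
slave_executed = expand_gtid_set(f"{MASTER_UUID}:1-3")

# Transactions the slave has never seen, in execution order.
missing = sorted(
    master_executed - slave_executed,
    key=lambda gtid: int(gtid.rsplit(":", 1)[1]),
)
print([gtid.rsplit(":", 1)[1] for gtid in missing])  # ['4', '5', '6']
```

Because the identifiers are global, the same comparison works against any server that holds the master's transactions, not just the one the slave was last connected to.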
