The Underlying Importance of the server_id Parameter

January 8, 2024

Eduardo Krieg

One of the most overlooked parameters when configuring MySQL is server_id, an integer that uniquely identifies a server inside a replication topology. Note that two servers within a replication set can’t have the same server_id value.

It is generally set to a “random” number, just different from the ones configured on the other replicas, and once set it is rarely reviewed or changed later. That is usually fine, but if overlooked it can lead to an unwanted skip of transactions in a recovery scenario like the one I will describe below.
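
A quick way to confirm what each node is actually using is to query the variable directly, or to check the value persisted in the configuration file (the my.cnf path and the values below are just examples):

mysql> SELECT @@server_id;
+-------------+
| @@server_id |
+-------------+
|          81 |
+-------------+

# /etc/my.cnf, [mysqld] section
server_id = 81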

Suppose we have the following topology:

db2 - primary - server_id = 82
  \_ db1 - replica - server_id = 81

Once the traffic was moved off the previous primary (db1), we could start the maintenance on it. During the maintenance, a disk issue left the data in MySQL inconsistent.

Don’t worry, we have the backups taken from db2 earlier that same day. Let’s use those!

  • We copy the backup from db2 into db1 using rsync.
  • We run the steps on how to restore a compressed backup from here.
  • We start MySQL with the same configuration that db1 had before the crash (server_id=81).
  • When setting up replication, we use the replication coordinates from xtrabackup_binlog_info, as we need db2’s position at the time the consistent backup was taken (see the example below).
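
For reference, those coordinates come from the xtrabackup_binlog_info file that Percona XtraBackup writes into the backup directory. A hypothetical example, consistent with the positions used later in this post (a GTID set column may also appear when GTIDs are enabled):

$ cat xtrabackup_binlog_info
mysql-bin.034322    375677230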

Between the time the backup was taken and the time of the restore, many binlogs were generated that needed to be applied, so we expected replication to take some time to fully catch up with the primary.

Once the replication was configured, we noticed that the binlogs were advancing too fast.

Suppose we configured replication using the following positions:

mysql> CHANGE MASTER TO
    -> MASTER_HOST='10.10.10.10',
    -> MASTER_USER='repluser',
    -> MASTER_PASSWORD='XXXXXXXXXXXXXXXX',
    -> MASTER_LOG_FILE='mysql-bin.034322',
    -> MASTER_LOG_POS=375677230;
Query OK, 0 rows affected, 2 warnings (0.36 sec)
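
After setting the coordinates, replication is started as usual (START SLAVE shown here to match the syntax above; on MySQL 8.0.22+, START REPLICA is the equivalent):

mysql> START SLAVE;
Query OK, 0 rows affected (0.01 sec)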

We review the replication status:

*************************** 1. row ***************************
               Slave_IO_State: Waiting for master to send event
                  Master_Host: 10.10.10.10
                  Master_User: repluser
                  Master_Port: 3306
                Connect_Retry: 60
              Master_Log_File: mysql-bin.034324
          Read_Master_Log_Pos: 3898727534
               Relay_Log_File: mysql-relay-bin.000006
                Relay_Log_Pos: 369
        Relay_Master_Log_File: mysql-bin.034324
             Slave_IO_Running: Yes
            Slave_SQL_Running: Yes
          Exec_Master_Log_Pos: 3898727534
        Seconds_Behind_Master: 0
Master_SSL_Verify_Server_Cert: No
                Last_IO_Errno: 0
                Last_IO_Error:
               Last_SQL_Errno: 0
               Last_SQL_Error:
  Replicate_Ignore_Server_Ids:
             Master_Server_Id: 2002
                  Master_UUID: 94318777-57fe-11ee-a884-b496916f3834
             Master_Info_File: mysql.slave_master_info
                    SQL_Delay: 0
          SQL_Remaining_Delay: NULL
      Slave_SQL_Running_State: Slave has read all relay log; waiting for more updates
           Master_Retry_Count: 86400

Noticing something odd? Yes, Seconds_Behind_Master: 0, even though there were around 40+ binlogs still to apply.
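
One way to see the real gap, even when Seconds_Behind_Master reports 0, is to compare the replica’s Relay_Master_Log_File / Exec_Master_Log_Pos with the primary’s current position (a rough sketch, output omitted):

-- On the primary (db2):
mysql> SHOW MASTER STATUS;
-- Compare File/Position with Relay_Master_Log_File/Exec_Master_Log_Pos from
-- SHOW SLAVE STATUS on the replica; here the primary was still dozens of
-- binlog files ahead of what the replica had executed.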

It kept running at a “fast pace”; it advanced 40 binlogs in a matter of a couple of minutes, until it failed:

mysql> show slave status\G
*************************** 1. row ***************************
               Slave_IO_State:
                  Master_Host: 10.10.10.10
                  Master_User: repluser
                  Master_Port: 3306
                Connect_Retry: 60
              Master_Log_File: mysql-bin.034372
          Read_Master_Log_Pos: 107962899
               Relay_Log_File: mysql-relay-bin.000086
                Relay_Log_Pos: 6294
        Relay_Master_Log_File: mysql-bin.034364
             Slave_IO_Running: No
            Slave_SQL_Running: No
              Replicate_Do_DB:
          Replicate_Ignore_DB:
           Replicate_Do_Table:
       Replicate_Ignore_Table:
      Replicate_Wild_Do_Table:
  Replicate_Wild_Ignore_Table:
                   Last_Errno: 1032
                   Last_Error: Could not execute Update_rows event on table test.mysql1; Can't find record in mysql1, Error_code: 1032; handler error HA_ERR_KEY_NOT_FOUND; the event's master log mysql-bin.034364, end_log_pos 123596930

Why? Was our backup broken?

Upon checking the binlogs, we found that the binlog events that were processed “too fast” had been generated with server_id=81, which is the same as the current replica’s (those records were written on db1, which was the primary when the backup was taken). By default, when a replica encounters events carrying its own server_id, it skips them.
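
This can be verified directly: every binlog event header carries the server_id of the server that originally wrote it. A quick sketch against one of the primary’s binlog files from that range (the file name is taken from the error above):

$ mysqlbinlog mysql-bin.034364 | grep -o 'server id [0-9]*' | sort | uniq -c
# Most events in these files carried "server id 81" -- written while db1 was
# still the primary, i.e., the same server_id as the replica applying them.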

Then, once the failover had been done, the binlog events were generated with server_id=82 (db2). This is when the replica started to actually apply events, but since it had skipped the earlier events carrying its own server_id, it missed many updates and inserts to the DB, leaving it in an inconsistent state.

In this case, there’s no other option but to retry the restore, this time changing the server_id value on the restored server before starting replication.
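
A minimal sketch of that change, assuming 83 is not used anywhere else in the topology: set the new value before MySQL is started and replication is configured (server_id can also be changed at runtime with SET GLOBAL, but putting it in my.cnf keeps it persistent across restarts):

# /etc/my.cnf on the restored server -- 83 is just an example value
[mysqld]
server_id = 83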

Another approach, other than changing the server_id, would have been to enable the replicate-same-server-id option.

As per the documentation:

“This option is for use on replicas. The default is 0 (FALSE). With this option set to 1 (TRUE), the replica does not skip events that have its own server ID. This setting is normally useful only in rare configurations.”
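
If that route were taken, it is a replica-side option, set for example in my.cnf (shown only as a sketch; as the quote says, it is normally useful only in rare configurations):

# my.cnf on the replica -- use with caution
[mysqld]
replicate-same-server-id = 1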

Conclusion

The scenario described in this blog may not apply to all restorations, but it is a good reminder to review the server_id when performing restores on a replication environment. It can avoid headaches!

As was also briefly mentioned, the parameter “replicate-same-server-id” is not recommended unless we’re sure of what we’re doing.

Percona Distribution for MySQL is the most complete, stable, scalable, and secure open source MySQL solution available, delivering enterprise-grade database environments for your most critical business applications… and it’s free to use!

Try Percona Distribution for MySQL today!
