6

Online DDL with Group Replication In Percona Server for MySQL 8.0.22 (and MySQL...

 3 years ago
source link: https://www.percona.com/blog/2021/04/15/online-ddl-with-group-replication-in-percona-server-for-mysql-8-0-22/
Go to the source link to view the article. You can view the picture content, updated content and better typesetting reading experience. If the link is broken, please click the button below to view the snapshot at that time.
Online DDL with Group Replication In Percona Server for MySQL 8.0.22 (and MySQL 8.0.23)

Online DDL with Group Replication MySQLWhile I was working on my grFailOver POC, I have also done some additional parallel testing. One of them was to see how online DDL is executed inside a Group Replication cluster.

The online DDL feature provides support for instant and in-place table alterations and concurrent DML. Checking the Group Replication (GR) official documentation, I was trying to identify if any limitation exists, but the only thing I have found was this:

“Concurrent DDL versus DML Operations.  Concurrent data definition statements and data manipulation statements executing against the same object but on different servers is not supported when using multi-primary mode. During execution of Data Definition Language (DDL) statements on an object, executing concurrent Data Manipulation Language (DML) on the same object but on a different server instance has the risk of conflicting DDL executing on different instances not being detected.”

This impacts only when you have a multi-primary scenario, which is NOT recommended and not my case. So, in theory, GR should be able to handle the online DDL without problems. 

My scenario:

I have two DCs and I am going to do actions on my DC1 and see how it propagates all over, and what impact it will have.

The Test

To do the test, I will run and insert from select. 

MySQL
insert into windmills_test  select null,uuid,millid,kwatts_s,date,location,active,time,strrecordtype from windmills7 limit 1;

And a select, on my Primary node gr1, while on another connection execute the ALTER:

MySQL
ALTER TABLE windmills_test ADD INDEX idx_1 (`uuid`,`active`), ALGORITHM=INPLACE, LOCK=NONE;

As you may have noticed, I am EXPLICITLY asking for INPLACE and lock NONE. So in this case, MySQL cannot satisfy;  it should exit and not execute the command.

In the meantime, on all other nodes, I will run a check command to see WHEN my ALTER is taking place. Let us roll the ball:

On my Primary, the command to insert the data:

Shell
[root@gr1 grtest]# while [ 1 = 1 ];do da=$(date +'%s.%3N');/opt/mysql_templates/mysql-8P/bin/mysql --defaults-file=./my.cnf -uroot -D windmills_s -e "insert into windmills_test  select null,uuid,millid,kwatts_s,date,location,active,time,strrecordtype from windmills7 limit 1;" -e "select count(*) from windmills_s.windmills_test;" > /dev/null;db=$(date +'%s.%3N'); echo "$(echo "($db - $da)"|bc)";sleep 1;done

Again on Primary another session to execute the ALTER:

MySQL
DC1-1(root@localhost) [windmills_s]>ALTER TABLE windmills_test ADD INDEX idx_1 (`uuid`,`active`), ALGORITHM=INPLACE, LOCK=NONE;

On other nodes to monitor when ALTER will start:

Shell
while [ 1 = 1 ];do echo "$(date +'%T.%3N')";/opt/mysql_templates/mysql-8P/bin/mysql --defaults-file=./my.cnf -uroot -D windmills_s -e "show processlist;"|grep -i "alter";sleep 1;done

What Happens

Data is inserted by the loop.

ALTER starts, but I can still insert data in my table, and most importantly, the data is propagated to all nodes of the DC1 cluster.

No ALTER action on the other nodes.

Shell
.458  <---- end of the alter locally

Once ALTER is complete on the local node (Primary) it is then executed (broadcast) to all the nodes participating in the cluster.

MySQL
[ERROR 1205 (HY000) at line 1: Lock wait timeout exceeded; try restarting transaction <--- waiting for waiting for handler commit    No INSERTS are allowed

But writes are suspended, waiting for:

MySQL
37411 | root            | localhost          | windmills_s | Query            |    19 | Waiting for table metadata lock                                 | insert into windmills_test  select null,uuid,millid,kwatts_s,date,location,active,time,strrecordtype

And eventually, it will timeout.

The other point is that any write hangs until the slowest node had applied the ALTER. It is important to note that all nodes, not only the PRIMARY, remain pending waiting for the slow node: the slowest drives all.

GR3:

Shell
11:01:28.649  48 system user windmills_s Query 171 altering table ALTER TABLE windmills_test ADD INDEX idx_1 (`uuid`,`active`), ALGORITHM=INPLACE, LOCK=NONE
11:01:29.674  48 system user windmills_s Query 172 waiting for handler commit ALTER TABLE windmills_test ADD INDEX idx_1 (`uuid`,`active`), ALGORITHM=INPLACE, LOCK=NONE

GR2:

Shell
Start 11:00:14.438  18 system user windmills_s Query 97 altering table ALTER TABLE windmills_test ADD INDEX idx_1 (`uuid`,`active`), ALGORITHM=INPLACE, LOCK=NONE
Ends 11:02:00.107  18 system user windmills_s Query 203 altering table ALTER TABLE windmills_test ADD INDEX idx_1 (`uuid`,`active`), ALGORITHM=INPLACE, LOCK=NONE

Finally, when the last node in the GR cluster has applied the ALTER, the writes will resume, and the Replica node on DC2 will start its ALTER operation on PRIMARY first, then on the other nodes.

Summarizing:

  1. Writes are executed on Primary
  2. ALTER is executed on the Primary
    • DDL does not impact the write operation and respects the not blocking directive.
  3. ALTER is completed on Primary and passed to all nodes
    • Meta lock is raised on nodes
  4. ALL cluster waits for the slowest node to complete
  5. When all is done in the DC1 then the action is replicated to DC2
    • Goto point 2

Conclusion

It seems that at the moment we have partial coverage of the online ddl feature when using group_replication. Of course, to have to wait for the SECONDARY nodes is better and less impacting than to wait for PRIMARY first and then the SECONDARIES.

But it is confusing, given I was expecting to have either full online coverage (I had explicitly asked for that in the DDL command) or a message telling me it cannot be executed online.  Of course, I would prefer to have FULL online coverage. ;0) 

Keep in mind my setup was also pretty standard and that changing group_replication_consistency does not affect the outcome. But not sure I can classify this as a bug, more an unexpected undesirable behavior.


About Joyk


Aggregate valuable and interesting links.
Joyk means Joy of geeK