Attack No-PK Replication Lag with MySQL/Percona Server 8 Invisible Columns!
source link: https://www.percona.com/blog/attack-no-pk-replication-lag-with-mysql-percona-server-8-invisible-columns/
Go to the source link to view the article. You can view the picture content, updated content and better typesetting reading experience. If the link is broken, please click the button below to view the snapshot at that time.
The most common issue when using row-based replication (RBR) is replication lag due to the lack of Primary keys.
The problem is that any replicated DML will do a full table scan for each modified row on the replica. This bug report explains it more in-depth: https://bugs.mysql.com/bug.php?id=53375
For example, if a delete is executed on the following table definition:
With this amount of rows:
The delete being:
It can be seen that the delete on the Primary requires a full table scan (Handler_read_rnd_next matches row amount + 1) to delete 88 rows.
The additional problem is that each of the rows being deleted will be recorded in the binary log individually like this:
Which will result in 88 full table scans on the replica, and hence the performance degradation.
For these cases, the recommendation is to add a primary key to the table, but sometimes adding a PK might not be easy because:
- There are no existing columns that could be considered a PK.
- Or adding a new column (as the PK) is not possible as it might impact queries from a 3rd party tool that we have no control over (or too complex to fix with query rewrite plugin).
The solution is to use MySQL/Percona Server for MySQL 8 and add an invisible column!
Adding a new column (named “newc”) invisible as a primary key can be done with the following line:
Note, adding a PK is an expensive operation that requires a table rebuild as shown here.
After adding an invisible PK, the table will look like this:
Deleting a row now will be recorded in the binary log like this:
Where @1 is the first column ( the PK in this case) which the replica can use to find the matching row without having to do a full table scan.
The operation executed on the replica would be similar to the following which requires only one scan to find the matching row:
Also as the name suggests, an invisible column won’t show nor it needs to be referenced when doing operations over the table, i.e:
But if needed, the new column (newc) can be fetched if explicitly queried:
What If…?
What if MySQL automatically detects that the PK is missing for InnoDB tables and adds the invisible PK?
Taking into account that an internal six bytes PK is already added when the PK is missing, it might be a good idea to allow the possibility of making the PK visible if you need to.
This means that when you execute this CREATE TABLE statement:
Will be automatically translated to:
And then we can execute this command:
To make it visible.
Conclusion
Missing primary keys is a problem for scaling databases, as replication will require a full table scan for each updated/delete row, and the more data the more lag.
Adding a PK might not be always possible because of 3rd party tools or restrictions, but adding an invisible primary key will do the trick and have the benefits of adding a PK without impacting syntax and operations from 3rd party clients/tools. What will be awesome is to make MySQL able to detect the missing PK, add it automatically, and change it to visible if you need to.
Recommend
About Joyk
Aggregate valuable and interesting links.
Joyk means Joy of geeK