2

MySQL: Compression, Indexes, and Two Smoking Barrels

 2 years ago
source link: https://www.percona.com/blog/mysql-compression-indexes-and-two-smoking-barrels/
Go to the source link to view the article. You can view the picture content, updated content and better typesetting reading experience. If the link is broken, please click the button below to view the snapshot at that time.
neoserver,ios ssh client

MySQL: Compression, Indexes, and Two Smoking Barrels

MySQL CompressionHello folks, we all know the benefits of table compression in MySQL, with current technology, processing speed, and compression disks, and in most cases, it is a benefit.

Now, what happens when that doesn’t happen?

In this simple blog, I’m going to show you something interesting that happens when you try to transition from a compressed table to a normal one. We are not going to talk in detail or discuss anything else such as: why should it be bad that the table is compressed? What is the performance, system performance, not even what penalty of hardware resources the compression requires?  We are also not going to talk about whether the pages in memory are compressed, uncompressed, or both. This has been talked about enough and there are some very interesting blogs that you can find on our site or even in the documentation itself.

What I am going to show you is something very curious, and for this, we are going to carry out the exercise from the beginning to the end in such a way that you can do it yourself (well, except for the data loading part, heh!) and I’ll give an example script (later).

Let’s create the table in a database instance with Percona Server for MySQL 5.7 installed.

Shell
CentOS7-S1 PS_57> CREATE TABLE test_compress (
   id bigint(20) unsigned NOT NULL,
   identification_id int(10) unsigned DEFAULT NULL,
   timestamp datetime NOT NULL,
   action varchar(50) NOT NULL,
   result varchar(50) NOT NULL,
   PRIMARY KEY (id),
   KEY INDEX_test_compress_result (result),
   KEY INDEX_test_compress_timestamp (timestamp)
   ) ENGINE=InnoDB DEFAULT CHARSET=latin1;
Query OK, 0 rows affected (0.02 sec)

After that, let’s load the table with some data (100,000 rows).

Shell
CentOS7-S1 $ for NUM in {1..100000}; do mysql -h localhost PS_57 -e "insert into test_compress (id, identification_id, timestamp, action, result) values ($NUM,$NUM*100,now(),concat('string',$NUM),concat('VeryVeryLargeString',$NUM))"; done

Let’s verify the size of the table (previously doing an ANALYZE TABLE with innodb_stats_persistent_sample_pages=100000 so that the statistics are as realistic as possible).

Shell
CentOS7-S1 PS_57> set global innodb_stats_persistent_sample_pages=100000; Query OK, 0 rows affected (0.00 sec)
Shell
CentOS7-S1 PS_57> analyze table test_compress;
+------------------------+---------+----------+----------+
| Table                  | Op      | Msg_type | Msg_text |
+------------------------+---------+----------+----------+
| PS_57.test_compress    | analyze | status   | OK       |
+------------------------+---------+----------+----------+
Query OK, 0 rows affected (0.00 sec)
Shell
CentOS7-S1 PS_57> select table_schema, table_name, table_rows,  round(data_length / 1024 / 1024)+round(index_length / 1024 / 1024)+round(data_free / 1024 / 1024) TOTAL_MB, create_options from information_schema.tables where table_name='test_compress';
+--------------+---------------+------------+----------+----------------+
| table_schema | table_name    | table_rows | TOTAL_MB | create_options |
+--------------+---------------+------------+----------+----------------+
| PS_57        | test_compress |     100000 |       37 |                |
+--------------+---------------+------------+----------+----------------+

Next, we are going to compress the table with a KEY_BLOCK_SIZE=4 (this size was chosen arbitrarily, at no time is it indicated or decided whether or not it is the optimal value, in fact, it is not).

Shell
CentOS7-S1 PS_57> ALTER TABLE test_compress ROW_FORMAT=COMPRESSED,KEY_BLOCK_SIZE=4,ALGORITHM=INPLACE,LOCK=NONE;
Query OK, 0 rows affected (3.33 sec)

We verify the size of the table again (previously doing an ANALYZE TABLE with innodb_stats_persistent_sample_pages=100000 so that the statistics are as realistic as possible).

Shell
CentOS7-S1 PS_57> set global innodb_stats_persistent_sample_pages=100000;
Query OK, 0 rows affected (0.00 sec)
Shell
CentOS7-S1 PS_57> analyze table test_compress;
+------------------------+---------+----------+----------+
| Table                  | Op      | Msg_type | Msg_text |
+------------------------+---------+----------+----------+
| PS_57.test_compress    | analyze | status   | OK       |
+------------------------+---------+----------+----------+
Query OK, 0 rows affected (0.00 sec)
Shell
CentOS7-S1 PS_57> select table_schema, table_name, table_rows,  round(data_length / 1024 / 1024)+round(index_length / 1024 / 1024)+round(data_free / 1024 / 1024) TOTAL_MB, create_options from information_schema.tables where table_name='test_compress';
+--------------+---------------+------------+----------+----------------------------------------+
| table_schema | table_name    | table_rows | TOTAL_MB | create_options                         |
+--------------+---------------+------------+----------+----------------------------------------+
| PS_57        | test_compress |     100000 |       19 | row_format=COMPRESSED KEY_BLOCK_SIZE=4 |
+--------------+---------------+------------+----------+----------------------------------------+

As you can see, the table has been compressed, let’s check its structure.

Shell
CentOS7-S1 PS_57> show create table test_compress\G
*************************** 1. row ***************************
      Table: test_compress
Create Table: CREATE TABLE `test_compress` (
 `id` bigint(20) unsigned NOT NULL,
 `identification_id` int(10) unsigned DEFAULT NULL,
 `timestamp` datetime NOT NULL,
 `action` varchar(50) NOT NULL,
 `result` varchar(50) NOT NULL,
 PRIMARY KEY (`id`),
 KEY `INDEX_test_compress_result` (`result`),
 KEY `INDEX_test_compress_timestamp` (`timestamp`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1 ROW_FORMAT=COMPRESSED KEY_BLOCK_SIZE=4
1 row in set (0.00 sec)

How do we reverse the compression? Good question. It should be easy, right?

(Clarification, before you ask, we are not going to judge how it should be done based on the size of the tables, what impact this ALTER would have on replication, and other things, the idea is to show how to reverse it and nothing else.)

We could use this command, let’s see:

Shell
CentOS7-S1 PS_57> ALTER TABLE test_compress ROW_FORMAT=DEFAULT,ALGORITHM=INPLACE,LOCK=NONE;
Query OK, 0 rows affected (6.25 sec)
Records: 0  Duplicates: 0  Warnings: 0

It seems to have worked! Uncompressed it:

Shell
CentOS7-S1 PS_57> select table_schema, table_name, table_rows,  round(data_length / 1024 / 1024)+round(index_length / 1024 / 1024)+round(data_free / 1024 / 1024) TOTAL_MB, create_options from information_schema.tables where table_name='test_compress';
+--------------+---------------+------------+----------+--------------------+
| table_schema | table_name    | table_rows | TOTAL_MB | create_options     |
+--------------+---------------+------------+----------+--------------------+
| PS_57        | test_compress |     100000 |       25 | KEY_BLOCK_SIZE=4   |
+--------------+---------------+------------+----------+--------------------+

Better check:

Shell
CentOS7-S1 PS_57> show create table test_compress\G
*************************** 1. row ***************************
      Table: test_compress
Create Table: CREATE TABLE `test_compress` (
 `id` bigint(20) unsigned NOT NULL,
 `identification_id` int(10) unsigned DEFAULT NULL,
 `timestamp` datetime NOT NULL,
 `action` varchar(50) NOT NULL,
 `result` varchar(50) NOT NULL,
 PRIMARY KEY (`id`),
 KEY `INDEX_test_compress_result` (`result`),
 KEY `INDEX_test_compress_timestamp` (`timestamp`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1 KEY_BLOCK_SIZE=4

Something is wrong! the KEY_BLOCK_SIZE is still 4.

Second attempt:

Shell
CentOS7-S1 PS_57> ALTER TABLE test_compress ROW_FORMAT=DEFAULT,KEY_BLOCK_SIZE=0,ALGORITHM=INPLACE,LOCK=NONE;
Query OK, 0 rows affected (2.05 sec)
Records: 0  Duplicates: 0  Warnings: 0
Shell
CentOS7-S1 PS_57> select table_schema, table_name, table_rows,  round(data_length / 1024 / 1024)+round(index_length / 1024 / 1024)+round(data_free / 1024 / 1024) TOTAL_MB, create_options from information_schema.tables where table_name='test_compress';
+--------------+---------------+------------+----------+--------------------+
| table_schema | table_name    | table_rows | TOTAL_MB | create_options     |
+--------------+---------------+------------+----------+--------------------+
| PS_57        | test_compress |     100000 |       25 |                    |
+--------------+---------------+------------+----------+--------------------+

Better check:

Shell
CentOS7-S1 PS_57> show create table test_compress\G
*************************** 1. row ***************************
      Table: test_compress
Create Table: CREATE TABLE `test_compress` (
 `id` bigint(20) unsigned NOT NULL,
 `identification_id` int(10) unsigned DEFAULT NULL,
 `timestamp` datetime NOT NULL,
 `action` varchar(50) NOT NULL,
 `result` varchar(50) NOT NULL,
 PRIMARY KEY (`id`) KEY_BLOCK_SIZE=4,
 KEY `INDEX_test_compress_result` (`result`) KEY_BLOCK_SIZE=4,
 KEY `INDEX_test_compress_timestamp` (`timestamp`) KEY_BLOCK_SIZE=4
) ENGINE=InnoDB DEFAULT CHARSET=latin1

Something is wrong! Both the primary key and the secondary indexes continue to show KEY_BLOCK_SIZE=4.

Although when the table is converted from compressed to uncompressed, internally the KEY_BLOCK_SIZE of the indices honors that of the table, the CREATE TABLE statement does not. This would be an aesthetic/cosmetic issue at first, but it is a real problem when you do a dump since the CREATE TABLE is left with the KEY_BLOCK_SIZE values, which is not good.  Here is the output of mysqldump:

Shell
$ mysqldump -h localhost PS_57 test_compress --no-data > test_compress.sql
Shell
$ cat test_compress.sql
-- Table structure for table `test_compress`
DROP TABLE IF EXISTS `test_compress`;
/*!40101 SET @saved_cs_client     = @@character_set_client */;
/*!50503 SET character_set_client = utf8mb4 */;
CREATE TABLE `test_compress` (
 `id` bigint(20) unsigned NOT NULL,
 `identification_id` int(10) unsigned DEFAULT NULL,
 `timestamp` datetime NOT NULL,
 `action` varchar(50) NOT NULL,
 `result` varchar(50) NOT NULL,
 PRIMARY KEY (`id`) KEY_BLOCK_SIZE=4,
 KEY `INDEX_test_compress_result` (`result`) KEY_BLOCK_SIZE=4,
 KEY `INDEX_test_compress_timestamp` (`timestamp`) KEY_BLOCK_SIZE=4
) ENGINE=InnoDB DEFAULT CHARSET=latin1;
/*!40101 SET character_set_client = @saved_cs_client */;

As you can see, there seems to be no way to reverse the KEY_BLOCK_SIZE in the table definition index-wise with a global ALTER TABLE command (if we can call it that), so we’ll make this last attempt:

Shell
ALTER TABLE test_compress
DROP PRIMARY KEY, add PRIMARY KEY (id),
DROP key INDEX_test_compress_result, add key INDEX_test_compress_result (result),
DROP key INDEX_test_compress_timestamp, add key INDEX_test_compress_timestamp (timestamp),
ROW_FORMAT=DEFAULT,KEY_BLOCK_SIZE=0,ALGORITHM=INPLACE,LOCK=NONE;

And now, it has the correct definition without the KEY_BLOCK_SIZE:

Shell
CentOS7-S1 PS_57> show create table test_compress\G
*************************** 1. row ***************************
      Table: test_compress
Create Table: CREATE TABLE `test_compress` (
 `id` bigint(20) unsigned NOT NULL,
 `identification_id` int(10) unsigned DEFAULT NULL,
 `timestamp` datetime NOT NULL,
 `action` varchar(50) NOT NULL,
 `result` varchar(50) NOT NULL,
 PRIMARY KEY (`id`),
 KEY `INDEX_test_compress_result` (`result`),
 KEY `INDEX_test_compress_timestamp` (`timestamp`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1
1 row in set (0.00 sec)
Shell
CentOS7-S1 PS_57> select table_schema, table_name, table_rows,  round(data_length / 1024 / 1024)+round(index_length / 1024 / 1024)+round(data_free / 1024 / 1024) TOTAL_MB, create_options from information_schema.tables where table_name='test_compress';
+--------------+---------------+------------+----------+----------------+
| table_schema | table_name    | table_rows | TOTAL_MB | create_options |
+--------------+---------------+------------+----------+----------------+
| PS_57        | test_compress |     100000 |       25 |                |
+--------------+---------------+------------+----------+----------------+

Apparently, there are bugs in MariaDB related to the case:

https://jira.mariadb.org/browse/MDEV-26400

https://jira.mariadb.org/browse/MDEV-11757

A similar one in MySQL:

https://bugs.mysql.com/bug.php?id=56628

In MySQL 8, the situation is as follows:

Shell
CentOS7-S2 PS_8> select table_schema, table_name, table_rows,  round(data_length / 1024 / 1024)+round(index_length / 1024 / 1024)+round(data_free / 1024 / 1024) TOTAL_MB, create_options from information_schema.tables where table_name='test_compress';
+--------------+---------------+------------+----------+----------------+
| TABLE_SCHEMA | TABLE_NAME    | TABLE_ROWS | TOTAL_MB | CREATE_OPTIONS |
+--------------+---------------+------------+----------+----------------+
| PS_8         | test_compress |      31000 |       15 |                |
+--------------+---------------+------------+----------+----------------+

Let’s execute the ALTER to compress the table:

Shell
CentOS7-S2 PS_8> alter table test_compress ROW_FORMAT=COMPRESSED,KEY_BLOCK_SIZE=4,ALGORITHM=INPLACE,LOCK=NONE;
Query OK, 0 rows affected (4.54 sec)
Records: 0  Duplicates: 0  Warnings: 0

Let’s check again:

Shell
CentOS7-S2 PS_8> analyze table test_compress;
+-----------------------+---------+----------+----------+
| Table                 | Op      | Msg_type | Msg_text |
+-----------------------+---------+----------+----------+
| PS_8.test_compress    | analyze | status   | OK       |
+-----------------------+---------+----------+----------+
1 row in set (0.07 sec)
Shell
CentOS7-S2 PS_8> select table_schema, table_name, table_rows,  round(data_length / 1024 / 1024)+round(index_length / 1024 / 1024)+round(data_free / 1024 / 1024) TOTAL_MB, create_options from information_schema.tables where table_name='test_compress';
+--------------+---------------+------------+----------+----------------------------------------+
| TABLE_SCHEMA | TABLE_NAME    | TABLE_ROWS | TOTAL_MB | CREATE_OPTIONS                         |
+--------------+---------------+------------+----------+----------------------------------------+
| PS_8         | test_compress |     100000 |       19 | row_format=COMPRESSED KEY_BLOCK_SIZE=4 |
+--------------+---------------+------------+----------+----------------------------------------+
Shell
CentOS7-S2 PS_8> show create table test_compress\G
*************************** 1. row ***************************
      Table: test_compress
Create Table: CREATE TABLE `test_compress` (
 `id` bigint unsigned NOT NULL,
 `identification_id` int unsigned DEFAULT NULL,
 `timestamp` datetime NOT NULL,
 `action` varchar(50) NOT NULL,
 `result` varchar(50) NOT NULL,
 PRIMARY KEY (`id`),
 KEY `INDEX_test_compress_result` (`result`),
 KEY `INDEX_test_compress_timestamp` (`timestamp`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1 ROW_FORMAT=COMPRESSED KEY_BLOCK_SIZE=4
1 row in set (0.01 sec)

So far everything is the same as in MySQL 5.7: the KEY_BLOCK_SIZE persists in the definition of the entire table, but not of the indexes.

Fortunately, by running this command, we effectively reversed everything:

Shell
CentOS7-S2 PS_8> alter table test_compress ROW_FORMAT=DEFAULT, KEY_BLOCK_SIZE=0,ALGORITHM=INPLACE,LOCK=NONE;
Query OK, 0 rows affected (2.56 sec)
Records: 0  Duplicates: 0  Warnings: 0
Shell
CentOS7-S2 PS_8> show create table test_compress\G
*************************** 1. row ***************************
      Table: test_compress
Create Table: CREATE TABLE `test_compress` (
 `id` bigint unsigned NOT NULL,
 `identification_id` int unsigned DEFAULT NULL,
 `timestamp` datetime NOT NULL,
 `action` varchar(50) NOT NULL,
 `result` varchar(50) NOT NULL,
 PRIMARY KEY (`id`),
 KEY `INDEX_test_compress_result` (`result`),
 KEY `INDEX_test_compress_timestamp` (`timestamp`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1
1 row in set (0.00 sec)
Shell
CentOS7-S2 PS_8> select table_schema, table_name, table_rows,  round(data_length / 1024 / 1024)+round(index_length / 1024 / 1024)+round(data_free / 1024 / 1024) TOTAL_MB, create_options from information_schema.tables where table_name='test_compress';
+--------------+---------------+------------+----------+----------------+
| TABLE_SCHEMA | TABLE_NAME    | TABLE_ROWS | TOTAL_MB | CREATE_OPTIONS |
+--------------+---------------+------------+----------+----------------+
| PS_8         | test_compress |     100000 |       25 |                |
+--------------+---------------+------------+----------+----------------+

As always, we recommend doing a preliminary test in a development environment before running it in production.

Conclusion

In MySQL 5.7, the only way to completely revert (at least in the definition of the table and its indexes) is to regenerate both the primary key and all its indexes.  It sounds like a terminal solution, but if you make backups using mysqldump (we always recommend using Percona XtraBackup for these purposes, it’s faster and more efficient) it is an issue to take into account since in its definition it preserves those erroneous definitions. Fortunately, this is fixed in MySQL 8.

Interesting, isn’t it?

Related


About Joyk


Aggregate valuable and interesting links.
Joyk means Joy of geeK