2

Duplicate, Redundant, and Invisible Indexes

 1 year ago
source link: https://www.percona.com/blog/duplicate-redundant-and-invisible-indexes/
Go to the source link to view the article. You can view the picture content, updated content and better typesetting reading experience. If the link is broken, please click the button below to view the snapshot at that time.
neoserver,ios ssh client

Duplicate, Redundant, and Invisible Indexes

Invisible Indexes MySQLMySQL index is a data structure used to optimize the performance of database queries at the expense of additional writes and storage space to keep the index data structure up to date. It is used to quickly locate data without having to search every row in a table. Indexes can be created using one or more columns of a table, and each index is given a name. Indexes are especially useful for queries that filter results based on columns with a high number of distinct values.

Indexes are useful for our queries, but duplicate, redundant, and unused indexes reduce performance by confusing the optimizer with query plans, requiring the storage engine to maintain, calculate, and update more index statistics, as well as requiring more disk space.

Since MySQL 8.0, indexes can be marked as invisible. In this blog, I will detail how to detect duplicate and underused indexes as well as the new feature of invisible indexes and how it can help with index management.

How to find the duplicate indexes?

pt-duplicate-key-checker is a command-line tool from Percona Toolkit that scans a MySQL database and identifies tables that have duplicate indexes or primary keys. It can help identify potential problems with the schema and provide guidance on how to fix them. For each duplicate key, the tool prints a DROP INDEX statement by default, allowing you to copy-paste the statement into MySQL to get rid of the duplicate key.

For example:

Shell
$ pt-duplicate-key-checker --host=localhost --user=percona --ask-pass
Enter password:
# ########################################################################
# mytestdb.authors_test
# ########################################################################
# idx_first_name is a left-prefix of idx_first_name_last_name
# Key definitions:
#   KEY `idx_first_name` (`first_name`),
#   KEY `idx_first_name_last_name` (`first_name`,`last_name`),
# Column types:
#   `first_name` varchar(50) default null
#   `last_name` varchar(50) default null
# To remove this duplicate index, execute:
ALTER TABLE `mytestdb`.`authors_test` DROP INDEX `idx_first_name`;
# Key idx_last_name_id ends with a prefix of the clustered index
# Key definitions:
#   KEY `idx_last_name_id` (`last_name`,`id`)
#   PRIMARY KEY (`id`),
# Column types:
#   `last_name` varchar(50) default null
#   `id` int not null auto_increment
# To shorten this duplicate clustered index, execute:
ALTER TABLE `mytestdb`.`authors_test` DROP INDEX `idx_last_name_id`, ADD INDEX `idx_last_name_id` (`last_name`);
# ########################################################################
# Summary of indexes
# ########################################################################
# Size Duplicate Indexes   4125830
# Total Duplicate Indexes  2
# Total Indexes            129

How to find the unused indexes?

Unused indexes are indexes that are created in a database but are not being used in any queries. These indexes take up space in the database and can slow down query performance if they are not maintained. Unused indexes should be identified and removed to improve the performance of the database.

We can use the sys.schema unused indexes view to see which indexes have not been used since the last time the MySQL server was restarted.

Example :

Shell
mysql> select * from sys.schema_unused_indexes
       where index_name not like 'fk_%' and object_schema='mytestdb' and object_name='testtable';
+---------------+-------------+----------------+
| object_schema | object_name | index_name     |
+---------------+-------------+----------------+
| mytestdb      | testtable   | idx_first_name |
| mytestdb      | testtable   | idx_last_name  |
+---------------+-------------+----------------+
2 rows in set (0.00 sec)

Since the index idx_first_name was used to retrieve the records, it is not listed.

Shell
mysql> select last_name from testtable WHERE first_name=’Arun’;
+-----------+
| last_name |
+-----------+
| Jith      |
+-----------+
1 rows in set (0.00 sec)
mysql> select * from sys.schema_unused_indexes
       where index_name not like 'fk_%' and object_schema='mytestdb' and object_name='testtable';
+---------------+-------------+---------------+
| object_schema | object_name | index_name    |
+---------------+-------------+---------------+
| mytestdb      | testtable   | idx_last_name |
+---------------+-------------+---------------+
1 row in set (0.01 sec)

Invisible Indexes

In MySQL 8.0, there is a feature that allows you to have an invisible index. This means that an index is created on a table, but the optimizer does not use it by default. By using this feature, you can test the impacts of removing an index without actually dropping it. If desired, the index can be made visible again, therefore avoiding the time-consuming process of re-adding the index to a larger table.

The SET_VAR(optimizer_switch = ‘use_invisible_indexes=on’) allows the invisible index to be used for specific application activities or modules during a single query while preventing it from being used across the entire application.

Let’s look at a simple example to see how it works.

Shell
CREATE TABLE `authors` (
  `id` int NOT NULL AUTO_INCREMENT,
  `first_name` varchar(50),
  `last_name` varchar(50),
  `email` varchar(100) CHARACTER SET utf8 COLLATE utf8_unicode_ci NOT NULL,
   PRIMARY KEY (`id`),
  UNIQUE KEY `email` (`email`)
) ENGINE=InnoDB;

The indexes are set to visible by default, and we may use the ALTER TABLE table name ALTER INDEX index name to make it INVISIBLE/VISIBLE.

To find the index details, we can use “SHOW INDEXES FROM table;” or query the INFORMATION SCHEMA.STATISTICS.

Shell
mysql> SELECT INDEX_NAME, IS_VISIBLE
       FROM INFORMATION_SCHEMA.STATISTICS
       WHERE TABLE_SCHEMA = 'mytestdb' AND TABLE_NAME = 'authors_test';
+------------+------------+
| INDEX_NAME | IS_VISIBLE |
+------------+------------+
| email      | YES        |
| PRIMARY    | YES        |
+------------+------------+
2 rows in set (0.00 sec)
mysql> ALTER TABLE authors_test ALTER INDEX email INVISIBLE;
Query OK, 0 rows affected (0.11 sec)
Records: 0  Duplicates: 0  Warnings: 0
mysql> SELECT INDEX_NAME, IS_VISIBLE
       FROM INFORMATION_SCHEMA.STATISTICS
       WHERE TABLE_SCHEMA = 'mytestdb' AND TABLE_NAME = 'authors_test';
+------------+------------+
| INDEX_NAME | IS_VISIBLE |
+------------+------------+
| email      | NO         |
| PRIMARY    | YES        |
+------------+------------+
2 rows in set (0.00 sec)

The query cannot utilize the index on the email column since it is invisible.

Shell
mysql> explain select email from authors_test WHERE email='[email protected]'\G
*************************** 1. row ***************************
  select_type: SIMPLE
        table: authors_test
   partitions: NULL
         type: ALL
possible_keys: NULL
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 10063
     filtered: 10.00
        Extra: Using where
1 row in set, 1 warning (0.00 sec)

The SET VAR optimizer hint can be used to temporarily update the value of an optimizer switch and enable invisible indexes for a single query only, as shown below:

Shell
mysql> explain select /*+ SET_VAR(optimizer_switch = 'use_invisible_indexes=on') */ first_name
               from authors_test WHERE email='[email protected]'\G
*************************** 1. row ***************************
  select_type: SIMPLE
        table: authors_test
   partitions: NULL
         type: const
possible_keys: email
          key: email
      key_len: 302
          ref: const
         rows: 1
     filtered: 100.00
        Extra: Using index
1 row in set, 1 warning (0.00 sec)

Please refer to the documentation for more information on invisible indexes.

Summary

Removal of duplicate and unnecessary indexes is suggested to prevent performance degradation in high concurrent workloads with less-than-ideal data distribution on a table. Unwanted keys take up unnecessary disc space, which can cause overhead to DML and read queries. To test the effects of dropping an index without fully removing it, it can be made invisible first.


About Joyk


Aggregate valuable and interesting links.
Joyk means Joy of geeK