12

Session Temporary Tablespaces and Disk Space Usage in MySQL

 2 years ago
source link: https://www.percona.com/blog/session-temporary-tablespaces-and-disk-space-usage/
Go to the source link to view the article. You can view the picture content, updated content and better typesetting reading experience. If the link is broken, please click the button below to view the snapshot at that time.
neoserver,ios ssh client

Session Temporary Tablespaces and Disk Space Usage in MySQL

Session Temporary Tablespaces and Disk Space Usage in MySQLTemporary Tables handling in MySQL has a somewhat checkered past. Some time ago, my colleague Przemek wrote a post about differences in the implementations between versions of MySQL. If you are looking for the differences in implementation, that’s a perfect place to check.

In this post, I’d like to focus on Session Temporary Tablespaces, InnoDB on-disk internal temporary tables that come to play when a query requires storing more data than tmp_table_size or TempTable engine allocated more than temptable_max_mmap of memory-mapped temporary files.

If that condition happens, a session executing the query needs to be allocated with a tablespace from a pool of temporary tablespaces. The pool initially contains 10 temporary tablespaces that are created when an instance is started. The size of the pool never shrinks, and tablespaces are added to the pool automatically if needed. The default size of the tablespace (IBT file) is five pages or 80 KB.

When a session disconnects, its temporary tablespaces are truncated and released to the pool with their initial size. The truncate operation happens ONLY when the session disconnects; this also means that as long as the session is connected, the tablespaces consume disk space. MySQL can reuse the area for future queries, but if one query requires a lot of temporary space, the IBT will remain huge for the whole lifetime of a session.

Let’s demonstrate that behavior. Firstly, let’s create a table and populate it with one million rows. As a side note, storing SHA1 checksums in a char(40) field isn’t perfect, but it makes the test cleaner.

Preparing the test data set
Shell
mysql> CREATE TABLE table01 (id int unsigned primary key auto_increment, s char(40));
mysql> INSERT INTO table01 (s) SELECT sha1(FLOOR(RAND() * 100))  from sequence_table(1000000) t;

Then, the below query that’s using temporary tables is going to be executed. As the temporary table size is larger than a value of tmp_table_size, which is 16 MB by default, that query will have to use the Session Temporary Tablespace.

Executing the query
Shell
mysql> pager pt-visual-explain
mysql> explain SELECT * FROM ( SELECT * FROM table01 UNION SELECT * FROM table01 ORDER BY s DESC) t LIMIT 1;
Table scan
rows           1991860
+- DERIVED
   table          derived(temporary(union(table01,table01)))
   +- Table scan
      +- TEMPORARY
         table          temporary(union(table01,table01))
         +- Filesort
            +- Table scan
               +- UNION
                  table          union(table01,table01)
                  +- Table scan
                  |  rows           995930
                  |  +- Table
                  |     table          table01
                  +- Table scan
                     rows           995930
                     +- Table
                        table          table01
mysql> SELECT * FROM ( SELECT * FROM table01 UNION SELECT * FROM table01 ORDER BY s DESC) t LIMIT 1;
+--------+------------------------------------------+
| id     | s                                        |
+--------+------------------------------------------+
| 999145 | fe5dbbcea5ce7e2988b8c69bcfdfde8904aabc1f |
+--------+------------------------------------------+

Let’s check the temporary tablespaces attached to that session after executing the query:

Listing temporary tablespaces
Shell
mysql> SELECT PATH, format_bytes(SIZE), STATE, PURPOSE FROM INFORMATION_SCHEMA.INNODB_SESSION_TEMP_TABLESPACES WHERE id = CONNECTION_ID();
+----------------------------+--------------------+--------+-----------+
| PATH                       | format_bytes(SIZE) | STATE  | PURPOSE   |
+----------------------------+--------------------+--------+-----------+
| ./#innodb_temp/temp_10.ibt | 392.00 MiB         | ACTIVE | INTRINSIC |
+----------------------------+--------------------+--------+-----------+
# ls -la './#innodb_temp/temp_10.ibt'
-rw-r----- 1 mysql mysql 392M 05-29 14:10 temp_10.ibt

After closing the session the file size is back to its default value:

File size after closing the connection
Shell
# ls -la './#innodb_temp/temp_10.ibt'
-rw-r----- 1 mysql mysql 80K 05-29 14:10 temp_10.ibt

The solution works as described in the MySQL manual, however, it also means that it can be problematic. Long-living sessions aren’t uncommon in the databases world, primarily if applications use connection pooling. Moreover, connection pooling was designed precisely for that purpose, to mitigate the overhead of creating a new connection each time an application needs it, as reusing already existing connections to a database is more efficient than opening a new connection.

For instance, if between an application and MySQL, ProxySQL is used as middleware, the application lacks most of the control of the backend connection. It means that balanced connections to the backend are going to be used, but also, the connections will most likely live forever. Because of that fact, MySQL will rarely reclaim space from Session Temporary Tablespace, and this, as a result, will increase the overall disk space utilization.

It’s not that hard to hit. One heavy OLAP query from time to time it’s enough to start the snowball.

ProxySQL has an option that forces backend connection to be re-initialized – mysql-connection_max_age_ms (disabled by default). However, this works only for ProxySQL, and there are plenty more Connection Poolers available, and some of them are implemented directly on the applications’ side, which in general increases the complexity of the whole problem.

In an ideal world, MySQL should handle this issue by triggering the truncate process more often. I’ve submitted a Feature Request with the goal of having the “surprise” factor removed, and ultimately making it more user-friendly.


About Joyk


Aggregate valuable and interesting links.
Joyk means Joy of geeK