15

Locking SELECT with CREATE TABLE, INSERT INTO, and User Variables

 3 years ago
source link: https://mysql.wisborg.dk/2021/07/31/locking-select-with-create-table-insert-into-and-user-variables/
Go to the source link to view the article. You can view the picture content, updated content and better typesetting reading experience. If the link is broken, please click the button below to view the snapshot at that time.
neoserver,ios ssh client

Locking SELECT with CREATE TABLE, INSERT INTO, and User Variables

Locking is an important concept in databases. They help regulate access to the data, so your SELECT queries return consistent results and DML and DDL statements leave the data and schema in a consistent state. For the data, there are four different transaction isolation levels that influence which locks are taken. The most two commonly used isolation levels are REPEATABLE READ (the default in InnoDB) and READ COMMITTED (the default in some other databases). Both of those are said to provide non-locking reads, but there is a little more to it than that.

Selecting into a user variable causing a lock wait timeout.Selecting into a user variable causing a lock wait timeout.

One case where reads are always locking is when you explicitly requests locks by adding the FOR SHARE or FOR UPDATE modifiers. However there are also cases where SELECT statements becomes locking due to the way the result of the statement is used. Two such cases are CREATE TABLE ... SELECT (INSERT INTO ... SELECT behaves the same way) and assigning the result of the statement to a user variable. Those two cases are the topic of this blog.

To make it easier to investigate the locks taken, a SLEEP(0.01) is inserted into the WHERE clause of the statements. The SLEEP() function returns 0 on success, so WHERE SLEEP(0.01) = 0 adds a 0.01 delay per row without changing the result of the query. The examples use the world database that can be downloaded from MySQL's page for other documentation.

CREATE TABLE .. SELECT / INSERT INTO … SELECT

First, you can investigate the locks taken by a CREATE TABLE ... SELECT and INSERT INTO ... SELECT statements. Since these two behaves the same for this example, only the CREATE TABLE version will be shown. The default transaction isolation level of REPEATABLE READ will be used:

Connection 1> SELECT PS_CURRENT_THREAD_ID();
+------------------------+
| PS_CURRENT_THREAD_ID() |
+------------------------+
|                     56 |
+------------------------+
1 row in set (0.0004 sec)
Connection 1> CREATE TABLE world._tmp_city
SELECT * FROM world.city WHERE SLEEP(0.01) = 0;

Because of the SLEEP(0.01) in the WHERE clause, the statement will take around one minute to execute. The easiest way in MySQL 8.0 to determine which data locks are held by a statement is to use the performance_schema.data_locks table. In this case, the locks on the city table is those of interest, and you get get those like (notice that the thread id found above is used to only get the locks for that one connection):

Connection 2> SELECT INDEX_NAME, LOCK_TYPE, LOCK_MODE, LOCK_STATUS, COUNT(*)
FROM performance_schema.data_locks
WHERE THREAD_ID = 56
AND OBJECT_SCHEMA = 'world'
AND OBJECT_NAME = 'city'
GROUP BY INDEX_NAME, LOCK_TYPE, LOCK_MODE, LOCK_STATUS;
+------------+-----------+-----------+-------------+----------+
| INDEX_NAME | LOCK_TYPE | LOCK_MODE | LOCK_STATUS | COUNT(*) |
+------------+-----------+-----------+-------------+----------+
| NULL       | TABLE     | IS        | GRANTED     |        1 |
| PRIMARY    | RECORD    | S         | GRANTED     |      525 |
+------------+-----------+-----------+-------------+----------+
2 rows in set (0.0042 sec)

The number of locks depend on how far into the execution the CREATE TABLE ... SELECT statement is. The outpuf rom the data_locks table shows there is an shared intention lock on the table and 525 shared record locks on the primary key (in InnoDB that is the row itself). At the end of the statement – just before the statement completes, all rows will be locked. What does that mean? If you try to update or delete one of the rows that are locked in a different connection, that statement will block and possibly encounter a lock wait timeout:

Connection 2> SET SESSION innodb_lock_wait_timeout = 2;
Query OK, 0 rows affected (0.0006 sec)
mysql> UPDATE world.city
SET Population = Population + 1
WHERE ID = 1;
ERROR: 1205: Lock wait timeout exceeded; try restarting transaction

In this case, the timeout is set to happen after two seconds to avoid having to wait for the error to occur.

So, this was with the REPEATABLE READ transaction isolation level. Is there a difference in READ COMMITTED? Yes, there is. In that case the CREATE TABLE ... SELECT and INSERT INTO ... SELECT statements do not take any locks on the city table.

The READ COMMITTED transaction isolation level takes fewer locks than REPEATABLE READ. This includes the case of CREATE TABLE ... SELECT and INSERT INTO ... SELECFT where the SELECT part is non-locking.

MySQL Concurrency

If you want to learn more about locks and transactions in MySQL, then I have written MySQL Concurrency published by Apress. The book covers monitoring of locks and transactions, the various lock levels and types, how lock conflicts work, and six case studies investigating and reducing lock situations.

The book is available from Apress (print and DRM free ePub+PDF), Amazon (print and Kindle), Barnes & Noble (print), and others.

User Variables

The other use case, I want to discuss is storing the result of a SELECT statement in a user variable. This has an unexpected twist. As a start, consider finding the number of rows in the table and store the result in the @my_var user variable using the default REPEATABLE READ transaction isolation level:

Connection 1> SELECT PS_CURRENT_THREAD_ID();
+------------------------+
| PS_CURRENT_THREAD_ID() |
+------------------------+
|                     56 |
+------------------------+
1 row in set (0.0005 sec)
Connection 1> SET SESSION transaction_isolation = 'REPEATABLE-READ';
Query OK, 0 rows affected (0.0004 sec)
Connection 1> SET @my_var = (SELECT COUNT(*)
FROM world.city
WHERE SLEEP(0.01) = 0);

Selecting from the performance_schema.data_locks while the query is executing, gives a result similar to before:

Connection 2> SELECT INDEX_NAME, LOCK_TYPE, LOCK_MODE, LOCK_STATUS, COUNT(*)
FROM performance_schema.data_locks
WHERE THREAD_ID = 56
AND OBJECT_SCHEMA = 'world'
AND OBJECT_NAME = 'city'
GROUP BY INDEX_NAME, LOCK_TYPE, LOCK_MODE, LOCK_STATUS;
+-------------+-----------+-----------+-------------+----------+
| INDEX_NAME  | LOCK_TYPE | LOCK_MODE | LOCK_STATUS | COUNT(*) |
+-------------+-----------+-----------+-------------+----------+
| NULL        | TABLE     | IS        | GRANTED     |        1 |
| CountryCode | RECORD    | S         | GRANTED     |      248 |
+-------------+-----------+-----------+-------------+----------+
2 rows in set (0.0027 sec)

The optimiser in this case chooses to use the CountryCode index for counting the number of rows, but otherwise the locking situation is the same as before. It does however mean that you can update the row provided you do not use the CountryCode index to access the row, so in that sense it is an improvement. (The optimiser will choose the execution plan it thinks is most efficient – has the lowest cost – so you mileage may vary.)

However, the big surprise is when you change to the READ COMMITTED transaction isolation level:

Connection 1> SET SESSION transaction_isolation = 'READ-COMMITTED';
Query OK, 0 rows affected (0.0005 sec)
Connection 1> SET @my_var = (SELECT COUNT(*)
FROM world.city
WHERE SLEEP(0.01) = 0);

Unlike before, assigning the value to a user variable still takes locks in the READ COMMITTED transaction isolation level:

Connection 2> SELECT INDEX_NAME, LOCK_TYPE, LOCK_MODE, LOCK_STATUS, COUNT(*)
FROM performance_schema.data_locks
WHERE THREAD_ID = 56
AND OBJECT_SCHEMA = 'world'
AND OBJECT_NAME = 'city'
GROUP BY INDEX_NAME, LOCK_TYPE, LOCK_MODE, LOCK_STATUS;
+-------------+-----------+---------------+-------------+----------+
| INDEX_NAME  | LOCK_TYPE | LOCK_MODE     | LOCK_STATUS | COUNT(*) |
+-------------+-----------+---------------+-------------+----------+
| NULL        | TABLE     | IS            | GRANTED     |        1 |
| CountryCode | RECORD    | S,REC_NOT_GAP | GRANTED     |      271 |
+-------------+-----------+---------------+-------------+----------+
2 rows in set (0.0023 sec)

The locking is slightly reduced compared to REPEATABLE READ (no gap locks) but this is still enough to cause serious locking problems in a production system.

Warning

Assigning the result of a SELECT statement to a user variable can even in the READ COMMITTED transaction isolation level cause severe locking issues. The locks are held till the end of the transaction.

Why does it takes locks even in READ COMMITTED when you assign the result to a user variable? I have not been able to find any documentation of it, but by assigning the value to a variable, you can re-use the value including in statements that modify the data, so for data consistency it does make sense that the locks last till the end of the transaction.


About Joyk


Aggregate valuable and interesting links.
Joyk means Joy of geeK