7

Syscalls Analysis in MySQL When Using innodb_flush_method and innodb_use_fdatasy...

 7 months ago
source link: https://www.percona.com/blog/syscalls-analysis-in-mysql-when-using-innodb_flush_method-and-innodb_use_fdatasync/
Go to the source link to view the article. You can view the picture content, updated content and better typesetting reading experience. If the link is broken, please click the button below to view the snapshot at that time.
neoserver,ios ssh client

January 18, 2024

Vinicius Grippa

In this blog post, we will discuss how to validate at the operating system level the effects of changing the innodb_flush_method to variations other than the default (particularly for O_DIRECT which is most commonly used) and the use of innodb_use_fdatasync.

Introduction

First, let’s define what the innodb_flush_method parameter does. It dictates how InnoDB manages the flushing of data to disk. I won’t detail what each valid value does, but you can check the documentation link here. The list of possible values is detailed below (Unix only):

  • fsync
  • O_DSYNC
  • littlesync
  • nosync
  • O_DIRECT
  • O_DIRECT_NO_FSYNC

As said, we will focus on the O_DIRECT. As part of the best practices, we recommend using O_DIRECT to avoid double-buffering, bypassing the OS cache, and thus improving performance when writing data. Below is the InnoDB architecture extracted from the official documentation:

InnoDB architecture

On platforms that support fdatasync() system calls, the innodb_use_fdatasync variable, introduced in MySQL 8.0.26, permits innodb_flush_method options that use fsync() to use fdatasync() instead. An fdatasync() system call does not flush changes to file metadata unless required for subsequent data retrieval, providing a potential performance benefit.

Because I mentioned the term system call (or syscalls), let’s define it since it is an essential point of this blog post.

To manipulate a file, MySQL and any other software must invoke syscalls. Whenever a process requires a system resource, it sends a request for that resource to the kernel by making a system call. At a high level, system calls are “services” offered by the kernel to user applications. They resemble library APIs, described as function calls with a name, parameters, and return value. The diagram below is a high-level illustration of this process:

syscalls

Question: Why not directly access the resource we want (memory, disk, etc..)? 

This is because Linux divides the execution of the process into two spaces.  User-run processes (generally referred to as user space processes) rely on services provided by the kernel. The kernel is a particular part of the operating system that handles various low-level operations in a privileged running mode. The concept of User and Kernel space is described in detail here. System security and stability would be compromised if applications could directly read and write to the kernel’s address space. In the given scenario, one process is capable of accessing the memory area of another process. This suggests a potential issue with memory isolation and could lead to security vulnerabilities.

Question: How do I check if my Operating System supports a specific syscall?

You can use the command:

$ man syscalls

It will list the syscalls available and in which Linux Kernel appeared.

Test case

We will use the strace utility and the information presented in /proc/<pid>/fdinfo/<fdinfo> to prove the theory described before. First, I will start a MySQL 8.0.33 instance with default settings. 

We can list the files opened by the mysqld process by checking the /proc/<pid>/fd/:

$ ls -l /proc/12006/fd/
total 0
lr-x------. 1 vinicius.grippa percona 64 Jan 15 16:59 0 -> /dev/null
l-wx------. 1 vinicius.grippa percona 64 Jan 15 16:59 1 -> /home/vinicius.grippa/sandboxes/msb_8_0_33/data/msandbox.err
lrwx------. 1 vinicius.grippa percona 64 Jan 15 16:59 9 -> /home/vinicius.grippa/sandboxes/msb_8_0_33/data/#ib_16384_1.dblwr

We can check each file descriptor by running cat /proc/<pid>/fdinfo/<file descriptor number>:

$ cat /proc/12006/fdinfo/9
pos: 0
flags: 0100002
mnt_id: 69
lock: 1: POSIX  ADVISORY  WRITE 12006 fd:06:32640985 0 EOF

We are interested in the flags description, represented by the octal number 0100002. To interpret the flags, we can use the fdflags repository from GitHub or the command below in the shell:

$ for flag in APPEND ASYNC CLOEXEC CREAT DIRECT DIRECTORY DSYNC EXCL LARGEFILE NOATIME  NOCTTY NOFOLLOW NONBLOCK PATH RDWR SYNC TMPFILE TRUNC; do printf '%s: ' O_${flag}; echo O_${flag} | gcc -D_GNU_SOURCE -include fcntl.h -E - | tail -n 1; done

And using the fdflags project to avoid manual work:

fdinfo_directory="/proc/46211/fdinfo/"
fd_directory="/proc/46211/fd/"
for fd in $(ls ${fdinfo_directory}); do
    echo "Processing file descriptor ${fd}" &&
    # Read the symbolic link to find out the file name
    file_name=$(readlink "${fd_directory}${fd}")
    echo "File Name: $file_name"
    ./fdflags "${fdinfo_directory}${fd}"        
# Output
Processing file descriptor 5
File Name: /home/vinicius.grippa/sandboxes/msb_8_0_33/data/test/sbtest79.ibd
O_LARGEFILE
O_RDWR
Processing file descriptor 99
File Name: /home/vinicius.grippa/sandboxes/msb_8_0_33/data/test/sbtest89.ibd
O_LARGEFILE
O_RDWR

The output shows the file descriptor number, the file name, and the flags applied to it when it opened.

Next, we can confirm MySQL is using fsync() to write data with strace:

# Attaching strace to the mysqld process
$ strace -f -c  -o ./strace.out -p <pid>
# Once you exit strace (CTRL+C), a summary of syscalls is written in the strace.out file
$ cat strace.out
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 45.38  222.980851        1158    192526     50579 futex
 27.50  135.114996      134309      1006           io_getevents
  5.90   28.997398     1933159        15        12 restart_syscall
  5.69   27.970912      537902        52           nanosleep
  5.32   26.117827         122    213323      2088 read
  3.86   18.968682         172    109744           write
  3.00   14.731474     1133190        13           epoll_wait
  1.13    5.565184         574      9688           fsync
  0.82    4.050227         134     30132           clock_gettime
  0.61    2.974088         148     20089           pwrite64
  0.47    2.311729         708      3264           fdatasync
  0.00    0.000270         135         2           rename
  0.00    0.000256          32         8           close
------ ----------- ----------- --------- --------- ----------------
100.00  491.324073                590620     52679 total

Even without enabling the innodb_use_fdatasync, you will notice fdatasync() syscall in the strace output. The fdatasync() syscall is used by default by the binary logs when sync_binlog > 0. We can confirm in strace:

$ strace -f  -s2048  -yy  -o ./strace.out -p <pid>
47252 fdatasync(70</home/vinicius.grippa/sandboxes/msb_8_0_33/data/binlog.000026> <unfinished ...>

Suggestion: Try setting sync_binlog=0 and check if the fdatasync()syscall is still requested by MySQL for the binary logs.

Now, we are going to add the following settings to MySQL and restart the instance:

[mysqld]
innodb_flush_method=O_DIRECT
innodb_use_fdatasync = ON

Checking again, we can see that a new flag, O_DIRECT, was added to the files:

Processing file descriptor 96
File Name: /home/vinicius.grippa/sandboxes/msb_8_0_33/data/test/sbtest112.ibd
O_LARGEFILE
O_RDWR
O_DIRECT
Processing file descriptor 99
File Name: /home/vinicius.grippa/sandboxes/msb_8_0_33/data/test/sbtest94.ibd
O_LARGEFILE
O_RDWR
O_DIRECT

And checking with strace, we will see our table files(*.ibd) using fdatasync():

$ strace -f  -s2048  -yy  -o ./strace.out -p 20498
20551 fdatasync(48</home/vinicius.grippa/sandboxes/msb_8_0_33/data/test/sbtest23.ibd> <unfinished ...>
20551 fdatasync(26</home/vinicius.grippa/sandboxes/msb_8_0_33/data/test/sbtest5.ibd> <unfinished ...>
20550 fdatasync(34</home/vinicius.grippa/sandboxes/msb_8_0_33/data/test/sbtest15.ibd> <unfinished ...>
20550 fdatasync(24</home/vinicius.grippa/sandboxes/msb_8_0_33/data/test/sbtest1.ibd> <unfinished ...>

Conclusion

We investigated the technical nuances of InnoDB’s data-flushing mechanisms and how they interact with the operating system. We can understand the details of optimizing MySQL performance when adjusting the innodb_flush_method parameter and the innodb_use_fdatasync.

Our experiments with the strace utility and examining the file descriptors in /proc/<pid>/fdinfo/  have provided concrete evidence of the behavior changes when these settings are tweaked. The use of O_DIRECT can lead to more efficient data writing operations. Additionally, the introduction of innodb_use_fdatasync in MySQL 8.0.26 and its preference over fsync() in specific scenarios illustrate the ongoing evolution of MySQL to exploit specific system call advantages for performance gains.

Percona Distribution for MySQL is the most complete, stable, scalable, and secure open source MySQL solution available, delivering enterprise-grade database environments for your most critical business applications… and it’s free to use!

Try Percona Distribution for MySQL today!

Share This Post!

Subscribe
Connect with
guest
0 Comments

About Joyk


Aggregate valuable and interesting links.
Joyk means Joy of geeK