
Segmentation Fault - A DBA Perspective

source link: https://www.percona.com/blog/segmentation-fault-a-dba-perspective/

June 20, 2023

Ninad Shah

On occasion, DBAs come across segmentation faults while executing queries. However, this is one of the least-explored topics to date. I searched the internet for details on segmentation faults and found many articles, but none of them had the answers I was looking for. So, I decided to gather the information and write a detailed blog about this issue.

To understand a “segmentation fault,” it is essential to know the basic idea of segmentation and its implementation in C programming. In this blog, I will also cover a scenario that causes a “segmentation fault.”

Basic understanding

In order to understand segmentation fault, it is necessary to understand memory management methods for processes.

Before a program can be executed, it must be loaded into memory, where it can be placed in any available space. When a program leaves memory, its space becomes available again; however, the OS may not be able to give that vacant space to another program or process, because the new program may need more space than a single free fragment offers. The program would then have to be broken into chunks before it is loaded. This problem is called fragmentation, and it makes memory management challenging.

In order to overcome these issues, the concept of paging and segmentation was introduced, where physical address space and virtual address space were designed. A detailed description of these concepts is below.

Paging

This was designed to allow non-contiguous space allocation to processes. Here, physical memory is divided into equal-size blocks called frames, and the logical address space of a program is divided into blocks of the same size called pages. A page can be loaded into any free frame, and pages that are not currently in memory reside on secondary storage (such as an HDD). Address translation is handled by a hardware component called the memory management unit (MMU), which works with two address spaces: the logical address space and the physical address space.

Logical address space: it comprises logical addresses that are generated by the CPU for the program.

Physical address space: it has physical addresses that are pointers to actual locations in memory.

To translate a logical address to a physical address, the MMU performs a memory mapping operation using another structure called a page table. A page table holds, for each logical page, a reference to the physical frame where that page resides.

[Figure: page table]
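The lookup described above can be sketched in a few lines. This is only an illustration of the idea, not how an MMU is actually implemented: the page size, the table contents, and the function name `translate` are all assumptions made for the example.

```python
PAGE_SIZE = 4096  # bytes per page/frame (a common size, assumed here)

# Hypothetical page table: logical page number -> physical frame number.
# Page 2 is intentionally left unmapped.
page_table = {0: 5, 1: 9, 3: 2}


def translate(logical_address: int) -> int:
    """Translate a logical address to a physical address via the page table."""
    page_number, offset = divmod(logical_address, PAGE_SIZE)
    if page_number not in page_table:
        # A real MMU would raise a fault here and let the OS decide what to do.
        raise LookupError(f"page {page_number} is not mapped")
    frame_number = page_table[page_number]
    return frame_number * PAGE_SIZE + offset


# Logical address 4100 is page 1, offset 4, which maps to frame 9.
print(translate(4100))
```

The offset inside the page is carried over unchanged; only the page number is replaced by a frame number.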

Segmentation

This scheme is an alternative to paging and works similarly. Instead of fixed-size pages, it divides a program into segments of different sizes that follow the structure of the program code (for example, code, data, and stack). Here, a segment table manages the mapping: for each segment, it stores the base address where the segment starts in physical memory and the segment's length (its limit).

Here, virtual (logical) to physical address translation is straightforward because the segment table stores all the required information: the offset is checked against the segment's limit and then added to its base.
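As a rough sketch of segment-based translation, assume each segment table entry holds a base address and a limit. The table contents and the names `Segment`, `OutOfSegment`, and `translate_segment` are hypothetical and exist only for this illustration.

```python
from typing import NamedTuple


class Segment(NamedTuple):
    base: int   # physical address where the segment starts
    limit: int  # length of the segment in bytes


class OutOfSegment(Exception):
    """Raised when an offset falls outside its segment (illustrative only)."""


# Hypothetical segment table for a single process.
segment_table = {
    "code": Segment(base=0x1000, limit=0x0400),
    "data": Segment(base=0x4000, limit=0x0200),
}


def translate_segment(name: str, offset: int) -> int:
    """Check the offset against the segment limit, then add the base."""
    segment = segment_table[name]
    if not 0 <= offset < segment.limit:
        raise OutOfSegment(f"offset {offset:#x} is outside segment '{name}'")
    return segment.base + offset


print(hex(translate_segment("data", 0x10)))
```

The limit check is the part that matters for protection: an offset beyond the end of a segment is rejected rather than translated.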

I will not dive into this topic further as it requires a bit more technical understanding. The purpose of adding this section was to have a basic understanding of mapping from logical to physical addresses.

What is segmentation fault?

As explained above, the CPU first generates a logical address, and by using a page table or a segment table, the physical address of the desired memory location is found or calculated. That is how memory management works.

In an attempt to access the desired location, we sometimes come across some issues that are described below.

  • Occasionally, while resolving an address through the page/segment table, it turns out that the required contents (a piece of code, variables, or anything else) are not currently present in physical memory. This is called a “page fault.” It is not unusual and does not affect the course of the execution, as the OS simply loads the required contents into memory and the program continues.
  • The other is the classic case of an inaccessible memory location: the address points to a location that the program is not allowed to access. This is called a “segmentation fault,” and it terminates the process. It happens, for example, when a program tries to write to a read-only portion of memory or to access another program’s space.

Although the segmentation fault is often maligned as a showstopper, it is a necessary mechanism that protects against internal memory corruption.

Note: segmentation fault has nothing to do with the segmentation memory management method.
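To see what a segmentation fault looks like from the outside, the sketch below starts a child process that reads from address 0 and then inspects how the child ended. It assumes a Linux or similar Unix host. The helper name `run_crashing_child` is made up for this example, and core dumps are switched off in the child so the demo leaves no files behind.

```python
import signal
import subprocess
import sys

# Code for the child process: disable core dumps for this demo, then read
# from address 0, which is not mapped into a normal process.
CHILD_CODE = (
    "import ctypes, resource\n"
    "resource.setrlimit(resource.RLIMIT_CORE, (0, 0))\n"
    "ctypes.string_at(0)\n"
)


def run_crashing_child() -> int:
    """Run the crashing snippet in a child process and return its exit status."""
    result = subprocess.run(
        [sys.executable, "-c", CHILD_CODE],
        stderr=subprocess.DEVNULL,
    )
    return result.returncode


status = run_crashing_child()
if status < 0:
    # A negative status means the child was terminated by a signal.
    print("child was killed by", signal.Signals(-status).name)
else:
    print("child exited normally with status", status)
```

On Linux, the child is expected to be killed by SIGSEGV while the parent keeps running. This is the same isolation that lets a database supervisor process notice a crashed backend and restart.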

A reproducible scenario

At the code level, a number of scenarios result in a segmentation fault, such as buffer overflows, stack overflows, and so on. However, this blog is written from the database perspective; hence, I will not dive into those scenarios, as they are low-level programming concepts.

In this section, I will focus on a scenario in the PostgreSQL database that causes segmentation fault.

This is one that I came across, where the database restarted due to a “segmentation fault.” Below is a sequence of statements that results in a segmentation fault on PostgreSQL 13.4 and 12.8.

CREATE SCHEMA debug;
CREATE TABLE debug.downloaded_images (
    itemid text NOT NULL,
    download_time timestamp,
    PRIMARY KEY(itemId)
);
INSERT INTO debug.downloaded_images (itemid, download_time) VALUES ('1190300','2021-09-07 11:00:10.255831');
BEGIN;
CREATE TABLE IF NOT EXISTS "debug"."foo"
(
    itemId TEXT,
    last_update TIMESTAMP,
    PRIMARY KEY(itemId)
);
DECLARE "test-cursor-crash" CURSOR WITH HOLD FOR
    SELECT di.itemId FROM "debug".downloaded_images di
    LEFT JOIN (SELECT itemId, MIN(last_update) as last_update FROM
    "debug"."foo" GROUP BY itemId) computed ON di.itemId=computed.itemId
    WHERE COALESCE(last_update, '1970-01-01') < download_time;
FETCH 10000 IN "test-cursor-crash";
COMMIT;

The above example is taken from here. Further analysis showed that the crash occurs only with the LEFT JOIN; with an equi-join (inner join), the query works as expected. This bug was fixed in later versions of PostgreSQL.

Causes

As described above, the actual cause of this error is an attempt to access a memory address that is not accessible by the program, and there are various reasons why this can happen. However, even sophisticated users often have a limited understanding of such concepts, so I will try to explain them in the simplest possible terms.

The following are possible causes for segmentation fault:

  • Operating system issues
  • Buggy OS kernel
  • Faulty hardware(specifically memory)
  • Bug in a product (e.g., PostgreSQL, MySQL)
  • Database corruption

Though the scope of this error is not limited to the above-mentioned reasons only, these are the most probable ones. In order to know the root cause of the issue, one needs to troubleshoot it with the help of programmers.

Troubleshooting

To get to the root cause of a segmentation fault, it is imperative to install debug symbols and enable the creation of a core dump on failure. This helps analyze the issue and shows which function or part of the code causes it. If core dumps are not enabled, no core file is generated, and without debug symbols the dump cannot be mapped back to source lines, which makes the issue very hard to trace.

Enable core dump generation

Every database has different methods to generate core dump files. In order to enable the generation of core dump, one needs to set some kernel settings as below.

Here, any other path can be used instead of /var/crash.

# echo 'kernel.core_pattern=/var/crash/core-%e-%p' >> /etc/sysctl.conf
# sysctl -p
# ulimit -c unlimited
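After changing these settings, it can be useful to confirm what is actually in effect. The following is a minimal sketch for a Linux host: it reads the kernel core pattern from /proc and the core file size limit of the current process. The function names are invented for this example. It reports the limits of the process running the script, which may differ from those of an already running database process.

```python
import resource
from pathlib import Path

# Linux-specific location of the active core pattern.
CORE_PATTERN_FILE = Path("/proc/sys/kernel/core_pattern")


def describe_limit(value: int) -> str:
    """Render a resource limit in a readable form."""
    return "unlimited" if value == resource.RLIM_INFINITY else f"{value} bytes"


def core_dump_settings() -> dict:
    """Collect the core pattern and the core size limits of this process."""
    soft, hard = resource.getrlimit(resource.RLIMIT_CORE)
    pattern = None
    if CORE_PATTERN_FILE.exists():
        pattern = CORE_PATTERN_FILE.read_text().strip()
    return {
        "core_pattern": pattern,
        "soft_limit": describe_limit(soft),
        "hard_limit": describe_limit(hard),
    }


for key, value in core_dump_settings().items():
    print(f"{key}: {value}")
```

A soft limit of 0 bytes means no core file will be written for processes that inherit this limit, regardless of the core pattern.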

Enable debugging

Debug symbols enable code-level debugging. They show the source file being executed and the line of code where execution was happening. It is the responsibility of software developers to build debug symbols. In PostgreSQL, debug symbols can be enabled when building from source, as below.

# ./configure CFLAGS="-O0 -g3"

Alternatively, some distributions provide debug-symbol packages for PostgreSQL, such as postgresql-12-dbg.

In the case of MySQL, the following command during the source code installation may turn on debugging.

# cmake -DWITH_DEBUG=1

Allow the database to generate core dumps

After enabling core dump generation and debugging, the database itself must also cooperate with the host OS to generate a core dump. Hence, the database should be started with an option that allows it to create core files.

In the case of PostgreSQL, the pg_ctl command should be started with the -c option, as shown below.

$ /usr/local/pgsql/bin/pg_ctl -D <data_directory> -c start

In MySQL, the following lines can be added to my.cnf or my.ini.

[mysqld]
core-file

Note: In the event of a crash, the OS dumps the contents of the process memory into the core file. So, before enabling core dumps, be sure you have sufficient disk space to accommodate them.
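A quick way to sanity-check the space requirement is to compare the free space in the target directory with a worst-case estimate of the dump size. The sketch below assumes a Linux host and uses total physical RAM as a deliberately pessimistic estimate. The function name and the choice of the current directory are assumptions for illustration; in practice the core dump directory would be checked.

```python
import os
import shutil


def enough_space_for_core(directory: str, expected_core_bytes: int) -> bool:
    """Return True if the filesystem holding `directory` has enough free space."""
    free_bytes = shutil.disk_usage(directory).free
    return free_bytes >= expected_core_bytes


# Pessimistic assumption: a core dump could approach total physical RAM.
physical_ram = os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES")

# "." stands in for the core dump directory (for example, /var/crash).
print(enough_space_for_core(".", physical_ram))
```

For a database with a large buffer pool or shared buffers, the memory configured for the database is usually a more realistic estimate than total RAM.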

Debugging core files

Core files are version-specific: a core file can be read only with the binary of the same database version that generated it. For example, a core file generated by MySQL 8 cannot be read with the MySQL binary from any other version.

A core dump can be analyzed with the GNU Debugger (gdb). Below is an example of reading a core dump.

$ gdb /usr/local/pgsql/bin/postgres /var/crash/core-postgres-64807
GNU gdb (Ubuntu 9.2-0ubuntu1~20.04.1) 9.2
Copyright (C) 2020 Free Software Foundation, Inc.
Reading symbols from /usr/local/pgsql/bin/postgres...
[New LWP 64807]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Core was generated by `postgres: postgres postgres [local] COMMIT'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0 slot_deform_heap_tuple (natts=2, offp=0x5560dcfd1d58, tuple=<optimized out>, slot=0x5560dcfd1d10) at execTuples.c:930
930 execTuples.c: No such file or directory.
(gdb)
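When core files need to be collected from several servers, it can help to capture the backtrace non-interactively. The sketch below builds a gdb command line using gdb's batch mode (`-batch`) and the `bt` command passed through `-ex`. The helper names are hypothetical, and the paths are simply the ones from the session above.

```python
import shlex
import subprocess
from typing import List


def gdb_backtrace_command(binary: str, core_file: str) -> List[str]:
    """Build a gdb invocation that prints a backtrace and exits."""
    return ["gdb", "-batch", "-ex", "bt", binary, core_file]


def read_backtrace(binary: str, core_file: str) -> str:
    """Run gdb in batch mode and return whatever it printed (requires gdb)."""
    result = subprocess.run(
        gdb_backtrace_command(binary, core_file),
        capture_output=True,
        text=True,
        check=False,
    )
    return result.stdout


command = gdb_backtrace_command(
    "/usr/local/pgsql/bin/postgres",
    "/var/crash/core-postgres-64807",
)
# Show the command instead of running it, since gdb and the core file
# may not exist on the machine where this sketch is tried.
print(shlex.join(command))
```

Calling `read_backtrace` with the same two paths would return the backtrace text, which can then be attached to a bug report.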

Apart from that, Valgrind is also one of the tools that can be used to debug the issue. To learn more about Valgrind, check out Profiling MySQL Memory Usage With Valgrind Massif.

Percona’s initiative

As described above, a segmentation fault can be caused by various issues that are sometimes not even in the control of programmers. In many cases, however, programs themselves are the culprits and trigger segmentation faults, and users are rarely aware of this. Percona is committed to strengthening the open source community and has acknowledged the issue. The Percona team strongly believes that users should know about the perils associated with some non-standard modules (or PostgreSQL extensions) that have been identified as troublemakers.

These details are planned to be added in pg_gather reports. At present, this is in the development phase. The next version of the pg_gather will have these details available.

Summary

Indeed, the segmentation fault is an issue that has not been widely explored yet. Having said that, it recurs in database systems for a variety of reasons. Basically, it surfaces due to an attempt to access an unauthorized area or segment of memory, something a typical DBA is rarely aware of. The issue can be troubleshot by enabling core dump generation and installing debug symbols.
