5

PostgreSQL for MySQL DBAs Episode 7: Vacuuming Tables

 2 years ago
source link: https://www.percona.com/blog/postgresql-for-mysql-dbas-episode-7-vacuuming-tables/
Go to the source link to view the article. You can view the picture content, updated content and better typesetting reading experience. If the link is broken, please click the button below to view the snapshot at that time.
neoserver,ios ssh client

PostgreSQL for MySQL DBAs Episode 7: Vacuuming Tables

PostgreSQL for MySQL DBAs Vacuuming TablesExperienced MySQL DBAs exploring PostgreSQL are often shocked to learn that they will need to vacuum their tables. This is not using a Hoover brand-like device or letting a Roomba-like robot loose on the server. Vacuuming is a necessary part of maintenance on a PostgreSQL server.  This is why you need to watch Episode 7 for the details on vacuuming.

The documentation states that using VACUUM reclaims storage occupied by dead tuples (A tuple is PostgreSQL’s internal representation of a row in a table). Tuples that are deleted or obsoleted by an update are not physically removed from their table, so they remain present until a VACUUM is done. And it is necessary to do VACUUM periodically, especially on frequently-updated tables.

Using VACUUM

Let’s create some sample data that leaves old tuples hanging around, cluttering up our server.

Shell
test=# create table foo (id int, value int);
CREATE TABLE
test=# insert into foo values (1,1);
INSERT 0 1

And add some data, updating the same record repeatedly.

Shell
test=# update foo set value=2 where id =1;
UPDATE 1
test=# update foo set value=3 where id =1;
UPDATE 1
test=# update foo set value=4 where id =1;
UPDATE 1

And we can check to see the dead tuples.

Shell
test=# select relname, n_dead_tup from pg_stat_all_tables where relname = 'foo';
relname | n_dead_tup
---------+------------
foo | 3
(1 row)

Now, we can vacuum.

Shell
test=# VACUUM foo;
VACUUM
test=# select relname, n_dead_tup from pg_stat_all_tables where relname = 'foo';
relname | n_dead_tup
---------+------------
foo | 0
(1 row)

Why vacuum?

PostgreSQL’s VACUUM command has to process each table on a regular basis for several reasons:

  • To recover or reuse disk space occupied by updated or deleted rows.
  • To update data statistics used by the PostgreSQL query planner.
  • To update the visibility map, which speeds up index-only scans.
  • To protect against loss of very old data due to transaction ID wraparound.

The latter two bullet points will be new to MySQL DBAs.  Vacuum maintains a visibility map for each table to keep track of which pages contain only tuples that are known to be visible to all active transactions. This allows the vacuum to skip these tuples since they are known to be in a known ‘good’ state which speeds up vacuuming. And, it allows PostgreSQL to answer some queries using only the index (the corresponding indexes get vacuumed too), without reference to the underlying table. This provides for index-only scans which do not need to dive into the data to answer the query.

Wraparound transaction IDs are something to be worried about.  Postgres has a 32-bit transaction ID number that wraps around to zero and starts again. Compare this to MySQL’s running out of AUTO_INCREMENT numbers and just refusing to add more rows until corrected. When the wrap happens, old transactions that were in the past now appear to be in the future, and their output becomes invisible. This is a catastrophic data loss! Worse of all your data is safe but you have ZERO access to it.

To avoid this, it is necessary to vacuum every table in every database at least once every two billion transactions.

Autovacuum

PostgreSQL has an optional but highly recommended feature called autovacuum, whose purpose is to automate the execution of VACUUM and ANALYZE commands.  Most recent versions have this turned on but double-check as detailed below.  Autovacuum is highly configurable and keeps you from having to remember to run it.

Shell
test=# SHOW autovacuum;
autovacuum
------------
(1 row)

The past videos for PostgreSQL for MySQL Database Administrators (DBA) can be found here: episode oneepisode two, episode three, episode four, episode five, and episode six.


About Joyk


Aggregate valuable and interesting links.
Joyk means Joy of geeK