4

PostgreSQL Partitioning Made Easy Using pg_partman (TimeBased)

 11 months ago
source link: https://www.percona.com/blog/postgresql-partitioning-made-easy-using-pg_partman-timebased/
Go to the source link to view the article. You can view the picture content, updated content and better typesetting reading experience. If the link is broken, please click the button below to view the snapshot at that time.
neoserver,ios ssh client

PostgreSQL Partitioning Made Easy Using pg_partman (TimeBased)

September 27, 2023

Neha Korukula

PostgreSQL Partition Manager — ‘pg_partman’ — is an open source extension freely available, widely supported, and actively maintained by the PostgreSQL community.

pg_partman creates and manages both time-based and serial-based table partition sets. Users use it to automate and create partitions by breaking large tables into smaller partitions for better performance.

In this post, initially, we will discuss the creation of the pg_partman extension following partitioning for newly created tables. First, we will need to install the OS with the necessary control files for the appropriate PostgreSQL version for installing pg_partman on the database.

Installation of pg_partman on the database

I recommend creating the schema before creating the extension to install the extension’s objects. This step is not mandatory, but after installation, you cannot change it. By default, the extension creates objects in the ‘public’ schema.

Establishing a dedicated schema for an extension serves the purpose of storing its objects, including tables, functions, procedures, and more. This approach facilitates smoother management and simplifies tasks like handling backups, upgrades, and migrations.

Below, I present the steps and failures encountered when attempting to change the schema after creating the extension.

Step one: As an example,  let us try to create an extension without Qualifying SCHEMA.

CREATE EXTENSION pg_partman;
postgres=# dx
                         List of installed extensions
Name    | Version |   Schema   |                     Description                      
------------+---------+------------+------------------------------------------------------
pg_partman | 4.6.0   | public     | Extension to manage partitioned tables by time or ID

Step two: Next, we demonstrate a failure to set up the schema for an already created extension.

CREATE schema testpartman;
ALTER EXTENSION pg_partman set schema testpartman ;
ERROR:  extension "pg_partman" does not support SET SCHEMA

Eventually, here are the steps for Installation of pg_partman along with schema.

Step 1: Create schema for installation of the extension

partman=# CREATE SCHEMA partman;
CREATE SCHEMA

Step 2: Create extension pg_partman in schema partman

partman=# CREATE EXTENSION pg_partman SCHEMA partman;
CREATE EXTENSION
partman=# dx
                          List of installed extensions
   Name    | Version |   Schema   |                     Description                      
------------+---------+------------+--------------------------------------------------
 pg_partman | 4.6.0   | partman    | Extension to manage partitioned tables by time or ID

Implementation of native partitioning

Native partitioning is also known as declarative partitioning. PostgreSQL allows you to declare that a table is divided into partitions with a list of columns or expressions to be used as the partition key.

However, you can only implement range partitioning using pg_partman, either with time-based or serial-based methods. This is because you can only predict when a new partition needs to be created in the case of range partitioning on dates or timestamps.

Time-based partitioning by range with template

Unique constraints on the partitioned table must include all partitioning columns. Extension pg_partman helps to manage this by using a template table to manage properties that currently are not supported by native partitioning.

Properties like INDEX and unique constraint creations on the template table are created on all partitions.

partman=# CREATE TABLE employees (id INT PRIMARY KEY, fname VARCHAR(20) , lname VARCHAR (20) , dob DATE NOT NULL ,joined DATE) PARTITION BY RANGE (joined);
ERROR:  unique constraint on partitioned table must include all partitioning columns
DETAIL:  PRIMARY KEY constraint on table "employees" lacks column "joined" which is part of the partition key.

1. First, create a table with native partitioning type by range using DATE typed column

<span style="font-family: 'Courier 10 Pitch', Courier, monospace; font-size: 11.2px; background-color: #cce4f5; color: #666666;">partman=# CREATE TABLE employees  (id INT, fname VARCHAR(20) , lname VARCHAR (20) , dob DATE NOT NULL ,joined DATE NOT NULL) PARTITION BY RANGE (joined);
</span>CREATE TABLE

2. Next, create a template table

partman=# CREATE TABLE partman.employees_template (LIKE public.employees);
CREATE TABLE

3. Add constraints to the template table needed for all partitioned tables

partman=# ALTER TABLE partman.employees_template ADD PRIMARY KEY (ID);
ALTER TABLE

4. Create parent to create initial child partitioned tables

partman=# SELECT partman.create_parent('public.employees', p_control := 'joined',p_type := 'native',p_interval := 'yearly',p_template_table := 'partman.employees_template',p_premake := 2, p_start_partition := (CURRENT_TIMESTAMP + '1 hour'::interval)::text);
create_parent
---------------
(1 row)

Additionally, I have listed below the various options that can be used with the ‘create_parent’ function and their respective values:

Argument  Data Type Definition Values
p_parent_table text Parent Partitioned table ( Original table to have existed before creation & MUST be schema-qualified even if it is a public schema.) schema.tablename
p_control text You should partition the column, either integer or time-based. Column name
p_type text You can use the Partman type based on your performance improvement and flexibility requirements. native,partman
p_interval text Integer range or time interval for each partition. (The generic intervals of “yearly” to  “quarter-hour”) (yearly, quarterly, monthly, weekly, daily, hourly, half-hour, quarter-hour ) Any other interval that validates time.For ID based integer ranges.
p_template_table text Template table name, if not provided, will create one in the schema the extension is installed. schema.tablename
p_premake integer Additional partitions to stay ahead of the current partition. We create a default of 4 (resulting in the creation of 5 partitions, including the current one).
p_start_partition text This allows specifying the first partition of a set instead of it being automatically determined. It must be a valid timestamp (for time-based) or a positive integer (for id-based) value. CURRENT_TIMESTAMP /(CURRENT_TIMESTAMP + ‘1 hour’::interval)::text)
p_automatic_maintenance text The parameter determines whether maintenance is automatically managed when run_maintenance() is called without a table parameter or by a background worker process. DEFAULT ‘on’::text

Because we specified ‘2’ for premake in the create_parent function, you can observe that two partitions ahead have been created, as shown in the table definition below.

partman=# d+ employees
                 Partitioned table "public.employees"
Column |         Type          | Collation | Nullable | Default | Storage  | Compression | Stats target | Description
--------+-----------------------+-----------+----------+---------+----------+-------------+--------------+-------------
id     | integer               |           |          |         | plain    |           
fname  | character varying(20) |           |          |         | extended |           
lname  | character varying(20) |           |          |         | extended |           
dob    | date                  |           | not null |         | plain    |           
joined | date                  |           | not null |         | plain    |           
Partition key: RANGE (joined)
Partitions: employees_p2023 FOR VALUES FROM ('2023-01-01') TO ('2024-01-01'),
           employees_p2024 FOR VALUES FROM ('2024-01-01') TO ('2025-01-01'),
           employees_p2025 FOR VALUES FROM ('2025-01-01') TO ('2026-01-01'),
          employees_default DEFAULT

To illustrate, we insert around 10,000 rows to see the data flush to its appropriate partitions if it exists or moves to the default partition.

partman=# INSERT INTO employees (id ,fname,lname,dob ,joined) VALUES ( generate_series(1,10000) ,(array['Oswald', 'Henry', 'Bob', 'Vennie'])[floor(random() * 4 + 1)], (array['Leo', 'Jack', 'Den', 'Daisy' ,'Woody'])[floor(random() * 5 + 1)], '1995-01-01'::date + trunc(random() * 366 * 3)::int,
'2023-01-01'::date + trunc(random() * 366 * 3)::int);
INSERT 0 10000
partman=# SELECT  employees_p2023  , employees_p2024 ,employees_p2025,employees_default,employees_totalcnt from
( SELECT COUNT(*) FROM   employees_p2023 ) AS employees_p2023,( SELECT COUNT(*) FROM   employees_p2024 ) AS employees_p2024,
( SELECT COUNT(*) FROM   employees_p2025) AS employees_p2025 ,( SELECT COUNT(*) FROM   employees_default) AS employees_default ,
( SELECT COUNT(*) FROM   employees ) AS employees_totalcnt ;
employees_p2023 | employees_p2024 | employees_p2025 | employees_default |    employees_totalcnt
-----------------+-----------------+-----------------+-------------------+------------
(3294)          | (3397)          | (3293)          | (16)              | (10000)
(1 row)

By utilizing functions, we can seamlessly create child partitions and efficiently transfer data from the default table to these child tables.

partman=# CALL partman.run_maintenance_proc();
ERROR:  updated partition constraint for default partition "employees_default" would be violated by some row
CONTEXT: SQL statement "ALTER TABLE public.employees ATTACH PARTITION public.employees_p2026 FOR VALUES FROM ('2026-01-01 00:00:00+00') TO ('2027-01-01 00:00:00+00')"

As an illustration, the usage of partman.partition_data_proc shown below, which can move data batch-wise to partitioned tables.

partman=#  CALL partman.partition_data_proc ('public.employees');
NOTICE:  Batch: 1, Rows moved: 16
NOTICE:  Total rows moved: 16
NOTICE:  Ensure to VACUUM ANALYZE the parent (and source table if used) after partitioning data
partman=#  VACUUM ANALYZE employees;
VACUUM

As a result, we can see below the newly created table structure along with the new child partition.

partman=# d+ employees
                                        Partitioned table "public.employees"
Column |         Type          | Collation | Nullable | Default | Storage  | Compression | Stats target | Description 
--------+-----------------------+-----------+----------+---------+----------+-------------+--------------+-------------
id     | integer               |           |          |         | plain    |             |              
fname  | character varying(20) |           |          |         | extended |             |              
lname  | character varying(20) |           |          |         | extended |             |              
dob    | date                  |           | not null |         | plain    |             |              
joined | date                  |           | not null |         | plain    |             |              
Partition key: RANGE (joined)
Partitions: employees_p2023 FOR VALUES FROM ('2023-01-01') TO ('2024-01-01'),
           employees_p2024 FOR VALUES FROM ('2024-01-01') TO ('2025-01-01'),
           employees_p2025 FOR VALUES FROM ('2025-01-01') TO ('2026-01-01'),
           employees_p2026 FOR VALUES FROM ('2026-01-01') TO ('2027-01-01'),
          employees_default DEFAULT

Consequently, after using the function, you will notice that the data from the default table has been moved to the newly created partitioned table.

partman=#  SELECT  employees_p2023  , employees_p2024 ,employees_p2025, employees_p2026, employees_default,employees_totalcnt from
( SELECT COUNT(*) FROM   employees_p2023 ) AS employees_p2023,( SELECT COUNT(*) FROM   employees_p2024 ) AS employees_p2024,
( SELECT COUNT(*) FROM   employees_p2025) AS employees_p2025,( SELECT COUNT(*) FROM   employees_p2026) AS employees_p2026,
( SELECT COUNT(*) FROM   employees_default) AS employees_default , ( SELECT COUNT(*) FROM   employees ) AS employees_totalcnt ;
employees_p2023 | employees_p2024 | employees_p2025 | employees_p2026 | employees_default | employees_totalcnt 
-----------------+-----------------+-----------------+-----------------+-------------------+--------------------
(3294)          | (3397)          | (3293)          | (16)            | (0)               | (10000)
(1 row)

Creating the partitioned child tables ahead

Subsequently, we can modify the table partman.part_config settings to make changes by modifying the premake, which controls the partitions ahead.

partman=# update partman.part_config set premake = '4' where parent_table ='public.employees';
UPDATE 1
partman=# CALL partman.run_maintenance_proc();
partman=# d+ employees
                       Partitioned table "public.employees"
Column |         Type          | Collation | Nullable | Default | Storage  | Compression | Stats target | Description
--------+-----------------------+-----------+----------+---------+----------+--------
id     | integer               |           |          |         | plain    |                       
fname  | character varying(20) |           |          |         | extended |                        
lname  | character varying(20) |           |          |         | extended |                        
dob    | date                  |           | not null |         | plain    |                       
joined | date                  |           | not null |         | plain    |            
Partition key: RANGE (joined)
Partitions: employees_p2023 FOR VALUES FROM ('2023-01-01') TO ('2024-01-01'),
           employees_p2024 FOR VALUES FROM ('2024-01-01') TO ('2025-01-01'),
           employees_p2025 FOR VALUES FROM ('2025-01-01') TO ('2026-01-01'),
           employees_p2026 FOR VALUES FROM ('2026-01-01') TO ('2027-01-01'),
           employees_p2027 FOR VALUES FROM ('2027-01-01') TO ('2028-01-01'),
           employees_p2028 FOR VALUES FROM ('2028-01-01') TO ('2029-01-01'),
           employees_p2029 FOR VALUES FROM ('2029-01-01') TO ('2030-01-01'),
           employees_p2030 FOR VALUES FROM ('2030-01-01') TO ('2031-01-01'),
           employees_default DEFAULT

Automatic maintenance of partitions

You can plan to create partitions ahead of the INSERTS to prevent data from moving into the default_partition or the main table. You can achieve this by using functions to create newly partitioned tables and move the data.

For instance, we can use the function run_maintenance() to create new partitions in advance, and there are two ways to do it.

1. Using background worker
2. Cron Scheduler

1. Setting up background worker

To start with, we can set “shared_preload_libraries” for partition maintenance operations to run automatically. You can set the parameter either by using ALTER SYSTEM or by manually modifying the config file.

partman=# ALTER SYSTEM set shared_preload_libraries = 'pg_partman_bgw';
ALTER SYSTEM

Afterward, perform a restart of the cluster either using systemctl or pg_ctl.

ubuntu@ip-172-31-47-134:~$ sudo systemctl restart [email protected]
shared_preload_libraries
--------------------------
pg_partman_bgw
(1 row)

Configuration options background worker

1. pg_partman_bgw.interval: Number of seconds between calls to run_maintenance(). The default is 3600 (1 hour).
– For time-based partitions, intervals can be chosen based on the partition interval at least daily or twice daily.
– For serial-based partitions, you can choose the partition interval based on the data inserted on a daily basis.
2. pg_partman_bgw.dbname: Required. The database(s) that run_maintenance() will run on. If multiple, use a comma-separated list. If not set, BGW will do nothing.
3. pg_partman_bgw.role: The role that run_maintenance() will run as. The default is “postgres”. You are allowed to use only a single role name.
4. pg_partman_bgw.analyze: Optional argument, by default whenever a new child table is created, an analyze is run on the parent table of the partition set to ensure constraint exclusion works. This analyze can be skipped by setting this to false and help increase the speed of moving large amounts of data. If you set this to false, we strongly recommend that you perform a manual analyze of the partition upon completion to ensure the statistics are updated properly.

2. Cron scheduler

   00 01,12 * * * psql -c “SELECT pg_partman.run_maintenance()”

Advantages of pg_partman

  • Using pg_partman, you can define data retention to automatically move data into archived partitions or drop the partition.
  • Query Performance will improve when dealing with partitioned data specifically.
  • Vacuum efficiency will improve as data resides in smaller partitions, reducing the time required for table maintenance compared to dealing with large sets of data.
  • You can attach or detach partitions without downtime concurrently, and it will not impact ongoing database operations.

Limitations of pg_partman

  • pg_partman only supports partitioning of data types that are integer or date/timestamp.
  • Generally, when opting for native partitioning, the parent table should already be created as range partitioned but not list partitioned.
  • Unique constraints on partitioned tables must include all partitioning columns.

Conclusion

To summarize, we discussed creating and managing partitions effectively using PostgreSQL Partition Manager(pg_partman) along with the creation of the partition ahead of Inserting data using time-based. I will provide a more detailed usage of pg_partman in upcoming blog posts.

Please review the blogs below for reference:

Performing ETL Using Inheritance in PostgreSQL

PostgreSQL Partitioning Using Traditional Methods

Percona Distribution for PostgreSQL provides the best and most critical enterprise components from the open-source community, in a single distribution, designed and tested to work together.

Download Percona Distribution for PostgreSQL Today!

Share This Post!

Subscribe

Connect with
guest

Label

0 Comments

About Joyk


Aggregate valuable and interesting links.
Joyk means Joy of geeK