3

PostgreSQL Job Scheduler: pg_timetable

 11 months ago
source link: https://www.percona.com/blog/postgresql-job-scheduler-pg_timetable/
Go to the source link to view the article. You can view the picture content, updated content and better typesetting reading experience. If the link is broken, please click the button below to view the snapshot at that time.
neoserver,ios ssh client

In database management, job scheduling has always played an important role. And for PostgreSQL, PgAgent and pg_cron are examples of job schedulers that are already popular. However, there is another job scheduler called pg_timetable, which is completely database-driven and provides a couple of advanced concepts. In this blog, we are going to highlight some of the main features, installation, and some use cases of pg_timetable.

Main features of pg_timetable

  • Fully database-driven configuration
  • Cron-style scheduling at the PostgreSQL server time zone
  • Built-in tasks such as sending emails, etc.
  • Full support for database-driven logging
  • Tasks can be arranged in chains
  • A chain can consist of built-in commands, SQL, and executables
  • Parameters can be passed to chains/tasks

Installation

There are currently two options on how you can install and run pg_timetable.

  1. Container installation. For this, please refer to this page.Note: The image mentioned on the above page is based on Alpine and not from Percona.
  2. Local installation, i.e., build from sources. Here, we will discuss this approach.

2.1. Download and install GO on your system.

2.2. Clone pg_timetable repo

$ git clone https://github.com/cybertec-postgresql/pg_timetable.git
 $ cd pg_timetable

2.3. Run pg_timetable

$ go run main.go --dbname=dbname --clientname=worker001 --user=scheduler --password=strongpassword

2.4. Alternatively, build a binary and run it:

$ go build
 $ ./pg_timetable --dbname=dbname --clientname=worker001 --user=scheduler --password=strongpassword

Demonstration of job scheduling with pg_timetable

As a use case here, it will be shown how to use pg_timetable as a scheduler to schedule a job, which will refresh the materialized view every day at 12 midnight.

1. Download pg_timetable executable (Follow step 2 mentioned above in the installation section).

2. Make sure the PostgreSQL server is up and running and has a role with CREATE privilege for a target database, e.g.:

postgres=# CREATE ROLE scheduler PASSWORD '***********';
postgres=# GRANT CREATE ON DATABASE postgres TO scheduler;
postgres=# CREATE TABLE t_demo (grp int, data numeric);
CREATE TABLE
postgres=# INSERT INTO t_demo SELECT 1, random()
FROM generate_series(1, 5000000);
INSERT 0 5000000
postgres=#CREATE MATERIALIZED VIEW mat_view AS
SELECT grp, avg(data), count(*)
FROM t_demo
GROUP BY 1;
SELECT 1
postgres=# ALTER MATERIALIZED VIEW mat_view OWNER TO scheduler;
postgres=# GRANT SELECT ON mat_view TO scheduler;
-bash-4.2$ psql
psql (12.16)
Type "help" for help.
postgres=# SELECT * FROM mat_view;
grp | avg | count
-----+--------------------------+---------
1 | 0.5001807958659610956005 | 5000000
(1 row)
postgres=# INSERT INTO t_demo SELECT 2, random()
postgres-# FROM generate_series(1, 5000000);
INSERT 0 5000000

3. Create a new job to refresh the materialized view each night at 12:00 Postgres server time zone.

postgres=# SELECT timetable.add_job('refresh-matview', '0 12 * * *', 'REFRESH MATERIALIZED VIEW public.mat_view');
add_job
---------
(1 row)

4. Run the pg_timetable

[centos@ip-172-31-32-10 pg_timetable]$ ./pg_timetable --dbname=postgres --clientname=worker001 --user=scheduler --password=********
2023-09-09 11:59:20.929 [INFO] [sid:697146069] Starting new session...
2023-09-09 11:59:20.941 [INFO] Database connection established
2023-09-10 12:00:00.961 [INFO] Accepting asynchronous chains execution requests...
2023-09-10 12:00:00.970 [INFO] [count:0] Retrieve scheduled chains to run @reboot
2023-09-10 12:00:00.991 [INFO] [count:3] Retrieve scheduled chains to run
2023-09-10 12:00:00.994 [INFO] [count:0] Retrieve interval chains to run
2023-09-10 12:00:00.019 [INFO] [chain:1] Starting chain
2023-09-10 12:00:00.722 [INFO] [chain:1] [task:1] [txid:2613] Starting task
2023-09-10 12:00:00.074 [INFO] [chain:1] [task:1] [txid:2613] Starting task
2023-09-10 12:00:00.141 [INFO] [chain:1] [task:1] [txid:2613] Closing remote session
2023-09-10 12:00:00.141 [INFO] [chain:1] [task:1] [txid:2613] Task executed successfully
2023-09-10 12:00:00.185 [INFO] [chain:1] [txid:2613] Chain executed successfully

During the first start of pg_timetable, the necessary schema timetable gets created. For reference, below is the catalog structure.

postgres=# dn
List of schemas
Name | Owner
-----------+-----------
public | postgres
timetable | scheduler
(2 rows)
postgres=# set search_path to timetable ;
postgres=# dt
List of relations
Schema | Name | Type | Owner
-----------+----------------+-------+-----------
timetable | active_chain | table | scheduler
timetable | active_session | table | scheduler
timetable | chain | table | scheduler
timetable | execution_log | table | scheduler
timetable | log | table | scheduler
timetable | migration | table | scheduler
timetable | parameter | table | scheduler
timetable | task | table | scheduler
(8 rows)

5. From database logs, it can be observed that MATERIALIZED VIEW gets refreshed as per the schedule.

Output from DB logs:

2023-09-10 12:00:00 UTC [14334] LOG:  statement: REFRESH MATERIALIZED VIEW public.mat_view

From psql prompt:

-bash-4.2$ psql
psql (12.16)
Type "help" for help.
postgres=# SELECT * FROM mat_view;
grp | avg | count
-----+---------------------------+---------
1 | 0.5001807958659610956005 | 5000000
2 | 0.50000009110202547559215 | 5000000
(2 rows)

Below is the output from the pg_timetable catalog tables.

postgres=# select * from active_session ;
client_pid | server_pid | client_name | started_at
------------+------------+-------------+------------------------------
697146069 | 20137 | worker001 | 2023-09-10 11:59:205672+00
(1 row)
postgres=# select * from chain;
-[ RECORD 1 ]-------+--------------------
chain_id | 1
chain_name | refresh-matview
run_at | 0 12 * * *
max_instances |
timeout | 0
live | t
self_destruct | f
exclusive_execution | f
client_name |
on_error |
postgres=# select * from execution_log where chain_id=1;
-[ RECORD 01 ]-------------------------------------------------------------------
chain_id | 1
task_id | 1
txid | 2613
last_run | 2023-09-10 12:00:00.137404+00
finished | 2023-09-10 12:00:00.586543+00
pid | 697146069
returncode | 0
kind | SQL
command | REFRESH MATERIALIZED VIEW public.mat_view
output | REFRESH MATERIALIZED VIEW
client_name | worker001

In conclusion, we can say that pg_timetable is open source and can be used freely by everyone. The main advantages include that pg_timetable is an independent process written in GO, which connects to PostgreSQL just like any other client program. So, if the scheduler crashes, it will not harm your server. Pg_timetable provides a variety of built-in tasks that help you to flexibly combine those operations in an easy way. Further, pg_timetable has been implemented in GO and, therefore, comes as ONE executable that can be started directly. So, there is no need to worry about libraries or dependencies during the installation.

Please refer to the links below to learn more about pg_timetable.

https://github.com/cybertec-postgresql/pg_timetable

https://www.cybertec-postgresql.com/en/products/pg_timetable/

https://pg-timetable.readthedocs.io/en/master/README.html

Percona Distribution for PostgreSQL provides the best and most critical enterprise components from the open-source community, in a single distribution, designed and tested to work together.

Download Percona Distribution for PostgreSQL Today!

Share This Post!

Subscribe

Connect with
guest

Label

0 Comments

About Joyk


Aggregate valuable and interesting links.
Joyk means Joy of geeK