
Parallel Commits for Transactions Using postgres_fdw on PostgreSQL 15

source link: https://www.percona.com/blog/parallel-commits-for-transactions-using-postgres_fdw-on-postgresql-15/


Consuming data from a remote database in our local database, just as if it were another local table, is a great feature that adds an excellent level of flexibility to the design of a database solution.

This option is present in a number of database products: Oracle (DBLINK), MySQL (FEDERATED engine), and, of course, PostgreSQL (dblink, postgres_fdw) all support it. PostgreSQL supports multiple foreign data wrappers, but in this blog, we will review a specific feature added for postgres_fdw in the new version 15.

How postgres_fdw works

Let’s say that you have a local and a remote PostgreSQL server. The second one has a table that your application using the local server needs to access. You might think of any of the following solutions:

  1. Configure your application to be able to connect both servers.
  2. Set up background jobs to copy the data every now and then.
  3. Configure logical replication between the two servers so the local table syncs up with the remote one.

Each of the above has its own advantages and disadvantages. For the first one, you might need to configure your network and grant access privileges to both DB servers, which adds complexity and can open the door to security issues. The second option adds extra load on both servers, and since the data is not up to date in real time, some reading inconsistencies might happen. The third one could actually be a good option, but if you are planning to MODIFY the synced data, then you might face collisions.

Then the solution might be relying on the Foreign Data Wrapper. Per the official documentation, the following steps are required to access data from a remote PostgreSQL server:

  1. Install the postgres_fdw extension using CREATE EXTENSION.
  2. Create a foreign server object, using CREATE SERVER, to represent each remote database you want to connect to. Specify connection information, except user and password, as options of the server object.
  3. Create a user mapping, using CREATE USER MAPPING, for each database user you want to allow to access each foreign server. Specify the remote user name and password to use as user and password options of the user mapping.
  4. Create a foreign table, using CREATE FOREIGN TABLE or IMPORT FOREIGN SCHEMA, for each remote table you want to access. The columns of the foreign table must match the referenced remote table. You can, however, use table and/or column names different from the remote tables, if you specify the correct remote names as options of the foreign table object.
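As a sketch, the four steps above might look like the following. The server name, host, credentials, and table definition here are illustrative placeholders, not taken from any real deployment:

```sql
-- 1. Install the extension on the local server
CREATE EXTENSION postgres_fdw;

-- 2. Create a foreign server object pointing at the remote instance
CREATE SERVER remote_server
  FOREIGN DATA WRAPPER postgres_fdw
  OPTIONS (host 'remote-host', port '5432', dbname 'appdb');

-- 3. Map the local user to the remote credentials
CREATE USER MAPPING FOR app_user
  SERVER remote_server
  OPTIONS (user 'remote_user', password 'secret');

-- 4. Expose a remote table locally; columns must match the remote table
CREATE FOREIGN TABLE customers (
  id   integer,
  name text
)
  SERVER remote_server
  OPTIONS (schema_name 'public', table_name 'customers');
```

After this, a plain `SELECT * FROM customers;` on the local server transparently reads from the remote table.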

You will end with something like this:

[Diagram: Foreign Data Wrapper PostgreSQL]

So now, with the proper privileges, your application can access the data from the remote table just like another local table: you can SELECT from it or even execute DML statements on it.

What has changed in PG15?

The described functionality has been present in PostgreSQL for a while; the new enhancement arriving in version 15 is the parallel transaction commit when using postgres_fdw. This is particularly useful when your solution design relies on multiple remote servers, such as a distributed PostgreSQL database.

From the Transaction Management documentation, we know that “during a query that references any remote tables on a foreign server, postgres_fdw opens a transaction on the remote server if one is not already open corresponding to the current local transaction”. If our local transaction references remote tables from multiple foreign servers, then a (sub)transaction is opened on each remote server.

Up to version 14, when there are multiple (sub)transactions tied to a local transaction and the transaction is committed locally, postgres_fdw commits each remote (sub)transaction one by one (sequentially). This behavior is still the default on version 15, but now we can enable the parallel_commit option for each foreign server tied to a local transaction; in that case, when the local transaction commits, all the remote (sub)transactions commit at once (in parallel).
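For illustration, assume two foreign tables, t1 and t2, that live on two different foreign servers (the table and server names are hypothetical). A local transaction like the one below opens one remote (sub)transaction per server; with parallel_commit enabled on both servers, the final COMMIT issues both remote commits at once instead of one after the other:

```sql
BEGIN;
-- opens a remote transaction on the first foreign server
UPDATE t1 SET val = val + 1 WHERE id = 1;
-- opens a remote transaction on the second foreign server
UPDATE t2 SET val = val + 1 WHERE id = 1;
-- sequential remote commits by default; parallel if
-- parallel_commit is enabled on both foreign servers
COMMIT;
```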

Laboratory case

I built a laboratory to test the sequential and the parallel remote (sub)transactions commit and catch the differences. 

I decided to use the pgbench tool for the benchmark, with its option to create partitions for the pgbench_accounts table. I adapted these partitions to be foreign tables, just as described in a very interesting Percona blog about Sharding on PostgreSQL, which is worth a look. Finally, I created a partitioned table with 70 partitions distributed over seven remote servers, each one tied to a postgres_fdw foreign server.
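A sketch of that layout, abbreviated to a single remote server and one partition (the actual lab used seven servers and 70 partitions, and the partition bounds here are illustrative):

```sql
-- Local partitioned parent with the standard pgbench_accounts layout
CREATE TABLE pgbench_accounts (
  aid      integer NOT NULL,
  bid      integer,
  abalance integer,
  filler   character(84)
) PARTITION BY RANGE (aid);

-- One partition is a foreign table living on remote_server1
CREATE FOREIGN TABLE pgbench_accounts_1
  PARTITION OF pgbench_accounts
  FOR VALUES FROM (1) TO (100001)
  SERVER remote_server1
  OPTIONS (table_name 'pgbench_accounts_1');
```

Rows routed to this partition are stored on, and fetched from, the remote server transparently.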

The next diagram gives some insight.

[Diagram: partitioned table distributed over seven postgres_fdw foreign servers]

Pgbench execution

To test the new feature I tried two different loads with the parallel_commit option disabled (default) and then enabled:

  • I ran the “normal” pgbench workload. It accesses one row per transaction, which means every local transaction opened only one remote (sub)transaction.
  • I ran a pgbench “custom” script to retrieve 100 rows per transaction. In this case, every local transaction opened multiple remote (sub)transactions.

Running the regular pgbench load against the PG15 server was nothing more than:

Shell
pgbench --progress-timestamp -c 16 -j 4 -T 3600 -P 2 \
  -h pg15 -U pgbench pgbench

After getting the results, I enabled the parallel_commit option on each foreign server as follows:

SQL
ALTER SERVER remote_server<N> OPTIONS (ADD parallel_commit 'true');
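To confirm the option was recorded on each server, the server options can be inspected in the pg_foreign_server catalog (a quick sanity check, not part of the original procedure):

```sql
SELECT srvname, srvoptions
FROM pg_foreign_server;
-- srvoptions should now include parallel_commit=true
```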

And repeated the pgbench load. The results are as follows.

[Chart: regular pgbench load, TPS with parallel_commit OFF vs. ON]

We see no big difference between the load with the parallel option OFF or ON. This was expected, because in the regular pgbench load every local transaction opened only one remote transaction, so there was nothing to commit in parallel.

For the second test scenario, I used a custom pgbench script to fetch 100 rows in each transaction. It was as follows:

Shell
$ cat pgbench_custom.sql
\set v1 random(1, 100000 * :scale)
\set v2 :v1 + 100
BEGIN;
  SELECT abalance FROM pgbench_accounts
  WHERE aid BETWEEN :v1 AND :v2;
END;

Before running the load, I disabled the parallel_commit option on the foreign servers by executing the following for each one:

SQL
ALTER SERVER remote_server<N> OPTIONS (DROP parallel_commit);

And then I ran the custom load with NO parallel commit as follows:

Shell
pgbench --progress-timestamp -c 8 -j 4 -T 3600 -P 2 \
  -h pg15 -U pgbench -f pgbench_custom.sql pgbench

Then I enabled it on each foreign server with the command I used before and ran the custom load again. The results are as follows.

[Chart: custom pgbench load, TPS with parallel_commit OFF vs. ON]

Now we see a clear improvement with the parallel_commit option enabled. The number of TPS was consistently higher during the test when the option was enabled, showing how performance can be improved.

In general, TPS dropped compared with the regular pgbench load. The details are out of the scope of this blog post; the goal here was to get an idea of how performance might improve using the new parallel_commit option under similar conditions.

Conclusion

The capabilities a PostgreSQL database gets with postgres_fdw have been known for a while, and multiple system designs have taken advantage of them. 

The addition of the new parallel_commit option for postgres_fdw in PostgreSQL 15 brings a very good opportunity to improve the performance of our databases if they are designed as distributed systems and our local transactions are tied to multiple remote (sub)transactions. Every solution and system design requires its own analysis. This blog post only intends to show the new option and how we can deliver higher throughput under the same conditions just by enabling it.

