6

Working With PostgreSQL Dump Manifests

 2 years ago
source link: https://www.percona.com/blog/working-with-postgresql-dump-manifests/
Go to the source link to view the article. You can view the picture content, updated content and better typesetting reading experience. If the link is broken, please click the button below to view the snapshot at that time.
neoserver,ios ssh client

Working With PostgreSQL Dump Manifests

After working with PostgreSQL for so many years it’s sometimes surprising to see some of the utilities, to a certain degree, have capabilities that remain relatively unknown to more than a few people. So let me introduce you to manifests.

Both pg_dump and pg_restore have switches enabling us to zone in on specific objects of interest such as functions, tables, data, etc., as we create and restore a logical dump. But there comes a time when expressing this kind of granularity as arguments can become overly complex when we look at dozens, if not hundreds, of objects.

Enter manifests

PostgreSQL dumps store not only the objects and data of a database but their descriptive meta-data too. When generated as a text file, a PostgreSQL manifest details each object on a single line. These lines can then be edited by either removing or reordering them. The edited file can then be used with pg_restore to restore the entities in the prescribed order.

Here’s how it works; the first step is to generate a standard dump using the custom compression argument -Fc

Shell
# generate the logical dump
pg_dump -C -Fc mydatabase > mydatabase.db

Once the dump has been created the manifest is generated using pg_restore and the -l switch, which goes to standard output:

Shell
# generate a manifest, the meta data, sic description, of the logical dump
pg_restore -l mydatabase.db > mydatabase_manifest.ini

And like a recipe, the manifest is then used to read and restore the dump following its directions:

Shell
# restore the dump using the manifest
pg_restore -L mydatabase_manifest.ini -d postgres mydatabase.db

Of course, in between generating and reading the manifest, one takes the opportunity of editing it too:

So let’s use a more realistic i.e. pgbench. The first step initializes the benchmarking database, notice the use of foreign keys:

Shell
dropdb --if-exists pgbench
createdb pgbench
pgbench -i --foreign-keys -s 50 pgbench

Let’s complicate the landscape by adding a number of views including a materialized view:

Shell
psql pgbench <<_eof_
    create or replace view v1_pgbench_accounts as select * from pgbench_accounts;
    create or replace view v2_pgbench_accounts as select * from pgbench_accounts;
    create or replace view v3_pgbench_accounts as select * from pgbench_accounts;
    create materialized view m_pgbench_accounts as select * from v1_pgbench_accounts order by random() limit 1e3;
_eof_

And now let’s create the database dump. Remember, manifests can only be generated when the dump has been created using the custom format switch -Fc.

Shell
pg_dump -Fc -C pgbench > pgbench.db

Now, let’s generate the manifest. I’ve opted to use the ini extension to the file name as my code editor recognizes the semicolon as a comment line and colors it accordingly:

Shell
pg_restore -C -l pgbench.db > pgbench_manifest.ini

As per the normal process of generating and restoring a logical dump, one sees from the output listed below the command creating the database, that’s from the -C switch, followed by the views, table definitions, data population, and the constraints, and finishing with the data population of the materialized view:

; Archive created at 2022-08-24 10:21:46 PDT
;     dbname: pgbench
;     TOC Entries: 25
;     Compression: -1
;     Dump Version: 1.14-0
;     Format: CUSTOM
;     Integer: 4 bytes
;     Offset: 8 bytes
;     Dumped from database version: 14.0
;     Dumped by pg_dump version: 14.0
; Selected TOC Entries:
3361; 1262 25332 DATABASE - pgbench postgres
211; 1259 25339 TABLE public pgbench_accounts postgres
213; 1259 25380 VIEW public v1_pgbench_accounts postgres
216; 1259 25392 MATERIALIZED VIEW public m_pgbench_accounts postgres
212; 1259 25342 TABLE public pgbench_branches postgres
209; 1259 25333 TABLE public pgbench_history postgres
210; 1259 25336 TABLE public pgbench_tellers postgres
214; 1259 25384 VIEW public v2_pgbench_accounts postgres
215; 1259 25388 VIEW public v3_pgbench_accounts postgres
3353; 0 25339 TABLE DATA public pgbench_accounts postgres
3354; 0 25342 TABLE DATA public pgbench_branches postgres
3351; 0 25333 TABLE DATA public pgbench_history postgres
3352; 0 25336 TABLE DATA public pgbench_tellers postgres
3200; 2606 25354 CONSTRAINT public pgbench_accounts pgbench_accounts_pkey postgres
3202; 2606 25350 CONSTRAINT public pgbench_branches pgbench_branches_pkey postgres
3198; 2606 25352 CONSTRAINT public pgbench_tellers pgbench_tellers_pkey postgres
3207; 2606 25360 FK CONSTRAINT public pgbench_accounts pgbench_accounts_bid_fkey postgres
3205; 2606 25375 FK CONSTRAINT public pgbench_history pgbench_history_aid_fkey postgres
3203; 2606 25365 FK CONSTRAINT public pgbench_history pgbench_history_bid_fkey postgres
3204; 2606 25370 FK CONSTRAINT public pgbench_history pgbench_history_tid_fkey postgres
3206; 2606 25355 FK CONSTRAINT public pgbench_tellers pgbench_tellers_bid_fkey postgres
3355; 0 25392 MATERIALIZED VIEW DATA public m_pgbench_accounts postgres

Once generated, the manifest can now be tailored to our requirements which in this case is to restore a subset of the database dump as follows:

  1. Create database pgbench_restore
  2. Restore only view v1_pgbench_accounts
  3. Restore and populate table pgbench_accounts
  4. Restore and populate materialized view m_pgbench_accounts

You will note that in order to restore table pgbench_accounts it will be necessary to not add all the constraints, i.e. remove the FK constraints.

After some editing here’s what the resulting manifest looks like:

211; 1259 25339 TABLE public pgbench_accounts postgres
213; 1259 25380 VIEW public v1_pgbench_accounts postgres
216; 1259 25392 MATERIALIZED VIEW public m_pgbench_accounts postgres
3353; 0 25339 TABLE DATA public pgbench_accounts postgres
3200; 2606 25354 CONSTRAINT public pgbench_accounts pgbench_accounts_pkey postgres
3355; 0 25392 MATERIALIZED VIEW DATA public m_pgbench_accounts postgres

With a little creativity it’s not that big of a leap to dynamically generate the appropriate manifest:

Shell
GREP="TABLE public pgbench_accounts|VIEW public v1_pgbench_accounts|MATERIALIZED|DATA public pgbench_accounts|pgbench_accounts_pkey|DATA public m_pgbench_accounts"
pg_restore -C -l pgbench.db | grep -E "$GREP" > pgbench_manifest.ini

And now we proceed with the final restoration steps:

Shell
dropdb --if-exists pgbench_restore
createdb pgbench_restore
# take a look at the generated SQL before uploading the logical dump
pg_restore -1 -L pgbench_manifest.ini -f - | less -S
# the logical dump can now be uploaded
pg_restore -1 -L pgbench_manifest.ini -d pgbench_restore pgbench.db

Taking a look at the resulting database confirms success:

Shell
psql pgbench_restore -c '\d'
Shell
                                                 List of relations
Schema |        Name         |       Type        |  Owner   | Persistence | Access method |  Size
--------+---------------------+-------------------+----------+-------------+---------------+---------
public | m_pgbench_accounts  | materialized view | postgres | permanent   | heap          | 144 kB  |
public | pgbench_accounts    | table             | postgres | permanent   | heap          | 640 MB  |
public | v1_pgbench_accounts | view              | postgres | permanent   |               | 0 bytes |

And that’s all there is to it! 

One final caveat: one edits the manifest by deleting whole lines. It’s not possible to edit the lines themselves, so don’t change anything, nor can new lines be arbitrarily added to the manifest. Remember, manifests are like pointers in that they can only reference what already exists in the database dump.


About Joyk


Aggregate valuable and interesting links.
Joyk means Joy of geeK