Providing Backups

source link: https://www.percona.com/doc/kubernetes-operator-for-postgresql/backups.html

The Operator supports two kinds of backups. Scheduled backups are configured in the deploy/cr.yaml file and run automatically at the specified times. On-demand backups can be started manually at any moment.

The Operator uses the open source pgBackRest backup and restore utility. When it creates a new PostgreSQL cluster, the Operator also creates a dedicated pgBackRest repository for that cluster, so that pgBackRest features can be used with it.

The Operator can store PostgreSQL backups outside the Kubernetes cluster on Amazon S3, any S3-compatible storage, or Google Cloud Storage. Storing backups on a Persistent Volume attached to the pgBackRest Pod is also possible. At PostgreSQL cluster creation time, you can specify a Storage Class for the pgBackRest repository. You can also specify the type of the pgBackRest repository to be used for backups:

  • local: use the storage provided by the Kubernetes cluster’s Storage Class that you select,
  • s3: use Amazon S3 or an object storage system that uses the S3 protocol,
  • local,s3: use both the storage provided by the Kubernetes cluster’s Storage Class that you select and Amazon S3 (or an equivalent object storage system that uses the S3 protocol),
  • gcs: use Google Cloud Storage,
  • local,gcs: use both the storage provided by the Kubernetes cluster’s Storage Class that you select and Google Cloud Storage.

The pgBackRest repository consists of the following Kubernetes objects:

  • A Deployment,
  • A Secret that contains information that is specific to the PostgreSQL cluster that it is deployed with (e.g. SSH keys, AWS S3 keys, etc.),
  • A Pod with a number of supporting scripts,
  • A Service.

The PostgreSQL primary is automatically configured to use the pgbackrest archive-push command to push the write-ahead log (WAL) archives to the correct repository.

The PostgreSQL Operator supports three types of pgBackRest backups:

  • Full (full): A full backup of all the contents of the PostgreSQL cluster,
  • Differential (diff): A backup of only the files that have changed since the last full backup,
  • Incremental (incr): A backup of only the files that have changed since the last full or differential backup. Incremental backup is the default choice.

The Operator also supports setting pgBackRest retention policies for backups. Backup retention can be controlled by the following pgBackRest options:

  • --repo1-retention-full: the number of full backups to retain,
  • --repo1-retention-diff: the number of differential backups to retain,
  • --repo1-retention-archive: how many sets of write-ahead log archives to retain alongside the full and differential backups that are retained.

You can set both the backup type and the retention policy when making an on-demand backup.

To enable backups, you should first configure the backup storage in the deploy/cr.yaml configuration file.

Configuring the S3-compatible backup storage

To use S3-compatible storage for backups, you need to provide S3-related information such as the bucket name and endpoint. This information is passed to pgBackRest via the following deploy/cr.yaml options in the backup.storages subsection:

  • bucket specifies the AWS S3 bucket to use, for example my-postgresql-backups-example,
  • endpointUrl specifies the S3 endpoint to use, for example s3.amazonaws.com,
  • region specifies the AWS S3 region to use, for example us-east-1,
  • uriStyle specifies whether host-style or path-style URIs should be used,
  • verifyTLS should be set to true to enable TLS verification or to false to disable it,
  • type should be set to s3.

You also need to supply pgBackRest with the base64-encoded AWS S3 key and AWS S3 key secret, which are stored along with other sensitive information in Kubernetes Secrets. You can encode each value with the echo -n "string-to-encode" | base64 command (the -n option keeps a trailing newline out of the encoded value). Edit the deploy/backup/cluster1-backrest-repo-config-secret.yaml configuration file, setting the proper cluster name, AWS S3 key, and key secret:

apiVersion: v1
kind: Secret
metadata:
  name: <cluster-name>-backrest-repo-config
type: Opaque
data:
  aws-s3-key: <base64-encoded-AWS-S3-key>
  aws-s3-key-secret: <base64-encoded-AWS-S3-key-secret>
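
The two base64 values for the Secret above could be produced along these lines. This is a minimal sketch: the credential values are hypothetical placeholders, and printf '%s' is used so that no trailing newline ends up in the encoded value.

```shell
# Hypothetical placeholder credentials; substitute your own values.
AWS_S3_KEY='my-key'
AWS_S3_KEY_SECRET='my-key-secret'

# printf '%s' writes the string without a trailing newline,
# so only the credential itself is encoded.
ENCODED_KEY=$(printf '%s' "$AWS_S3_KEY" | base64)
ENCODED_SECRET=$(printf '%s' "$AWS_S3_KEY_SECRET" | base64)

# Print the lines in the form expected by the data section of the Secret.
echo "aws-s3-key: $ENCODED_KEY"
echo "aws-s3-key-secret: $ENCODED_SECRET"
```

The printed values can then be pasted into the data section of the Secret file.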

When done, create the secret as follows:

$ kubectl apply -f deploy/backup/cluster1-backrest-repo-config-secret.yaml

Finally, create or update the cluster:

$ kubectl apply -f deploy/cr.yaml

Use Google Cloud Storage for backups

You can configure Google Cloud Storage as an object store for backups similarly to S3 storage.

In order to use Google Cloud Storage (GCS) for backups you need to provide some GCS-related information, such as a proper GCS bucket name. This information can be passed to pgBackRest via the following options in the backup.storages subsection of the deploy/cr.yaml configuration file:

  • bucket should contain the proper bucket name,
  • type should be set to gcs.

The Operator will also need your service account key to access storage.

  1. Create your service account key following the official Google Cloud instructions.

  2. Export this key from your Google Cloud account.

    You can find your key in the Google Cloud console: select IAM & Admin → Service Accounts in the left menu panel, then click your account and open the KEYS tab.

    Click the ADD KEY button, choose Create new key, and choose JSON as the key type. These actions will result in downloading a file in JSON format with your new private key and related information.

  3. Create the Kubernetes Secret from a base64-encoded version of this file. You can encode the file with the base64 <filename> command. When done, create the following YAML file (my-gcs-account-secret.yaml in the command below) with your cluster name and the base64-encoded file contents:

    apiVersion: v1
    kind: Secret
    metadata:
      name: <cluster-name>-backrest-repo-config
    type: Opaque
    data:
      gcs-key: <base64-encoded-json-file-contents>
    

    When done, create the secret as follows:

    $ kubectl apply -f ./my-gcs-account-secret.yaml
    
  4. Finally, create or update the cluster:

    $ kubectl apply -f deploy/cr.yaml
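
For step 3 above, the encoded key has to fit into the gcs-key field as one continuous string, while some base64 implementations wrap long output across several lines. The following is a minimal sketch of a helper that strips those line breaks; the function name encode_key_file and the file name gcs-key.json are hypothetical.

```shell
# encode_key_file FILE: print the contents of FILE as single-line base64.
encode_key_file() {
  # tr removes any newlines that base64 inserts when wrapping long output.
  base64 < "$1" | tr -d '\n'
}

# Usage (gcs-key.json is a hypothetical name for the downloaded key file):
#   echo "gcs-key: $(encode_key_file gcs-key.json)"
```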
    

Scheduling backups

The backup schedule is defined in the backup section of the deploy/cr.yaml file. This section contains the following subsections:

  • storages contains the data needed to access the backup storage (S3-compatible storage or Google Cloud Storage),
  • schedule defines the actual backup schedule (specified in crontab format).

Here is an example of deploy/cr.yaml which uses Amazon S3 storage for backups:

...
backup:
  ...
  schedule:
   - name: "sat-night-backup"
     schedule: "0 0 * * 6"
     keep: 3
     type: full
     storage: s3
  ...

The schedule is specified in crontab format as explained in Custom Resource options.
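
To see how the five crontab fields of the example schedule break down, here is a minimal shell sketch. The variable names are illustrative only and are not part of the Operator.

```shell
SCHEDULE='0 0 * * 6'   # the schedule from the example above

set -f                 # turn off globbing so "*" is not expanded to file names
set -- $SCHEDULE       # split the expression into its five fields
set +f

MINUTE=$1; HOUR=$2; DAY_OF_MONTH=$3; MONTH=$4; DAY_OF_WEEK=$5

# Day-of-week 6 is Saturday, so this schedule fires at 00:00 every Saturday.
echo "minute=$MINUTE hour=$HOUR day-of-month=$DAY_OF_MONTH month=$MONTH day-of-week=$DAY_OF_WEEK"
```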

Making on-demand backup

To make an on-demand backup, use a backup configuration file. An example of such a file is deploy/backup/backup.yaml.

The following keys are most important in the parameters section of this file:

  • parameters.backrest-opts is the string with command line options which will be passed to pgBackRest, for example --type=full --repo1-retention-full=5,
  • parameters.pg-cluster is the name of the PostgreSQL cluster to back up, for example cluster1.
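
The value for parameters.backrest-opts combines a backup type with retention options. A minimal sketch of assembling such a string, with hypothetical variable names and values, might look like this:

```shell
# Hypothetical values; adjust them to your own retention policy.
BACKUP_TYPE=full     # one of: full, diff, incr
RETAIN_FULL=5        # number of full backups to keep

case "$BACKUP_TYPE" in
  full|diff|incr)
    # Combine the backup type and the retention setting into one option string.
    BACKREST_OPTS="--type=$BACKUP_TYPE --repo1-retention-full=$RETAIN_FULL"
    echo "$BACKREST_OPTS"
    ;;
  *)
    echo "unknown backup type: $BACKUP_TYPE" >&2
    ;;
esac
```

The printed string is what would go into parameters.backrest-opts in deploy/backup/backup.yaml.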

When the backup options are configured, execute the actual backup command:

$ kubectl apply -f deploy/backup/backup.yaml

List existing backups

To get a list of all existing backups in the pgBackRest repository, use the following command:

$ kubectl exec <name-of-backrest-shared-repo-pod> -it -- pgbackrest info

Restore the cluster from a previously saved backup

The Operator supports both a full restore of a PostgreSQL cluster and point-in-time recovery. There are two ways to restore a cluster: restoring to a new PostgreSQL cluster, and restoring the existing cluster in place.

Restoring to a new PostgreSQL cluster allows you to take a backup and create a new PostgreSQL cluster that can run alongside an existing one. There are several scenarios where using this technique is helpful:

  • Creating a copy (a clone) of a PostgreSQL cluster that can be used for other purposes.
  • Restoring to a point in time and inspecting the state of the data without affecting the current cluster.

To restore the existing cluster in place from a previously saved backup, use a backup restore configuration file. An example of such a file is deploy/backup/restore.yaml.

The following keys are the most important in the parameters section of this file:

  • parameters.backrest-restore-from-cluster specifies the name of the PostgreSQL cluster to be restored, for example cluster1 (the restore includes stopping the database and recreating a new primary with the restored data),
  • parameters.backrest-restore-opts specifies additional options for pgBackRest (for example, --type=time --target="2021-04-16 15:13:32" to perform a point-in-time recovery),
  • parameters.backrest-storage-type specifies the type of the pgBackRest repository (for example, local).
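
Because the point-in-time target contains a space, it has to stay quoted inside the options string. The sketch below checks the timestamp format and builds the string for parameters.backrest-restore-opts; the variable names are hypothetical and the timestamp is the one from the example above.

```shell
TARGET='2021-04-16 15:13:32'   # point in time to recover to

# Accept only timestamps of the form "YYYY-MM-DD HH:MM:SS".
if printf '%s\n' "$TARGET" | grep -Eq '^[0-9]{4}-[0-9]{2}-[0-9]{2} [0-9]{2}:[0-9]{2}:[0-9]{2}$'; then
  # The embedded double quotes keep the date and time together as one value.
  RESTORE_OPTS="--type=time --target=\"$TARGET\""
  echo "$RESTORE_OPTS"
else
  echo "unexpected timestamp format: $TARGET" >&2
fi
```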

The actual restoration process can be started as follows:

$ kubectl apply -f deploy/backup/restore.yaml

To create a new PostgreSQL cluster from either the active one, or a former cluster whose pgBackRest repository still exists, use the pgDataSource.restoreFrom option.

The following example will create a new cluster named cluster2 from an existing one named cluster1.

  1. First, create the cluster2-config-secrets.yaml configuration file with the following content:

    apiVersion: v1
    data:
      password: <base64-encoded-password-for-pguser>
      username: <base64-encoded-pguser-user-name>
    kind: Secret
    metadata:
      labels:
        pg-cluster: cluster2
        vendor: crunchydata
      name: cluster2-pguser-secret
    type: Opaque
    ---
    apiVersion: v1
    data:
      password: <base64-encoded-password-for-primaryuser>
      username: <base64-encoded-primaryuser-user-name>
    kind: Secret
    metadata:
      labels:
        pg-cluster: cluster2
        vendor: crunchydata
      name: cluster2-primaryuser-secret
    type: Opaque
    ---
    apiVersion: v1
    data:
      password: <base64-encoded-password-for-postgres-user>
      username: <base64-encoded-postgres-user-name>
    kind: Secret
    metadata:
      labels:
        pg-cluster: cluster2
        vendor: crunchydata
      name: cluster2-postgres-secret
    type: Opaque
    
  2. When done, create the secrets as follows:

    $ kubectl apply -f ./cluster2-config-secrets.yaml
    
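  3. Edit the deploy/cr.yaml configuration file: set the new cluster name (cluster2) and set the pgDataSource.restoreFrom option to the name of the source cluster (cluster1).

  4. Create the cluster as follows: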

$ kubectl apply -f deploy/cr.yaml

Delete a previously saved backup

The maximum number of stored backups is controlled by the backup.schedule.keep option (only successful backups are counted). Older backups are deleted automatically so that the number of stored backups does not exceed this limit.

To delete a backup manually, you need to delete both the pgtask object and the corresponding job. The backup object can be deleted using the same YAML file that was used for the on-demand backup:

$ kubectl delete -f deploy/backup/backup.yaml

The job that corresponds to the backup can be deleted using the kubectl delete jobs command with the backup name:

$ kubectl delete jobs cluster1-backrest-full-backup
