
Refreshing Test/Dev Environments With Prod Data Using Percona Backup for MongoDB

source link: https://www.percona.com/blog/2021/05/19/refreshing-test-dev-environments-with-prod-data-using-percona-backup-for-mongodb/

This is a straightforward article intended to show how easy it is to refresh your Test/Dev environments with PROD data using Percona Backup for MongoDB (PBM). It covers every step from the PBM configuration through the restore, assuming that the PBM agents are up and running on all replica set members of both the PROD and Dev/Test servers.

Taking the Backup on PROD

This step is quite simple and it demands no more than two commands:

1. Configuring the Backup

Shell
$ export PBM_MONGODB_URI='mongodb://pbmuser:<password>@127.0.0.1:40001/?replSetName=bprepPROD&authSource=admin'
$ pbm config --file /etc/pbm/pbm-s3.yaml
[Config set]
------
pitr:
  enabled: false
storage:
  type: s3
  s3:
    provider: aws
    region: us-west-1
    bucket: rafapbmtest
    prefix: bpPROD
    credentials:
      access-key-id: '***'
      secret-access-key: '***'
Backup list resync from the store has started

Two things are worth noting: I am sending my backups to an S3 bucket, and I am defining a prefix. When a prefix is defined in the PBM storage configuration, PBM automatically creates a subdirectory with that name and stores the backup files there instead of in the root of the S3 bucket.
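For reference, a configuration file like the one passed to pbm config could be generated as sketched below. This is only an illustration under assumptions: the helper name write_pbm_config and its arguments are hypothetical, the layout follows PBM's S3 storage section with the values used in this article, and the credentials are placeholders that must be replaced.

```shell
# Hypothetical helper (not part of PBM): write an S3 storage config file.
# Usage: write_pbm_config <file> <bucket> <prefix> [region]
write_pbm_config() {
    cfg_file=$1
    cfg_bucket=$2
    cfg_prefix=$3
    cfg_region=${4:-us-west-1}   # default region taken from this article
    cat > "$cfg_file" <<EOF
pitr:
  enabled: false
storage:
  type: s3
  s3:
    provider: aws
    region: ${cfg_region}
    bucket: ${cfg_bucket}
    prefix: ${cfg_prefix}
    credentials:
      access-key-id: 'REPLACE_ME'
      secret-access-key: 'REPLACE_ME'
EOF
}

# Example, commented out so that nothing is written by accident:
# write_pbm_config /etc/pbm/pbm-s3.yaml rafapbmtest bpPROD
# pbm config --file /etc/pbm/pbm-s3.yaml
```

Using one helper for both environments keeps the bucket identical and makes the prefix the only value that differs between PROD and DEV.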

2. Taking the Backup

With PBM properly configured, it is time to take the backup. (You can skip this step if you already have PBM backups to use, of course.)

Shell
$ export PBM_MONGODB_URI='mongodb://pbmuser:<password>@127.0.0.1:40001/?replSetName=bprepPROD&authSource=admin'
$ pbm backup
Starting backup '2021-05-08T08:34:47Z'...................
Backup '2021-05-08T08:34:47Z' to remote store 's3://rafapbmtest/bpPROD' has started

If we run the pbm status command, we will see the snapshot running; once it finishes, the status shows it as complete, as below:

Shell
$ pbm status
Cluster:
========
bprepPROD:
  - bprepPROD/127.0.0.1:40001: pbm-agent v1.4.1 OK
PITR incremental backup:
========================
Status [OFF]
Currently running:
==================
(none)
Backups:
========
S3 us-west-1 rafapbmtest/bpPROD
  Snapshots:
    2021-05-08T08:34:47Z 11.33KB [complete: 2021-05-08T08:35:08]
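When the backup is part of a script, the status check can be repeated until the snapshot is reported as complete. The following is a minimal sketch, assuming the pbm status output format shown above (v1.4.1); the function names snapshot_complete and wait_for_snapshot are hypothetical, and other PBM versions may print the status differently.

```shell
# Return 0 if `pbm status` lists the given snapshot with a "complete:" marker.
# Assumes the output format shown in this article.
snapshot_complete() {
    pbm status | grep -F -- "$1" | grep -q 'complete:'
}

# Poll until the snapshot is complete.
# Usage: wait_for_snapshot <snapshot-name> [max-checks] [seconds-between-checks]
wait_for_snapshot() {
    snap=$1
    tries=${2:-60}
    delay=${3:-5}
    i=0
    while [ "$i" -lt "$tries" ]; do
        if snapshot_complete "$snap"; then
            return 0
        fi
        i=$((i + 1))
        sleep "$delay"
    done
    echo "snapshot $snap not complete after $tries checks" >&2
    return 1
}

# Example, commented out because it needs a running PBM setup:
# wait_for_snapshot '2021-05-08T08:34:47Z'
```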

Configuring the PBM Space on a DEV/TEST Environment

All right, my PROD now has a proper backup routine configured. I will move one step forward and configure PBM again, this time in a Dev/Test environment – referred to here as DEV.

Shell
$ export PBM_MONGODB_URI='mongodb://pbmuser:<password>@127.0.0.1:50001/?replSetName=bprepPROD&authSource=admin'
$ pbm config --file /etc/pbm/pbm-s3.yaml 
[Config set]
------
pitr:
  enabled: false
storage:
  type: s3
  s3:
    provider: aws
    region: us-west-1
    bucket: rafapbmtest
    prefix: bpDEV
    credentials:
      access-key-id: '***'
      secret-access-key: '***'
Backup list resync from the store has started

Note that this is the same S3 bucket where PROD stores its backups, but with a different prefix. If I run the status command, I will see that PBM is configured but no snapshots are available yet:

Shell
$ pbm status
Cluster:
========
bprepPROD:
  - bprepPROD/127.0.0.1:50001: pbm-agent v1.4.1 OK
PITR incremental backup:
========================
Status [OFF]
Currently running:
==================
(none)
Backups:
========
S3 us-west-1 rafapbmtest/bpDEV
(none)

Lastly, note that the replica set name is exactly the same as in PROD. If this were a sharded cluster rather than a non-sharded replica set, all the replica set names would have to match in the target cluster. PBM is guided by the replica set name, and if my DEV environment had a different one, it would not be possible to load the backup metadata from PROD into DEV.
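Since everything depends on the names matching, it can be worth checking them before going further. Below is a minimal sketch for the non-sharded case, assuming the mongo shell is installed and that each URI can authenticate; the function names replset_name and same_replset and the MONGO_SHELL variable are hypothetical, and rs.status().set is used to read the replica set name.

```shell
# Print the replica set name reported by the node behind the given URI.
# MONGO_SHELL is a hypothetical override, e.g. MONGO_SHELL=mongosh.
replset_name() {
    "${MONGO_SHELL:-mongo}" "$1" --quiet --eval 'print(rs.status().set)'
}

# Return 0 only if both URIs report the same, non-empty replica set name.
# Usage: same_replset <prod-uri> <dev-uri>
same_replset() {
    name_a=$(replset_name "$1") || return 2
    name_b=$(replset_name "$2") || return 2
    [ -n "$name_a" ] && [ "$name_a" = "$name_b" ]
}

# Example, commented out because it needs live servers:
# same_replset "$PROD_URI" "$DEV_URI" || echo "replica set names differ" >&2
```

For a sharded cluster the same comparison would have to be repeated for every shard and for the config server replica set.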

Transferring the Desired Backup Files

The next step is transferring the backup files from the PROD prefix to the target prefix. I will use the AWS CLI to achieve that, but there is one important thing to settle first: determining which files belong to a given backup set (snapshot). Let’s go back to the PBM status output taken in PROD previously:

Shell
$ export PBM_MONGODB_URI='mongodb://pbmuser:<password>@127.0.0.1:40001/?replSetName=bprepPROD&authSource=admin'
$ pbm status
Cluster:
========
bprepPROD:
  - bprepPROD/127.0.0.1:40001: pbm-agent v1.4.1 OK
PITR incremental backup:
========================
Status [OFF]
Currently running:
==================
(none)
Backups:
========
S3 us-west-1 rafapbmtest/bpPROD
  Snapshots:
    2021-05-08T08:34:47Z 11.33KB [complete: 2021-05-08T08:35:08]

PBM snapshots are named with the timestamp of when the backup started. If we check the S3 prefix where the backup is stored, we will see that the file names contain that timestamp.

Shell
$ aws s3 ls s3://rafapbmtest/bpPROD/
2021-05-08 10:26:11          5 .pbm.init
2021-05-08 10:35:14       1428 2021-05-08T08:34:47Z.pbm.json
2021-05-08 10:35:10      11606 2021-05-08T08:34:47Z_bprepPROD.dump.s2
2021-05-08 10:35:13        949 2021-05-08T08:34:47Z_bprepPROD.oplog.s2

So it is now easy to know which files I have to copy.

Shell
$ aws s3 cp 's3://rafapbmtest/bpPROD/2021-05-08T08:34:47Z.pbm.json' 's3://rafapbmtest/bpDEV/'
copy: s3://rafapbmtest/bpPROD/2021-05-08T08:34:47Z.pbm.json to s3://rafapbmtest/bpDEV/2021-05-08T08:34:47Z.pbm.json
$ aws s3 cp 's3://rafapbmtest/bpPROD/2021-05-08T08:34:47Z_bprepPROD.dump.s2' 's3://rafapbmtest/bpDEV/'
copy: s3://rafapbmtest/bpPROD/2021-05-08T08:34:47Z_bprepPROD.dump.s2 to s3://rafapbmtest/bpDEV/2021-05-08T08:34:47Z_bprepPROD.dump.s2
$ aws s3 cp 's3://rafapbmtest/bpPROD/2021-05-08T08:34:47Z_bprepPROD.oplog.s2' 's3://rafapbmtest/bpDEV/'
copy: s3://rafapbmtest/bpPROD/2021-05-08T08:34:47Z_bprepPROD.oplog.s2 to s3://rafapbmtest/bpDEV/2021-05-08T08:34:47Z_bprepPROD.oplog.s2
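Because every file of a snapshot starts with the same timestamp, the three copies can also be done in one command with the AWS CLI include/exclude filters. This is a sketch under assumptions: the function name copy_snapshot is hypothetical, and it relies on the file naming shown in the listing above. Any extra arguments are passed through to aws s3 cp, so --dryrun can be used to preview the copy.

```shell
# Hypothetical helper: copy every object whose key starts with the snapshot
# name from one prefix to another.
# Usage: copy_snapshot <snapshot-name> <source-prefix-url> <target-prefix-url> [extra aws args]
copy_snapshot() {
    snap=$1
    src=$2
    dst=$3
    shift 3
    # Exclude everything first, then include only the snapshot's files.
    aws s3 cp "$src" "$dst" --recursive --exclude '*' --include "${snap}*" "$@"
}

# Example, commented out; --dryrun only prints what would be copied:
# copy_snapshot '2021-05-08T08:34:47Z' s3://rafapbmtest/bpPROD/ s3://rafapbmtest/bpDEV/ --dryrun
```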

Checking the DEV prefix:

Shell
$ aws s3 ls s3://rafapbmtest/bpDEV/
2021-05-08 10:43:59          5 .pbm.init
2021-05-08 10:52:02       1428 2021-05-08T08:34:47Z.pbm.json
2021-05-08 10:52:13      11606 2021-05-08T08:34:47Z_bprepPROD.dump.s2
2021-05-08 10:52:24        949 2021-05-08T08:34:47Z_bprepPROD.oplog.s2

The files are already there and PBM has already automatically loaded their metadata into the DEV PBM collections:

Shell
$ pbm status
Cluster:
========
bprepPROD:
  - bprepPROD/127.0.0.1:50001: pbm-agent v1.4.1 OK
PITR incremental backup:
========================
Status [OFF]
Currently running:
==================
(none)
Backups:
========
S3 us-west-1 rafapbmtest/bpDEV
  Snapshots:
    2021-05-08T08:34:47Z 11.33KB [complete: 2021-05-08T08:35:08]

Finally – Restoring It

Believe it or not, now comes the easiest part: the restore. It takes a single command:

Shell
$ pbm restore '2021-05-08T08:34:47Z'
....Restore of the snapshot from '2021-05-08T08:34:47Z' has started
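Since the same pbm commands are used against both PROD and DEV, and only PBM_MONGODB_URI decides which one is targeted, a small guard around the restore can help avoid pointing it at the wrong environment. The sketch below is an illustration only: the function name safe_restore and the idea of passing a PROD marker (a fragment of the PROD URI, such as its host and port) are assumptions of mine, and the snapshot check relies on the pbm status output format shown above.

```shell
# Hypothetical guard: restore only when the URI does not contain the PROD
# marker and the snapshot is listed as complete.
# Usage: safe_restore <snapshot-name> <prod-uri-fragment>
safe_restore() {
    snap=$1
    prod_marker=$2
    case "${PBM_MONGODB_URI:-}" in
        '')
            echo "PBM_MONGODB_URI is not set" >&2
            return 1
            ;;
        *"$prod_marker"*)
            echo "PBM_MONGODB_URI points at PROD; refusing to restore" >&2
            return 1
            ;;
    esac
    if ! pbm status | grep -F -- "$snap" | grep -q 'complete:'; then
        echo "snapshot $snap is not listed as complete; not restoring" >&2
        return 1
    fi
    pbm restore "$snap"
}

# Example, commented out because it would start a real restore:
# safe_restore '2021-05-08T08:34:47Z' '127.0.0.1:40001'
```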

Refreshing Dev/Test environments with PROD data is a very common and required task in corporations worldwide. I hope this article helps to clarify the practical questions regarding using PBM for it!

