
Using Percona Backup for MongoDB in Replica Set and Sharding Environment: Part One

February 21, 2024

Anil Joshi

Backups are crucial for every database system, and a reliable, fast, hot backup is a core requirement for next-generation database systems. Percona Backup for MongoDB (PBM) is a backup management tool that enhances MongoDB's existing backup capabilities by providing multiple backup types, such as physical, logical, and incremental backups, as well as point-in-time recovery (PITR).

In this blog post, we are going to see how we can use this backup tool in MongoDB topologies such as replica sets and sharded clusters. For the purpose of this demo, I have used a single instance/machine and the mlaunch tool to build the required topologies.

PBM setup/usage for replica set

1) So, let’s assume we have already set up a three-node replica set (a sketch for building it with mlaunch follows the node list below).

localhost:27017
localhost:27018
localhost:27019
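
For reference, here is a rough sketch of how this topology can be built with mlaunch (assuming the mtools package and local MongoDB binaries are available; flags may vary between mtools versions):

# Install mtools, which provides the mlaunch utility (pymongo is required by mlaunch)
pip install mtools pymongo

# Spin up a three-node replica set on ports 27017-27019
# (mlaunch's default replica set name is "replset", matching the prompts below)
mlaunch init --replicaset --nodes 3 --port 27017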

2) Next, we will install the PBM tool from the official Percona repository.

shell> sudo yum install -y https://repo.percona.com/yum/percona-release-latest.noarch.rpm
shell> sudo percona-release enable pbm release
shell> sudo yum install percona-backup-mongodb

We can verify the installation and the PBM version as below.

shell> pbm version
Version:   2.3.1
Platform:  linux/amd64
GitCommit: 8c4265cfb2d9a7581b782a829246d8fcb6c7d655
GitBranch: release-2.3.1
BuildTime: 2023-11-29_13:30_UTC
GoVersion: go1.19

3) Here, we will enable/configure authentication in MongoDB for PBM usage. The below commands need to be executed on the primary node (localhost:27017).

replset:PRIMARY> use admin;
switched to db admin
replset:PRIMARY>
replset:PRIMARY> db.getSiblingDB("admin").createRole({ "role": "pbmAnyAction",
...       "privileges": [
...          { "resource": { "anyResource": true },
...            "actions": [ "anyAction" ] }
...       ],
...       "roles": []
...    });
{
        "role" : "pbmAnyAction",
        "privileges" : [
                {
                        "resource" : {
                                "anyResource" : true
                        },
                        "actions" : [
                                "anyAction"
                        ]
                }
        ],
        "roles" : [ ]
}
replset:PRIMARY>
replset:PRIMARY> db.getSiblingDB("admin").createUser({user: "pbmuser",
...        "pwd": "pbmuser",
...        "roles" : [
...           { "db" : "admin", "role" : "readWrite", "collection": "" },
...           { "db" : "admin", "role" : "backup" },
...           { "db" : "admin", "role" : "clusterMonitor" },
...           { "db" : "admin", "role" : "restore" },
...           { "db" : "admin", "role" : "pbmAnyAction" }
...        ]
...     });
Successfully added user: {
        "user" : "pbmuser",
        "roles" : [
                {
                        "db" : "admin",
                        "role" : "readWrite",
                        "collection" : ""
                },
                {
                        "db" : "admin",
                        "role" : "backup"
                },
                {
                        "db" : "admin",
                        "role" : "clusterMonitor"
                },
                {
                        "db" : "admin",
                        "role" : "restore"
                },
                {
                        "db" : "admin",
                        "role" : "pbmAnyAction"
                }
        ]
}

Note: Most of these roles are built-in; however, we created one additional role (pbmAnyAction). The commands db.getUsers() and db.getRoles() can be used to verify the creations.

4) Now, we will configure the MongoDB connection URL for the pbm-agent process. We need to add the entry for the local Mongo node to the file [“/etc/sysconfig/pbm-agent”] on each node (one entry per node):

# node on port 27017
PBM_MONGODB_URI="mongodb://pbmuser:pbmuser@localhost:27017/?authSource=admin"
# node on port 27018
PBM_MONGODB_URI="mongodb://pbmuser:pbmuser@localhost:27018/?authSource=admin"
# node on port 27019
PBM_MONGODB_URI="mongodb://pbmuser:pbmuser@localhost:27019/?authSource=admin"

Note: A pbm-agent process connects to its localhost mongod node with a standalone type of connection. Do not set up the agent to connect to the replica set URI.
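
Before starting the agents, it is worth confirming that the pbmuser credentials actually work against each local node. A quick sanity check (a sketch using the legacy mongo shell, consistent with the rest of this post):

# Ping every node as pbmuser; an { "ok" : 1 } reply means auth and connectivity are fine
for port in 27017 27018 27019; do
  mongo "mongodb://pbmuser:pbmuser@localhost:${port}/?authSource=admin" \
        --quiet --eval 'printjson(db.adminCommand({ ping: 1 }))'
done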

Further, we can persist these settings by defining them in the user's [“~/.bashrc”] profile. These settings affect the PBM CLI client, which should connect to the replica set rather than the local node only, so the URI carries the replicaSet parameter (again, use the entry matching the local node):

export PBM_MONGODB_URI="mongodb://pbmuser:pbmuser@localhost:27017/?authSource=admin&replicaSet=replset"
export PBM_MONGODB_URI="mongodb://pbmuser:pbmuser@localhost:27018/?authSource=admin&replicaSet=replset"
export PBM_MONGODB_URI="mongodb://pbmuser:pbmuser@localhost:27019/?authSource=admin&replicaSet=replset"

Let’s apply this to the current session as well:

source ~/.bashrc

5) Next, we can define the PBM configuration and storage-related details in the file [“/etc/pbm_config.yaml”]. Here, we are keeping the backups on the local filesystem; however, we could instead define cloud storage such as AWS S3 or Google Cloud Storage (an S3 sketch follows the example below).

storage:
  type: filesystem
  filesystem:
    path: /home/backups
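
For reference, an equivalent remote storage definition for AWS S3 would look roughly like the following sketch (the region, bucket, prefix, and credentials below are placeholders):

storage:
  type: s3
  s3:
    region: us-east-1                       # placeholder region
    bucket: pbm-demo-bucket                 # placeholder bucket name
    prefix: pbm/backups                     # optional path prefix inside the bucket
    credentials:
      access-key-id: <your-access-key>
      secret-access-key: <your-secret-key>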

Note: Please ensure the same directory is mounted at the same local path [“/home/backups”] on all servers, for example via a shared network filesystem, as sketched below.
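
A hypothetical NFS example (the server name and export path here are assumptions):

# On every server in the cluster: mount the same shared export at the same path
sudo mkdir -p /home/backups
sudo mount -t nfs backup-server:/exports/pbm /home/backups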

Then we can apply the changes below.

shell> pbm config --file /etc/pbm_config.yaml

Output

pitr:
  enabled: false
  oplogSpanMin: 0
  compression: s2
storage:
  type: filesystem
  filesystem:
    path: /home/backups
backup:
  compression: s2

6) Now, we will run the PBM agent process separately for all the Mongo nodes.

shell> nohup pbm-agent --mongodb-uri "mongodb://pbmuser:pbmuser@localhost:27017/" > /tmp/pbm-agent.27017.log 2>&1 &
shell> nohup pbm-agent --mongodb-uri "mongodb://pbmuser:pbmuser@localhost:27018/" > /tmp/pbm-agent.27018.log 2>&1 &
shell> nohup pbm-agent --mongodb-uri "mongodb://pbmuser:pbmuser@localhost:27019/" > /tmp/pbm-agent.27019.log 2>&1 &

Note: Since we are running the entire setup on a single server, we have used the above command-line option to run the PBM agent service. However, in the real world or production, we should use the proper service [“systemctl start pbm-agent”] to manage the agents, as sketched below.
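
For completeness, the production flow would look roughly like this on each server (a sketch; every server keeps only its own node's URI in the env file):

# Put the local node's URI into the env file read by the pbm-agent systemd unit
echo 'PBM_MONGODB_URI="mongodb://pbmuser:pbmuser@localhost:27017/?authSource=admin"' | \
  sudo tee /etc/sysconfig/pbm-agent

# Start the agent now and enable it at boot
sudo systemctl enable --now pbm-agent
sudo systemctl status pbm-agent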

7) Finally, we can verify if all our configurations look good and if the pbm-agents are connected fine. The below output looks healthy.

shell> pbm status

Output

Cluster:
========
replset:
  - replset/localhost:27017 [P]: pbm-agent v2.3.1 OK
  - replset/localhost:27018 [S]: pbm-agent v2.3.1 OK
  - replset/localhost:27019 [S]: pbm-agent v2.3.1 OK
PITR incremental backup:
========================
Status [OFF]
Currently running:
==================
(none)
Backups:
========
FS  /home/backups
  (none)

8) Here, we are ready to take our first backup by simply executing the below single command via the PBM CLI.

shell> pbm backup
Starting backup '2024-02-10T04:50:18Z'....Backup '2024-02-10T04:50:18Z' to remote store '/home/backups' has started

Let’s verify if the backup was completed successfully.

shell> pbm list
Backup snapshots:
  2024-02-10T04:50:18Z <logical> [restore_to_time: 2024-02-10T04:50:22Z]
shell> ls -lh /home/backups/
total 20K
drwxr-xr-x. 3 vagrant vagrant   21 Feb 10 04:50 2024-02-10T04:50:18Z
-rw-r--r--. 1 vagrant vagrant 1.7K Feb 10 04:50 2024-02-10T04:50:18Z.pbm.json
drwxr-xr-x. 3 vagrant vagrant   21 Feb 10 04:52 2024-02-10T04:52:35Z
-rw-r--r--. 1 vagrant vagrant  16K Feb 10 04:52 2024-02-10T04:52:35Z.pbm.json

We can see the folders/files generated under the backup path (the listing above was captured after the physical backup from the next step had also run). By default, PBM performs a logical backup unless we specify the backup “--type”.

E.g.,

shell> pbm backup --type=physical
shell> pbm list
Backup snapshots:
 2024-02-10T04:50:18Z <logical> [restore_to_time: 2024-02-10T04:50:22Z]
 2024-02-10T04:52:35Z <physical> [restore_to_time: 2024-02-10T04:52:37Z]
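
Beyond these on-demand snapshots, PBM can also capture oplog slices continuously for point-in-time recovery (the PITR feature reported as [OFF] in pbm status above). A minimal sketch of enabling and using it (the restore timestamp is illustrative; PITR needs at least one completed snapshot as its base):

# Enable continuous oplog slicing for point-in-time recovery
pbm config --set pitr.enabled=true

# Restore to an arbitrary moment within the captured oplog window (illustrative timestamp)
pbm restore --time="2024-02-10T05:30:00"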

9) Now, if we want to restore any of the backups from that list, we can simply execute the below command.

shell> pbm restore 2024-02-10T04:50:18Z
Starting restore 2024-02-10T04:56:13.099066243Z from '2024-02-10T04:50:18Z'...Restore of the snapshot from '2024-02-10T04:50:18Z' has started

Again, we can validate whether the restore completed successfully with the help of the below command.

shell> pbm logs --event=restore

Output

2024-02-10T04:56:19Z I [replset/localhost:27017] [restore/2024-02-10T04:56:13.099066243Z] restoring indexes for admin.system.roles: role_1_db_1
2024-02-10T04:56:19Z I [replset/localhost:27017] [restore/2024-02-10T04:56:13.099066243Z] restoring indexes for admin.pbmOpLog: opid_1_replset_1
2024-02-10T04:56:19Z I [replset/localhost:27017] [restore/2024-02-10T04:56:13.099066243Z] restoring indexes for admin.pbmPITRChunks: rs_1_start_ts_1_end_ts_1, start_ts_1_end_ts_1
2024-02-10T04:56:19Z I [replset/localhost:27017] [restore/2024-02-10T04:56:13.099066243Z] restoring indexes for admin.pbmBackups: name_1, start_ts_1_status_1
2024-02-10T04:56:20Z I [replset/localhost:27017] [restore/2024-02-10T04:56:13.099066243Z] recovery successfully finished

Next, we will see how we can perform similar activities in the sharded/distributed environment.

PBM setup/usage for sharding

1) So, here we have a sharding-based setup with the nodes below (an mlaunch sketch for this layout follows the list).

localhost:27017 mongos
localhost:27022 config (configRepl)
localhost:27018 & localhost:27019  shardA (shard01)
localhost:27020 & localhost:27021  shardB (shard02)
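
A rough mlaunch sketch for this layout (flags and automatic port assignment may vary between mtools versions; mlaunch typically gives the mongos the first port):

# Two 2-node shard replica sets, a single-node CSRS config server, and one mongos router
mlaunch init --sharded 2 --replicaset --nodes 2 --config 1 --csrs --mongos 1 --port 27017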

2) Next, we will enable authentication and create a user for PBM on each replica set's primary instance, including the config server replica set. So, the role and user will be created on the [configRepl, shard01, and shard02] primary nodes only.

We run exactly the same createRole and createUser commands as in the replica set section on each primary node; only the shell prompt differs, and the output is the same as shown earlier.

configRepl:PRIMARY> db.getSiblingDB("admin").createRole({ "role": "pbmAnyAction",
...       "privileges": [
...          { "resource": { "anyResource": true },
...            "actions": [ "anyAction" ] }
...       ],
...       "roles": []
...    });
configRepl:PRIMARY>
configRepl:PRIMARY> db.getSiblingDB("admin").createUser({user: "pbmuser",
...        "pwd": "pbmuser",
...        "roles" : [
...           { "db" : "admin", "role" : "readWrite", "collection": "" },
...           { "db" : "admin", "role" : "backup" },
...           { "db" : "admin", "role" : "clusterMonitor" },
...           { "db" : "admin", "role" : "restore" },
...           { "db" : "admin", "role" : "pbmAnyAction" }
...        ]
...     });

The identical pair of commands is then executed at the shard01:PRIMARY and shard02:PRIMARY prompts.

3) Here, we will configure the MongoDB connection URL for the pbm-agent process. We need to add the entry for each local Mongo node, including the config server node, to the file [“/etc/sysconfig/pbm-agent”].

#config
PBM_MONGODB_URI="mongodb://pbmuser:pbmuser@localhost:27022/?authSource=admin"
#shardA
PBM_MONGODB_URI="mongodb://pbmuser:pbmuser@localhost:27018/?authSource=admin"
PBM_MONGODB_URI="mongodb://pbmuser:pbmuser@localhost:27019/?authSource=admin"
#shardB
PBM_MONGODB_URI="mongodb://pbmuser:pbmuser@localhost:27020/?authSource=admin"
PBM_MONGODB_URI="mongodb://pbmuser:pbmuser@localhost:27021/?authSource=admin"

Further, we can persist these settings by defining them in the user's [“~/.bashrc”] profile. In the case of sharded deployments, the PBM CLI should connect to the config server replica set.

#config
export PBM_MONGODB_URI="mongodb://pbmuser:pbmuser@localhost:27022/?authSource=admin&replicaSet=configRepl"
source ~/.bashrc

4) Let’s define the PBM configuration and storage-related details in the file [“/etc/pbm_config.yaml”]. Here, again, we are performing the backup to local filesystem storage.

storage:
  type: filesystem
  filesystem:
    path: /home/backups

Then we can apply the changes below.

shell> pbm config --file /etc/pbm_config.yaml
pitr:
  enabled: false
  oplogSpanMin: 0
  compression: s2
storage:
  type: filesystem
  filesystem:
    path: /home/backups
backup:
  compression: s2

5)  Now, we will run the PBM agent process separately for all the Mongo nodes (data and config).

#config
nohup pbm-agent --mongodb-uri "mongodb://pbmuser:pbmuser@localhost:27022/" > /tmp/pbm-agent.27022.log 2>&1 &
#shardA
nohup pbm-agent --mongodb-uri "mongodb://pbmuser:pbmuser@localhost:27018/" > /tmp/pbm-agent.27018.log 2>&1 &
nohup pbm-agent --mongodb-uri "mongodb://pbmuser:pbmuser@localhost:27019/" > /tmp/pbm-agent.27019.log 2>&1 &
#shardB
nohup pbm-agent --mongodb-uri "mongodb://pbmuser:pbmuser@localhost:27020/" > /tmp/pbm-agent.27020.log 2>&1 &
nohup pbm-agent --mongodb-uri "mongodb://pbmuser:pbmuser@localhost:27021/" > /tmp/pbm-agent.27021.log 2>&1 &

Note: Since we are running the entire setup on a single server, we have used the above command-line option to run the PBM agent service. However, in the real world or production, we should use the proper service [“systemctl start pbm-agent”] to manage the agents.

6) Finally, we can verify if all our configurations look good and the pbm-agents are connected fine.

shell> pbm status
Cluster:
========
configRepl:
  - configRepl/localhost:27022 [P]: pbm-agent v2.3.1 OK
shard02:
  - shard02/localhost:27020 [P]: pbm-agent v2.3.1 OK
  - shard02/localhost:27021 [S]: pbm-agent v2.3.1 OK
shard01:
  - shard01/localhost:27018 [P]: pbm-agent v2.3.1 OK
  - shard01/localhost:27019 [S]: pbm-agent v2.3.1 OK
PITR incremental backup:
========================
Status [OFF]
Currently running:
==================
(none)
Backups:
========
FS  /home/backups
  (none)

7) Next, we can take the backup, but before that, let's load some data into our sharded environment so that we can verify the data distribution after restoration.

mongos> sh.enableSharding("test")
mongos> sh.shardCollection("test.users", { "user_id": "hashed" } )
mongos> for (var i = 1; i <= 30000; i++) db.users.insert( { user_id : "user"+i,created_at :new Date() } )
mongos> db.users.getShardDistribution()
Shard shard01 at shard01/localhost:27018,localhost:27019
data : 946KiB docs : 15001 chunks : 2
estimated data per chunk : 473KiB
estimated docs per chunk : 7500
Shard shard02 at shard02/localhost:27020,localhost:27021
data : 946KiB docs : 14999 chunks : 2
estimated data per chunk : 473KiB
estimated docs per chunk : 7499
Totals
data : 1.84MiB docs : 30000 chunks : 4
Shard shard01 contains 50% data, 50% docs in cluster, avg obj size on shard : 64B
Shard shard02 contains 49.99% data, 49.99% docs in cluster, avg obj size on shard : 64B

So, we now have some data on both shard01 and shard02.

8) Finally, let’s take a backup.

shell> pbm backup
Starting backup '2024-02-10T06:08:15Z'....Backup '2024-02-10T06:08:15Z' to remote store '/home/backups' has started
shell> pbm list
Backup snapshots:
  2024-02-10T06:08:15Z <logical> [restore_to_time: 2024-02-10T06:08:20Z]

9) Again, if we want to restore the backup, we can just execute the simple command (pbm restore …). Let's first drop the existing data so that we can later verify a fresh restore.

shell> mongo --port 27017
mongos> use test
switched to db test
mongos> db.dropDatabase()
{
        "ok" : 1,
        "$clusterTime" : {
                "clusterTime" : Timestamp(1707545529, 41),
                "signature" : {
                        "hash" : BinData(0,"AAAAAAAAAAAAAAAAAAAAAAAAAAA="),
                        "keyId" : NumberLong(0)
                }
        },
        "operationTime" : Timestamp(1707545529, 39)
}
mongos> show dbs
admin   0.001GB
config  0.004GB

Let’s restore the backup now.

shell> pbm restore 2024-02-10T06:08:15Z
Starting restore 2024-02-10T06:14:09.284538667Z from '2024-02-10T06:08:15Z'...Restore of the snapshot from '2024-02-10T06:08:15Z' has started

So, if we again connect to the router/mongos node, we can see the database is successfully restored now.

mongos> show dbs
admin   0.001GB
config  0.003GB
test    0.002GB
mongos> use test
switched to db test
mongos> show collections
users
mongos> db.users.count()
30000

Monitoring/investigating PBM

There are a few ways by which we can investigate/monitor the PBM activity or logs for the backup/restore process.

shell> pbm logs                          ### show all log details
shell> pbm logs --event=backup           ### show log details specific to backups
shell> pbm logs --event=restore          ### show log details specific to restores
shell> journalctl -u pbm-agent.service   ### check the agent-related events
shell> pbm describe-backup backup_name   ### check the details of a particular backup
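
These commands can also be wrapped into a simple scheduled check for unattended monitoring; a minimal sketch (assumes PBM_MONGODB_URI is exported for the user running it, and that the pbm list output format matches what we saw earlier):

#!/bin/bash
# Report the most recent snapshot line from 'pbm list', or warn if none exists
latest=$(pbm list | grep -E '^ +20[0-9]{2}-' | tail -n 1)
if [ -z "$latest" ]; then
  echo "WARNING: no PBM snapshots found" >&2
  exit 1
fi
echo "Latest snapshot: $latest"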

Physical vs. logical backup

A physical backup copies the physical/disk files from Percona Server for MongoDB (PSMDB). When performing restores, the pbm-agents shut down the mongod nodes, clean up the data directories, and copy the physical files from the storage.

A logical backup copies the database data via a logical dump tool (mongodump). A pbm-agent connects to the database, retrieves the data, and writes it to the storage. During restoration, the pbm-agent retrieves the data from the storage location and inserts it on the primary node of every replica set in the cluster. The remaining nodes receive the data through replication.

E.g.,

shell> pbm backup --type=physical

Unfortunately, hot/physical backup is not available with the community MongoDB edition (it requires Percona Server for MongoDB), so only logical backup is possible there.

Especially in the case of physical backup restorations, we might have to perform some additional steps, mentioned below (see the sketch after this list).

  • Restart all mongod nodes and pbm-agents.
  • Resync the backup list from the storage using “pbm config --force-resync --file /etc/pbm_config.yaml”.
  • Start the balancer and the mongos node.
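
Put together, the post-restore sequence would look roughly like this (a sketch; the mongod/mongos service names assume typical systemd units, which our single-server demo does not use):

# 1. Restart every mongod node and its pbm-agent
sudo systemctl restart mongod pbm-agent

# 2. Resync the backup list from the storage
pbm config --force-resync --file /etc/pbm_config.yaml

# 3. Start the mongos router, then re-enable the balancer
sudo systemctl start mongos
mongo --port 27017 --eval 'sh.startBalancer()'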

Note: By default, PBM takes the backup from a secondary node, chosen via an internal election; in case no secondary responds, the backup is initiated on the primary. We can also control the election behaviour by defining a priority for the Mongo nodes, as sketched below.
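
For example, a fixed priority can be set in the PBM configuration so that a particular node is always preferred as the backup source (a sketch; the hosts and values here are illustrative, and higher values win):

backup:
  priority:
    "localhost:27018": 2.5   # preferred backup source
    "localhost:27019": 1.0
    "localhost:27017": 0.5   # primary, least preferred

The updated file is then applied with “pbm config --file”, as before.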

Conclusion

In this blog post, we explored how simple and convenient it is to perform backup and restoration tasks using PBM in replica set and sharding topologies. PBM simplifies the whole process in such complex topologies, which might not be as easy with other logical options (mongodump). A follow-up blog post [part two] will cover more backup options and some other areas of PBM. So, stay tuned!

Percona Distribution for MongoDB is a source-available alternative for enterprise MongoDB. A bundling of Percona Server for MongoDB and Percona Backup for MongoDB, Percona Distribution for MongoDB combines the best and most critical enterprise components from the open source community into a single feature-rich and freely available solution.

Download Percona Distribution for MongoDB Today!
