Streaming MongoDB Backups Directly to S3

February 23, 2023

If you have ever had to make a quick ad hoc backup of your MongoDB databases but did not have enough free space on the local disk to do so, this blog post may provide some handy tips to save you from headaches.

It is common practice to prepare a backup locally first and only then copy it to the cloud or to a dedicated backup server.

Fortunately, there are ways to skip local storage entirely and stream MongoDB backups directly to the destination. At the same time, the goal is usually to save both network bandwidth and storage space (cost savings!) without overloading the CPU on the production database, so applying on-the-fly compression is essential.

In this article, I will show some simple examples to help you quickly do the job.

Prerequisites for streaming MongoDB backups

You will need an account for one of the providers offering object storage compatible with Amazon S3. I used Wasabi in my tests as it offers very easy registration for a trial and takes just a few minutes to get started if you want to test the service.

You will also need a tool for managing the data from the Linux command line. The two most popular ones, s3cmd and the AWS CLI, are sufficient, and I will show examples using both.

Installation and setup will depend on your OS and the S3 provider specifics. Please refer to the documentation below to proceed, as I will not cover the installation details here.

* https://s3tools.org/s3cmd
* https://docs.aws.amazon.com/cli/index.html
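For reference, this is roughly what pointing both tools at an S3-compatible endpoint can look like once they are installed. The endpoint below is the Wasabi one used later in this post; the keys are placeholders, so adjust everything to your provider:

Shell
# s3cmd: the interactive wizard writes ~/.s3cfg; the entries that matter are:
$ s3cmd --configure
# ~/.s3cfg (excerpt, placeholder values)
# access_key  = YOUR_ACCESS_KEY
# secret_key  = YOUR_SECRET_KEY
# host_base   = s3.eu-central-2.wasabisys.com
# host_bucket = %(bucket)s.s3.eu-central-2.wasabisys.com

# AWS CLI: store the credentials, then pass the endpoint explicitly per command
$ aws configure
$ aws s3 ls --endpoint-url https://s3.eu-central-2.wasabisys.com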

Backup tools

Two main tools are provided with the MongoDB packages, mongodump and mongoexport, and both perform logical backups. Their counterparts, mongorestore and mongoimport, handle the restores.

Compression tool

We all know gzip and bzip2 are installed by default on almost every Linux distro. However, I find zstd considerably more efficient, so I'll use it in the examples.
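As a side note, zstd can be tuned to match the spare CPU you have on the database host. Here is a minimal sketch of the same streaming pipeline used later in this post, with an explicit compression level and multithreading (the flag values are just an example):

Shell
# -3 is the default compression level; -T0 lets zstd use all available cores
$ mongodump --db=db2 --archive | zstd -3 -T0 | s3cmd put - s3://mbackups/db2.zst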

Examples

I believe real-case examples are best if you wish to test something similar, so here they are.

Mongodump & s3cmd – Single database backup

  • Let’s create a bucket dedicated to MongoDB data backups:
Shell
$ s3cmd mb s3://mbackups
Bucket 's3://mbackups/' created
  • Now, do a simple dump of one example database using the --archive option, which changes the behavior from storing each collection's data in separate files on disk to streaming the whole backup to standard output (STDOUT) in a common archive format. At the same time, the stream gets compressed on the fly and sent to the S3 destination.
  • Note that the command below does not create a consistent backup with regard to ongoing writes, as it does not contain the oplog.
Shell
$ mongodump --db=db2 --archive | zstd | s3cmd put - s3://mbackups/$(date +%Y-%m-%d.%H-%M)/db2.zst
2023-02-07T19:33:58.138+0100 writing db2.products to archive on stdout
2023-02-07T19:33:58.140+0100 writing db2.people to archive on stdout
2023-02-07T19:33:59.364+0100 done dumping db2.people (50474 documents)
2023-02-07T19:33:59.977+0100 done dumping db2.products (516784 documents)
upload: '<stdin>' -> 's3://mbackups/2023-02-07.19-33/db2.zst' [part 1 of -, 15MB] [1 of 1]
15728640 of 15728640 100% in 1s 8.72 MB/s done
upload: '<stdin>' -> 's3://mbackups/2023-02-07.19-33/db2.zst' [part 2 of -, 1491KB] [1 of 1]
1527495 of 1527495 100% in 0s 4.63 MB/s done
  • After the backup is done, let’s verify its presence in S3:
Shell
$ s3cmd ls -H s3://mbackups/2023-02-07.19-33/
2023-02-07 18:34 16M s3://mbackups/2023-02-07.19-33/db2.zst

Mongorestore & s3cmd – Database restore directly from S3

The mongorestore command below also uses the --archive option, which allows us to stream the backup from S3 directly into it:

Shell
$ s3cmd get --no-progress s3://mbackups/2023-02-07.20-14/db2.zst - | zstd -d | mongorestore --archive --drop
2023-02-08T00:42:41.434+0100 preparing collections to restore from
2023-02-08T00:42:41.480+0100 reading metadata for db2.people from archive on stdin
2023-02-08T00:42:41.480+0100 reading metadata for db2.products from archive on stdin
2023-02-08T00:42:41.481+0100 dropping collection db2.people before restoring
2023-02-08T00:42:41.502+0100 restoring db2.people from archive on stdin
2023-02-08T00:42:42.130+0100 dropping collection db2.products before restoring
2023-02-08T00:42:42.151+0100 restoring db2.products from archive on stdin
2023-02-08T00:42:43.217+0100 db2.people 16.0MB
2023-02-08T00:42:43.217+0100 db2.products 12.1MB
2023-02-08T00:42:43.217+0100
2023-02-08T00:42:43.654+0100 db2.people 18.7MB
2023-02-08T00:42:43.654+0100 finished restoring db2.people (50474 documents, 0 failures)
2023-02-08T00:42:46.218+0100 db2.products 46.3MB
2023-02-08T00:42:48.758+0100 db2.products 76.0MB
2023-02-08T00:42:48.758+0100 finished restoring db2.products (516784 documents, 0 failures)
2023-02-08T00:42:48.758+0100 no indexes to restore for collection db2.products
2023-02-08T00:42:48.758+0100 no indexes to restore for collection db2.people
2023-02-08T00:42:48.758+0100 567258 document(s) restored successfully. 0 document(s) failed to restore.
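If you want a quick sanity check after such a restore, comparing document counts against the mongodump summary is usually enough. A minimal sketch using the collections from this example (works in both the legacy mongo shell and mongosh):

Shell
# expect 50474 for db2.people and 516784 for db2.products in this example
$ mongo db2 --quiet --eval "print(db.people.countDocuments({}), db.products.countDocuments({}))"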

Mongodump & s3cmd – Full backup

The command below provides a consistent point-in-time snapshot thanks to the --oplog option:

Shell
$ mongodump --port 3502 --oplog --archive | zstd | s3cmd put - s3://mbackups/$(date +%Y-%m-%d.%H-%M)/full_dump.zst
2023-02-13T00:05:54.080+0100 writing admin.system.users to archive on stdout
2023-02-13T00:05:54.083+0100 done dumping admin.system.users (1 document)
2023-02-13T00:05:54.084+0100 writing admin.system.version to archive on stdout
2023-02-13T00:05:54.085+0100 done dumping admin.system.version (2 documents)
2023-02-13T00:05:54.087+0100 writing db1.products to archive on stdout
2023-02-13T00:05:54.087+0100 writing db2.products to archive on stdout
2023-02-13T00:05:55.260+0100 done dumping db2.products (284000 documents)
upload: '<stdin>' -> 's3://mbackups/2023-02-13.00-05/full_dump.zst' [part 1 of -, 15MB] [1 of 1]
2023-02-13T00:05:57.068+0100 [####################....] db1.products 435644/516784 (84.3%)
15728640 of 15728640 100% in 1s 9.63 MB/s done
2023-02-13T00:05:57.711+0100 [########################] db1.products 516784/516784 (100.0%)
2023-02-13T00:05:57.722+0100 done dumping db1.products (516784 documents)
2023-02-13T00:05:57.723+0100 writing captured oplog to
2023-02-13T00:05:58.416+0100 dumped 136001 oplog entries
upload: '<stdin>' -> 's3://mbackups/2023-02-13.00-05/full_dump.zst' [part 2 of -, 8MB] [1 of 1]
8433337 of 8433337 100% in 0s 10.80 MB/s done
$ s3cmd ls -H s3://mbackups/2023-02-13.00-05/full_dump.zst
2023-02-12 23:05 23M s3://mbackups/2023-02-13.00-05/full_dump.zst

Mongodump & s3cmd – Full backup restore

Analogously, mongorestore uses the --oplogReplay option to apply the oplog contained in the archived stream:

Shell
$ s3cmd get --no-progress s3://mbackups/2023-02-13.00-05/full_dump.zst - | zstd -d | mongorestore --port 3502 --archive --oplogReplay
2023-02-13T00:07:25.977+0100 preparing collections to restore from
2023-02-13T00:07:25.977+0100 don't know what to do with subdirectory "db1", skipping...
2023-02-13T00:07:25.977+0100 don't know what to do with subdirectory "db2", skipping...
2023-02-13T00:07:25.977+0100 don't know what to do with subdirectory "", skipping...
2023-02-13T00:07:25.977+0100 don't know what to do with subdirectory "admin", skipping...
2023-02-13T00:07:25.988+0100 reading metadata for db1.products from archive on stdin
2023-02-13T00:07:25.988+0100 reading metadata for db2.products from archive on stdin
2023-02-13T00:07:26.006+0100 restoring db2.products from archive on stdin
2023-02-13T00:07:27.651+0100 db2.products 11.0MB
2023-02-13T00:07:28.429+0100 restoring db1.products from archive on stdin
2023-02-13T00:07:30.651+0100 db2.products 16.0MB
2023-02-13T00:07:30.652+0100 db1.products 14.4MB
2023-02-13T00:07:30.652+0100
2023-02-13T00:07:33.652+0100 db2.products 32.0MB
2023-02-13T00:07:33.652+0100 db1.products 18.0MB
2023-02-13T00:07:33.652+0100
2023-02-13T00:07:36.651+0100 db2.products 37.8MB
2023-02-13T00:07:36.652+0100 db1.products 32.0MB
2023-02-13T00:07:36.652+0100
2023-02-13T00:07:37.168+0100 db2.products 41.5MB
2023-02-13T00:07:37.168+0100 finished restoring db2.products (284000 documents, 0 failures)
2023-02-13T00:07:39.651+0100 db1.products 49.3MB
2023-02-13T00:07:42.651+0100 db1.products 68.8MB
2023-02-13T00:07:43.870+0100 db1.products 76.0MB
2023-02-13T00:07:43.870+0100 finished restoring db1.products (516784 documents, 0 failures)
2023-02-13T00:07:43.871+0100 restoring users from archive on stdin
2023-02-13T00:07:43.913+0100 replaying oplog
2023-02-13T00:07:45.651+0100 oplog 2.14MB
2023-02-13T00:07:48.651+0100 oplog 5.68MB
2023-02-13T00:07:51.651+0100 oplog 9.34MB
2023-02-13T00:07:54.651+0100 oplog 13.0MB
2023-02-13T00:07:57.651+0100 oplog 16.7MB
2023-02-13T00:08:00.651+0100 oplog 19.7MB
2023-02-13T00:08:03.651+0100 oplog 22.7MB
2023-02-13T00:08:06.651+0100 oplog 25.3MB
2023-02-13T00:08:09.651+0100 oplog 28.1MB
2023-02-13T00:08:12.651+0100 oplog 30.8MB
2023-02-13T00:08:15.651+0100 oplog 33.6MB
2023-02-13T00:08:18.651+0100 oplog 36.4MB
2023-02-13T00:08:21.651+0100 oplog 39.1MB
2023-02-13T00:08:24.651+0100 oplog 41.9MB
2023-02-13T00:08:27.651+0100 oplog 44.7MB
2023-02-13T00:08:30.651+0100 oplog 47.5MB
2023-02-13T00:08:33.651+0100 oplog 50.2MB
2023-02-13T00:08:36.651+0100 oplog 53.0MB
2023-02-13T00:08:38.026+0100 applied 136001 oplog entries
2023-02-13T00:08:38.026+0100 oplog 54.2MB
2023-02-13T00:08:38.026+0100 no indexes to restore for collection db1.products
2023-02-13T00:08:38.026+0100 no indexes to restore for collection db2.products
2023-02-13T00:08:38.026+0100 800784 document(s) restored successfully. 0 document(s) failed to restore.

Mongoexport – Export all collections from a given database, compress, and save directly to S3

Another example uses mongoexport to create regular JSON dumps; note that this is also not a consistent backup if writes are ongoing.

Shell
$ ts=$(date +%Y-%m-%d.%H-%M)
$ mydb="db2"
$ mycolls=$(mongo --quiet $mydb --eval "db.getCollectionNames().join('\n')")
$ for i in $mycolls; do mongoexport -d $mydb -c $i | zstd | s3cmd put - s3://mbackups/$ts/$mydb/$i.json.zst; done
2023-02-07T19:30:37.163+0100 connected to: mongodb://localhost/
2023-02-07T19:30:38.164+0100 [#######.................] db2.people 16000/50474 (31.7%)
2023-02-07T19:30:39.164+0100 [######################..] db2.people 48000/50474 (95.1%)
2023-02-07T19:30:39.166+0100 [########################] db2.people 50474/50474 (100.0%)
2023-02-07T19:30:39.166+0100 exported 50474 records
upload: '<stdin>' -> 's3://mbackups/2023-02-07.19-30/db2/people.json.zst' [part 1 of -, 4MB] [1 of 1]
4264922 of 4264922 100% in 0s 5.71 MB/s done
2023-02-07T19:30:40.015+0100 connected to: mongodb://localhost/
2023-02-07T19:30:41.016+0100 [##......................] db2.products 48000/516784 (9.3%)
2023-02-07T19:30:42.016+0100 [######..................] db2.products 136000/516784 (26.3%)
2023-02-07T19:30:43.016+0100 [##########..............] db2.products 224000/516784 (43.3%)
2023-02-07T19:30:44.016+0100 [##############..........] db2.products 312000/516784 (60.4%)
2023-02-07T19:30:45.016+0100 [##################......] db2.products 408000/516784 (78.9%)
2023-02-07T19:30:46.016+0100 [#######################.] db2.products 496000/516784 (96.0%)
2023-02-07T19:30:46.202+0100 [########################] db2.products 516784/516784 (100.0%)
2023-02-07T19:30:46.202+0100 exported 516784 records
upload: '<stdin>' -> 's3://mbackups/2023-02-07.19-30/db2/products.json.zst' [part 1 of -, 11MB] [1 of 1]
12162655 of 12162655 100% in 1s 10.53 MB/s done
$ s3cmd ls -H s3://mbackups/$ts/$mydb/
2023-02-07 18:30 4M s3://mbackups/2023-02-07.19-30/db2/people.json.zst
2023-02-07 18:30 11M s3://mbackups/2023-02-07.19-30/db2/products.json.zst

Mongoimport & s3cmd – Import single collection under a different name

Shell
$ s3cmd get --no-progress s3://mbackups/2023-02-08.00-49/db2/people.json.zst - | zstd -d | mongoimport -d db2 -c people_copy
2023-02-08T00:53:48.355+0100 connected to: mongodb://localhost/
2023-02-08T00:53:50.446+0100 50474 document(s) imported successfully. 0 document(s) failed to import.
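By extension, if you want to bring back every collection exported by the loop above, you can iterate over the objects under the dump prefix. A rough sketch assuming the naming convention from the export example (adjust the prefix to your own run):

Shell
$ ts="2023-02-07.19-30"
$ mydb="db2"
$ for key in $(s3cmd ls s3://mbackups/$ts/$mydb/ | awk '{print $4}'); do
    coll=$(basename $key .json.zst)
    s3cmd get --no-progress $key - | zstd -d | mongoimport -d $mydb -c $coll
  done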

Mongodump & AWS S3 – Backup database

Shell
$ mongodump --db=db2 --archive | zstd | aws s3 cp - s3://mbackups/backup1/db2.zst
2023-02-08T11:34:46.834+0100 writing db2.people to archive on stdout
2023-02-08T11:34:46.837+0100 writing db2.products to archive on stdout
2023-02-08T11:34:47.379+0100 done dumping db2.people (50474 documents)
2023-02-08T11:34:47.911+0100 done dumping db2.products (516784 documents)
$ aws s3 ls --human-readable mbackups/backup1/
2023-02-08 11:34:50 16.5 MiB db2.zst
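One caveat with the AWS CLI and S3-compatible providers such as Wasabi: unless your configuration already points at the provider, you may need to pass the endpoint explicitly, for example:

Shell
$ mongodump --db=db2 --archive | zstd | aws s3 cp - s3://mbackups/backup1/db2.zst --endpoint-url https://s3.eu-central-2.wasabisys.com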

Mongorestore & AWS S3 – Restore database

Shell
$ aws s3 cp s3://mbackups/backup1/db2.zst - | zstd -d | mongorestore --archive --drop
2023-02-08T11:37:08.358+0100 preparing collections to restore from
2023-02-08T11:37:08.364+0100 reading metadata for db2.people from archive on stdin
2023-02-08T11:37:08.364+0100 reading metadata for db2.products from archive on stdin
2023-02-08T11:37:08.365+0100 dropping collection db2.people before restoring
2023-02-08T11:37:08.462+0100 restoring db2.people from archive on stdin
2023-02-08T11:37:09.100+0100 dropping collection db2.products before restoring
2023-02-08T11:37:09.122+0100 restoring db2.products from archive on stdin
2023-02-08T11:37:10.288+0100 db2.people 16.0MB
2023-02-08T11:37:10.288+0100 db2.products 13.8MB
2023-02-08T11:37:10.288+0100
2023-02-08T11:37:10.607+0100 db2.people 18.7MB
2023-02-08T11:37:10.607+0100 finished restoring db2.people (50474 documents, 0 failures)
2023-02-08T11:37:13.288+0100 db2.products 47.8MB
2023-02-08T11:37:15.666+0100 db2.products 76.0MB
2023-02-08T11:37:15.666+0100 finished restoring db2.products (516784 documents, 0 failures)
2023-02-08T11:37:15.666+0100 no indexes to restore for collection db2.products
2023-02-08T11:37:15.666+0100 no indexes to restore for collection db2.people
2023-02-08T11:37:15.666+0100 567258 document(s) restored successfully. 0 document(s) failed to restore.

In the above examples, I used both the mongodump/mongorestore and mongoexport/mongoimport tools to back up and recover MongoDB data directly to and from S3-compatible object storage, streaming and compressing the data on the fly. These methods are simple, fast, and resource-friendly. I hope they will be useful when you are looking for options for your backup scripts or ad-hoc backup tasks.
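One thing worth adding when turning these one-liners into scheduled scripts: a pipeline hides failures of its left-hand commands, so a failing mongodump could silently leave a truncated archive in the bucket. Below is a minimal sketch that guards against this with pipefail, using the same bucket and port as in this post:

Shell
#!/bin/bash
set -euo pipefail

ts=$(date +%Y-%m-%d.%H-%M)
dest="s3://mbackups/${ts}/full_dump.zst"

if mongodump --port 3502 --oplog --archive | zstd | s3cmd put - "$dest"; then
    echo "backup uploaded to $dest"
else
    echo "backup FAILED, removing partial object" >&2
    s3cmd del "$dest" || true
    exit 1
fi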

Additional tools

Here, I would like to mention that there are other free and open source backup solutions you may try, including Percona Backup for MongoDB (PBM), which now offers both logical and physical backups.
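I won't cover PBM setup here, but for a rough idea: once pbm-agent runs next to each mongod and the S3 storage is configured, taking and checking a backup looks approximately like this (the config excerpt is illustrative, so verify the exact keys against the PBM documentation for your version):

Shell
# illustrative excerpt of the PBM storage configuration
$ cat pbm_storage.yaml
storage:
  type: s3
  s3:
    bucket: mbackups
    endpointUrl: https://s3.eu-central-2.wasabisys.com
$ pbm config --file pbm_storage.yaml
$ pbm backup --type=logical
$ pbm status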

With the Percona Server for MongoDB variant, you may also stream hot physical backups directly to S3 storage:

https://docs.percona.com/percona-server-for-mongodb/6.0/hot-backup.html#streaming-hot-backups-to-a-remote-destination

It is as easy as this:

Shell
mongo > db.runCommand({createBackup: 1, s3: {bucket: "mbackups", path: "my_physical_dump1", endpoint: "s3.eu-central-2.wasabisys.com"}})
{ "ok" : 1 }
$ s3cmd du -H s3://mbackups/my_physical_dump1/
138M 26 objects s3://mbackups/my_physical_dump1/

For a sharded cluster, you should rather use PBM to get consistent backups.

By the way, don’t forget to check out the MongoDB backup best practices!

Percona Distribution for MongoDB is a freely available MongoDB database alternative, giving you a single solution that combines the best and most important enterprise components from the open source community, designed and tested to work together.

Download Percona Distribution for MongoDB today!
