Backup WD MyCloud to S3/Glacier with duplicity (build instructions included)

April 3, 2015

This post shows how to back up your precious files stored on the WD My Cloud NAS to S3, using the slow but low-cost storage class "Glacier".

How the backup works: duplicity does its job and uploads files to S3. S3 Lifecycle rules that we set up recognize the large data archives by their prefix and move them to the Glacier storage class soon after upload. (Restoring something from Glacier takes hours, but storing it there costs a fraction of what standard S3 storage does.) We leave the metadata files in S3 so that duplicity can read them.

90% of this is based on http://www.x2q.net/2013/02/24/howto-backup-wd-mybook-live-to-amazon-s3-and-glacier/ and the WD build guide (http://community.wd.com/t5/WD-My-Cloud/GUIDE-Building-packages-for-the-new-firmware-someone-tried-it/m-p/770653#M18650 and the update at http://community.wd.com/t5/WD-My-Cloud/GUIDE-Building-packages-for-the-new-firmware-someone-tried-it/m-p/841385#M27799). Kudos to the authors!

You will need to:

Build duplicity and its dependencies (since WD Debian v04 switched to a page size of 64 kB, all pre-built binaries are unusable)
Configure S3 to move the data files to Glacier after 0 days
Create your backup script (see backup-pictures-to-s3.sh)
Schedule regular incremental backups via cron
Preferably test a restore manually

0. Download sources for this "miniproject"

Download the files for this from the GitHub repository miniprojects/mycloud-duplicity-backup via Git or as .zip.

1. Build duplicity and its dependencies

See ./mycloud-build-vm/README.md. This is based on duplicity 0.6.24 (available in the Jessie release of Debian); the older version in Wheezy does not support the crucial option --file-prefix-archive.

2. Configure S3

Create a backup bucket: either call it my-backup-bucket or update the backup script with your bucket name. (Duplicity can sometimes create the bucket itself, but it might be easier to create it manually, especially if you want it in a European region.)

Set rules to move the large data files to Glacier (they will remain visible in the bucket but their Storage Class will become Glacier soon after upload; they will not be visible directly in Glacier). Given the example backup script and the two prefixes it uses, you want to add a Lifecycle rule for each of them:

Rule Name: Archive to Glacier
Apply the Rule to: A prefix - either bob-data- or shared_pictures-data-
Action on Objects: Archive Only
Archive to the Glacier Storage Class 0 days after the object's creation date.
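If you prefer to script the rules instead of clicking through the console, the same settings can be written as a Lifecycle configuration in JSON. The following is only a sketch: it assumes the AWS CLI, which is not part of the setup described here, and the function names lifecycle_rule and lifecycle_config are made up for this example. The bucket name and prefixes are the ones used in this post.

```shell
#!/bin/sh
# Sketch: generate the two Lifecycle rules as JSON (assumes the example prefixes).

# Print one rule that moves objects with the given prefix to Glacier after 0 days.
lifecycle_rule() {
  printf '{"ID":"Archive %s to Glacier","Filter":{"Prefix":"%s"},"Status":"Enabled","Transitions":[{"Days":0,"StorageClass":"GLACIER"}]}' "$1" "$1"
}

# Print the complete configuration covering both prefixes.
lifecycle_config() {
  printf '{"Rules":[%s,%s]}\n' \
    "$(lifecycle_rule bob-data-)" \
    "$(lifecycle_rule shared_pictures-data-)"
}

# To apply it (requires the AWS CLI and credentials for the bucket):
#   lifecycle_config > lifecycle.json
#   aws s3api put-bucket-lifecycle-configuration \
#     --bucket my-backup-bucket \
#     --lifecycle-configuration file://lifecycle.json
```

Note that applying a configuration this way replaces any Lifecycle rules already set on the bucket, so check the result in the console afterwards.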

Tip: Create a dedicated user for backups via AWS IAM, having access only to the backup bucket; this is the Policy you would want to create (modify the bucket name as appropriate):

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": "s3:*",
      "Resource": ["arn:aws:s3:::my-backup-bucket", "arn:aws:s3:::my-backup-bucket/*"]
    }
  ]
}

3. Create your backup script

Modify the attached backup-pictures-to-s3.sh:

Set your AWS ID and secret
Modify the supported SRC_ARG, SOURCE, and PREFIX values

Notice that the script sets a prefix for all the files (data archive, manifest, ...) to distinguish backups of different directories and also adds another prefix (data-) to the archive files so that we can move just these to Glacier.
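To illustrate how the two prefixes combine, here is a minimal sketch of what such a duplicity call might look like. It is not the attached script: the source path /shares/bob, the function names, and the s3+http:// target URL are assumptions for this example, and options for encryption, credentials, and the bucket region are left out. The sketch only prints the command.

```shell
#!/bin/sh
# Sketch: show how the file prefixes fit together (dry run, nothing is executed).
BUCKET="my-backup-bucket"

# Prefix that data archives end up with; this is what the Lifecycle rule matches.
archive_prefix() {
  echo "${1}-data-"
}

# Print the duplicity command for one source directory and backup prefix.
# --file-prefix applies to all files, --file-prefix-archive only to data archives.
backup_command() {
  source_dir="$1"
  prefix="$2"
  echo duplicity \
    --file-prefix "${prefix}-" \
    --file-prefix-archive "data-" \
    "$source_dir" "s3+http://${BUCKET}"
}

# Example with a hypothetical source directory:
backup_command /shares/bob bob
```

With the prefix bob, the data archives should get names starting with bob-data-, while manifests and signatures start with just bob- and therefore stay in regular S3.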

4. Schedule regular incremental backups via cron

For example, to back up pictures every Tuesday and phone pictures every Wednesday at 20:00, add this to your crontab:

0 20 * * 2 /root/backup-pictures-to-s3.sh pictures
0 20 * * 3 /root/backup-pictures-to-s3.sh phone

5. Preferably test a restore manually

See ./restore.example. To verify that the backup is all right, you likely also want to try these commands: duplicity collection-status [options] target_url, duplicity list-current-files [options] target_url, and duplicity verify [options] source_url target_dir.
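As a rough sketch of how these checks could be put together for one backup, the helper below prints the three commands for a given prefix. The function names, the local directory /shares/bob, and the s3+http:// URL are assumptions for illustration; the prefix options are assumed to have to match the ones used when the backup was made. Nothing is executed, so you can review the commands first.

```shell
#!/bin/sh
# Sketch: print the verification commands for one backup prefix (dry run).
BUCKET="my-backup-bucket"
TARGET_URL="s3+http://${BUCKET}"

# Prefix options shared by all commands (assumed to mirror the backup run).
prefix_options() {
  echo "--file-prefix ${1}- --file-prefix-archive data-"
}

# Print the three checks: backup chains, file listing, and comparison
# of the backup against a local directory.
check_commands() {
  prefix="$1"
  local_dir="$2"
  opts="$(prefix_options "$prefix")"
  echo "duplicity collection-status $opts $TARGET_URL"
  echo "duplicity list-current-files $opts $TARGET_URL"
  echo "duplicity verify $opts $TARGET_URL $local_dir"
}

# Example with a hypothetical local directory:
check_commands bob /shares/bob
```

Keep in mind that verify has to read the data archives, which by then have probably been moved to Glacier, so expect it to be much slower to get working than the two metadata-only commands.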

Caveats

You likely want to run a full backup from time to time and clean up old (incremental) backups. This has to be done manually.
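A manual maintenance run might look roughly like the sketch below, which uses duplicity's full and remove-all-but-n-full actions. The function name, the source path, and the number of full backups to keep are made-up examples, and the prefix options are assumed to match those of the backup script. The commands are only printed, not run.

```shell
#!/bin/sh
# Sketch: print commands for a fresh full backup and a cleanup (dry run).
BUCKET="my-backup-bucket"
TARGET_URL="s3+http://${BUCKET}"

# Arguments: source directory, backup prefix, number of full backups to keep.
maintenance_commands() {
  source_dir="$1"
  prefix="$2"
  keep_full="$3"
  opts="--file-prefix ${prefix}- --file-prefix-archive data-"
  # Start a new backup chain with a full backup.
  echo "duplicity full $opts $source_dir $TARGET_URL"
  # List the older chains that would be removed; duplicity only deletes
  # them when --force is added to this command.
  echo "duplicity remove-all-but-n-full $keep_full $opts $TARGET_URL"
}

# Example: keep the two most recent full backups of a hypothetical directory.
maintenance_commands /shares/bob bob 2
```

Running the cleanup command without --force first is a cheap way to see what would be deleted before committing to it.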

Binaries

I prefer to build my binaries myself, but if you prefer, you may download my .debs of duplicity and its dependencies here; I will eventually remove them but likely not before 8/2015.

Tags: DevOps
