
Upside Down Backups

source link: https://ivymike.dev/upside-down-backups.html

An Upside-Down Backup Strategy

I have a strange way of doing backups these days. When I save something, the master copy is in the cloud. If a file exists only on my local drive, it's not safe. Only once I've saved the file on my Google Drive or pushed to GitHub do I think of it as actually being saved.

This means I've got to trust my cloud provider, and while I do, that trust only goes so far. I still want backups. But instead of "backing up files to the cloud," I back them up locally by redownloading them to a local archival drive. This is upside-down compared to most people's backup strategy, but it really is quite nice once you get it set up.

That being said, I am paranoid. So I go a step further and back up my files from one cloud provider to another.

Here's how I do it.

Rclone

The thing that does most of the heavy lifting is Rclone, a command-line tool for managing files on cloud storage. It runs on Windows, Linux, and Mac, and it abstracts most of the big cloud storage APIs into a unified interface. It can do a lot of things, including presenting your files as a virtual drive, but I mostly just use it for the "sync" feature.

Once you configure Rclone with your cloud provider username and API key using rclone config, syncing is as simple as rclone sync gdrive: /mnt/t/gdrive. That command copies my files from Google Drive to a local directory at /mnt/t/gdrive. Rclone can even synchronize files directly from one provider to another.
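
For a first-time setup, a minimal sketch might look like the following (the remote name "gdrive" and the local path /mnt/t/gdrive are just the names used later in this post; substitute your own):

# Interactive one-time setup of a remote; rclone walks you through
#   choosing a provider and authorizing it
rclone config

# Sanity check: list the top-level directories on the new remote
rclone lsd gdrive:

# Preview what a sync would do before touching anything
rclone --dry-run sync gdrive: /mnt/t/gdrive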

Rclone works with almost every large cloud storage system I've heard of: Google Drive, Dropbox, Backblaze, OneDrive, S3, generic FTP, etc. The complete list of supported systems is in the Rclone documentation. Sadly, iCloud is a notable exception.

Since cloud providers are prone to changing their APIs frequently, I recommend always checking for and using the latest version of rclone.
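
Checking (and updating) is quick; a small sketch, assuming rclone was installed from the project's own binaries rather than a distro package:

# Show the installed version and whether a newer release is available
rclone version --check

# Update rclone in place (works for installs from the official binaries)
rclone selfupdate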

For reference (mine as much as yours), here are the actual commands I run:

# Back up Google Photos locally.
# Google Photos API presents a virtual directory structure, so the same photos
#   will appear in a lot of different places. The rclone man pages recommend
#   using "media/by-month" for backups
# My remote is "gphotos", my local directory is "/mnt/t/gphotos"
rclone --progress sync gphotos:media/by-month /mnt/t/gphotos

# Back up Google Photos to Microsoft OneDrive
# Syncing remote-to-remote rather than local-to-remote wastes a
# little bandwidth, but I don't want any "copy of a copy" issues.
# My source remote is "gphotos", my target remote is "onedrive"
rclone --progress sync gphotos:media/by-month onedrive:Backup/gphotos

# Back up Google Drive locally
# I exclude "Video" since I have other copies of that elsewhere
rclone --progress --exclude="/Video/**" sync gdrive: /mnt/t/gdrive

# Back up Google Drive to Microsoft OneDrive
rclone --progress --exclude="/Video/**" sync gdrive: onedrive:Backup/gdrive

Note that during the initial sync Google rate limited me to 75K files a day. Once I hit that limit, I saw 429 Too Many Requests errors. If this happens, just wait until the next day, sync the next 75K files, rinse, repeat, and eventually you'll get there. There are some rclone flags which may help you debug or avoid this problem (a combined example follows the list):

# Useful Rclone flags
# -vv               Very Verbose
# --checkers 1      Limit to one thread checking file size/hash
# --transfers 1     Limit to one thread transferring files
# --tpslimit        Transactions per second limit
# --log-file NAME   Log to a file for later debugging
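
Putting those together, a throttled and heavily-logged invocation might look like this (the tpslimit value of 2 is just an illustrative guess; tune it to your provider's limits):

# Throttled, single-threaded sync with verbose logging for diagnosing 429s
rclone -vv --log-file rclone-sync.log --tpslimit 2 --transfers 1 --checkers 1 \
    sync gphotos:media/by-month /mnt/t/gphotos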

GitHub backup

Backing up GitHub is a more complicated story. On one hand, you might not need to do anything at all if you're satisfied that any machine with a clone of your repos already serves as a backup.
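
If that's your approach, a one-off manual copy can be made with a bare mirror clone, which captures every branch, tag, and ref (the repo URL and directory below are placeholders):

# One-time mirror of a single repo
git clone --mirror https://github.com/yourname/yourrepo.git yourrepo.git

# Later, refresh the mirror in place
cd yourrepo.git && git remote update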

On the other hand, I like to automate things, so I use the Go program ghbackup to periodically download all of my GitHub repos in one fell swoop.

I run it like this, where the secret is a GitHub personal access token:

ghbackup -secret abc_MySecret githubbackup/

This gets the complete list of my repos (private and public) and copies them all; the whole process takes about 10 seconds.

And to close the loop, here's how I sync that local backup to the cloud, using rclone:

rclone --progress sync githubbackup onedrive:Backup/github

The downside is that this only copies the repositories themselves; metadata such as issues and pull requests is not included. The GitHub Marketplace has some solutions for automated GitHub backups that do include metadata. I have not used them, so YMMV.
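
To run all of this unattended, the individual commands could be wrapped in a small shell script and scheduled with cron. Here's a minimal sketch; the script name, the /mnt/t/githubbackup path, the GITHUB_TOKEN environment variable, and the 03:00 schedule are my own illustrative choices, not anything prescribed by the tools:

#!/bin/sh
# cloud-backup.sh -- run the backups described above, stopping on first error
# GITHUB_TOKEN must be set in the environment (e.g. exported in the crontab)
set -e
rclone sync gphotos:media/by-month /mnt/t/gphotos
rclone sync gphotos:media/by-month onedrive:Backup/gphotos
rclone --exclude="/Video/**" sync gdrive: /mnt/t/gdrive
rclone --exclude="/Video/**" sync gdrive: onedrive:Backup/gdrive
ghbackup -secret "$GITHUB_TOKEN" /mnt/t/githubbackup
rclone sync /mnt/t/githubbackup onedrive:Backup/github

# Example crontab entry to run it nightly at 03:00:
# 0 3 * * * /home/me/bin/cloud-backup.sh >> /var/log/cloud-backup.log 2>&1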

Good luck and may your data never go bad!

