Kubernetes Backup and Disaster Recovery with Open-Source Tools

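The Kubernetes community does not yet offer a mature, official backup solution. The most common approach today is still to back up data through etcd snapshots.

However, an etcd backup only covers Kubernetes resources. It does not cover the business data stored in PV volumes, which is often the most critical data: lost Kubernetes resources can at least be re-applied, whereas lost business data is a catastrophe.

How volumes are backed up depends on the PV storage backend. With Ceph RBD, for example, you can snapshot the RBD image on a schedule and export the snapshots to an object store, or use Ceph RBD mirroring for off-site replication. In public clouds you can back up volumes directly with the provider's backup service; AWS, at least, offers something along these lines.

The approaches above back up whole volumes at the block level. File-level backup is also possible and is often more space-efficient: only the data actually in use is backed up, and combining deduplication with compression can save a great deal of storage.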

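An obvious approach is to inject rsync or another backup tool as a sidecar that continuously syncs files to remote storage, giving near-real-time incremental backup of business data.

Here I recommend the open-source tool restic instead of rsync for file-level backup. It suits not only volume backup in enterprise Kubernetes environments but also the backup of a personal laptop.

Plenty of ready-made tools such as iCloud and Baidu Netdisk can also back up and sync a local machine, but backing up with restic, which encrypts the data, keeps you in control, gives more peace of mind, and can be cheaper.

In short, injecting restic as a sidecar into Pods that mount PVs is technically feasible, but putting it into practice is somewhat complex, and the open-source project Velero already solves this problem.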

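This article therefore focuses on using the open-source Velero to back up Kubernetes applications. It must be stressed that application-level backup with Velero does not remove the need for cluster-level backup: etcd, the Kubernetes certificates, the kubeadm configuration, and so on still belong in the backup strategy and are indispensable for cluster recovery.

Because Velero uses restic to back up PV volumes, this article first briefly introduces restic. That part is not directly about Kubernetes backup, but understanding the underlying tool is very helpful when restoring data with Velero later. Readers already familiar with restic can skip it.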

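Introduction to restic

restic is an open-source, cross-platform (Linux, macOS, Windows, and others) command-line backup tool. It performs encrypted full or incremental backups of local files to remote repositories such as S3, SFTP servers, remote directories, and MinIO object storage, and it can replace the commonly used rsync.

restic follows five design principles:

Easy: backup and restore each take a single simple command, with no complicated configuration or options.
Fast: backup and restore speed is limited only by network bandwidth and disk I/O; the tool itself should not be the bottleneck.
Verifiable: users can inspect and search every file in any snapshot at any time to confirm that the backup is sound.
Secure: backups are stored with strong encryption; even if the remote repository leaks to an attacker, the attacker cannot obtain the plaintext data.
Efficient: backups are file-based and incremental, with automatic deduplication, which saves storage space.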

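restic configuration and repository initialization

restic can be downloaded directly from the official website. After downloading, setting up shell completion is recommended: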

restic generate --bash-completion restic.bash_completion  

source restic.bash_completion

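True to its design principles, restic is very simple to use. First initialize a backup repository. Here we use the open-source MinIO object store as the repository; the bucket policy has been configured in advance, and the access key and secret are supplied through environment variables: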

export AWS_ACCESS_KEY_ID=93E0...2MV4K  

export AWS_SECRET_ACCESS_KEY=wulg1N...rXgGR

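Use the init subcommand to initialize the repository. For security, restic requires a password when the repository is initialized. Be sure to remember it: if the password is lost, the data cannot be recovered. This design prevents a leak of the backup repository from exposing the business data.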

# restic -r s3:http://int32bit-minio-server/local-backup init  

enter password for new repository:  

enter password again:  

created restic repository 94c40f5300 at s3:http://int32bit-minio-server/local-backup

Use the stats subcommand to view the repository status:

# restic -r s3:http://int32bit-minio-server/local-backup stats  

enter password for repository:  

repository 94c40f53 opened successfully, password is correct  

scanning...  

Stats in restore-size mode:  

Snapshots processed:   0  

     Total Size:   0 B

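The repository is currently empty, so both the size and the snapshot count are 0.

For security, every operation on the repository (listing, backup, restore, and so on) requires the password, which is appropriate in production. For convenience in this test, the password and the repository address are stored in environment variables: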

export RESTIC_PASSWORD=*********  

export RESTIC_REPOSITORY=s3:http://int32bit-minio-server/local-backup

Now restic stats can be run directly to view the repository information, without specifying the repository address or entering the password.

# restic stats  

scanning...  

Stats in restore-size mode:  

Snapshots processed:   0  

     Total Size:   0 B
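For anything beyond a quick test, the password is better kept out of the shell environment. The following is a minimal sketch, assuming restic's RESTIC_PASSWORD_FILE environment variable is used instead of RESTIC_PASSWORD; the helper names write_password_file and restic_env, and any paths passed to them, are hypothetical.

```shell
#!/bin/sh
# Sketch: keep the repository password in a file that only its owner can read,
# and point restic at it through RESTIC_PASSWORD_FILE.

write_password_file() {
    # $1 = target file (must not exist yet), $2 = password
    # The subshell keeps the restrictive umask local to this one write.
    ( umask 077 && printf '%s\n' "$2" > "$1" )
}

restic_env() {
    # Print the export lines a backup script could eval or source.
    # $1 = repository URL, $2 = password file
    printf 'export RESTIC_REPOSITORY=%s\n' "$1"
    printf 'export RESTIC_PASSWORD_FILE=%s\n' "$2"
}
```

A typical use would be to call write_password_file once, then run eval "$(restic_env s3:http://int32bit-minio-server/local-backup /path/to/password-file)" before invoking restic.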

Run a backup with the backup subcommand:

# mkdir -p backup-demo  

# echo "hello" >backup-demo/hello.txt  

# restic backup backup-demo/  

no parent snapshot found, will read all files  

Files:           1 new,     0 changed,     0 unmodified  

Dirs:            1 new,     0 changed,     0 unmodified  

Added to the repo: 754 B  

processed 1 files, 6 B in 0:00  

snapshot 55572d0c saved

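The first backup has no parent snapshot, so it is a full backup. The output shows the number and size of the files backed up.

Now add a new file, modify an existing one, and run the backup again: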

# echo "new_file" >backup-demo/new_file.txt  

# echo "helloworld!" >backup-demo/hello.txt  

# restic backup backup-demo/  

using parent snapshot 55572d0c  

Files:           1 new,     1 changed,     0 unmodified  

Dirs:            0 new,     1 changed,     0 unmodified  

Added to the repo: 1.107 KiB  

processed 2 files, 21 B in 0:00  

snapshot f7d5b7c5 saved

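After adding a new file and modifying hello.txt, the second backup run is incremental by default: the result reports 1 new file and 1 changed file.

By default, a backup includes every file in the given directory, hidden files included. Use --exclude to skip files that should not be backed up, or --files-from to supply a list of files to back up.

One or more tags can also be attached to each backup with --tag, which makes it easier to search snapshots by tag later.

Every backup creates a snapshot, and the backup output prints the snapshot ID. The snapshots subcommand lists all snapshots in the repository and can also filter by tag: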

# restic snapshots  

ID        Time                 Host               Paths  

--------------------------------------------------------------------  

55572d0c  2021-10-11 14:09:11  int32bit-test-1    /root/backup-demo  

f7d5b7c5  2021-10-11 14:12:39  int32bit-test-1    /root/backup-demo  

--------------------------------------------------------------------
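The exclude and tag options described above can be combined in a small wrapper. This is only a sketch: backup_tagged and list_tagged are hypothetical helper names, and it assumes RESTIC_REPOSITORY and the password are already set in the environment as shown earlier. The restic options used (--exclude, --tag) are the ones mentioned in the text.

```shell
#!/bin/sh
# Sketch: tag every backup and pass a variable number of exclude patterns.

backup_tagged() {
    # usage: backup_tagged DIR TAG [EXCLUDE_PATTERN...]
    dir=$1
    tag=$2
    shift 2
    # Rewrite the remaining arguments in place: each pattern P
    # becomes the pair "--exclude P".
    for pattern in "$@"; do
        set -- "$@" --exclude "$pattern"
        shift
    done
    restic backup --tag "$tag" "$@" "$dir"
}

list_tagged() {
    # usage: list_tagged TAG  (lists only snapshots carrying that tag)
    restic snapshots --tag "$1"
}
```

For example, backup_tagged backup-demo app_name=test '*.log' backs up the directory without log files, and list_tagged app_name=test then shows only the snapshots with that tag.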

Use the diff subcommand to see the differences between two snapshots:

# restic diff 55572d0c f7d5b7c5  

comparing snapshot 55572d0c to f7d5b7c5:  

M    /backup-demo/hello.txt  

+    /backup-demo/new_file.txt  

Files:           1 new,     0 removed,     1 changed  

Dirs:            0 new,     0 removed  

Others:          0 new,     0 removed  

Data Blobs:      2 new,     1 removed  

Tree Blobs:      2 new,     2 removed  

Added:   1.107 KiB  

Removed: 754 B

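Searching files and viewing file contents

The ls subcommand lists all files in a given snapshot (f18cccc5 below is a later snapshot, taken after hello2.txt was added):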

# restic ls f18cccc5  

snapshot f18cccc5:  

/backup-demo  

/backup-demo/hello.txt  

/backup-demo/hello2.txt  

/backup-demo/new_file.txt  

# restic ls -l f18cccc5  

snapshot f18cccc5:  

drwxr-xr-x     0     0      0 2021-10-11 14:40:30 /backup-demo  

-rw-r--r--     0     0     12 2021-10-11 14:12:20 /backup-demo/hello.txt  

-rw-r--r--     0     0      7 2021-10-11 14:40:30 /backup-demo/hello2.txt  

-rw-r--r--     0     0      9 2021-10-11 14:11:38 /backup-demo/new_file.txt

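The find subcommand searches all snapshots for a file, which is very useful for recovering accidentally deleted files (quote the pattern so the shell does not expand it):

# restic find 'hello*'  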

Found matching entries in snapshot f18cccc5 from 2021-10-11 14:40:36  

/backup-demo/hello.txt  

/backup-demo/hello2.txt  

Found matching entries in snapshot 55572d0c from 2021-10-11 14:09:11  

/backup-demo/hello.txt  

Found matching entries in snapshot 7728a603 from 2021-10-11 14:23:47  

/backup-demo/hello.txt  

Found matching entries in snapshot f7d5b7c5 from 2021-10-11 14:12:39  

/backup-demo/hello.txt

The dump subcommand prints the contents of a given file in a given snapshot:

# restic dump f18cccc5 /backup-demo/hello2.txt  

hello2

More powerful still, the mount subcommand mounts the snapshots as a local file system:

# restic mount /mnt  

Now serving the repository at /mnt  

When finished, quit with Ctrl-c or umount the mountpoint.  

# mount | grep /mnt  

restic on /mnt type fuse (ro,nosuid,nodev,relatime,user_id=0,group_id=0)  

# cat /mnt/snapshots/latest/backup-demo/hello2.txt  

hello2  

# umount /mnt

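The commands above mount all snapshots under the local /mnt directory, read hello2.txt from the latest snapshot, and then unmount /mnt.

Restoring files is just as simple: the restore subcommand restores the files of a given snapshot to a local directory: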

# restic restore f18cccc5 -t /tmp/restore_data  

restoring <Snapshot f18cccc5 of [/root/backup-demo] at 2021-10-11 14:40:36.459899498 +0800 CST by root@k8s-master-1> to /tmp/restore_data  

# find /tmp/restore_data/  

/tmp/restore_data/  

/tmp/restore_data/backup-demo  

/tmp/restore_data/backup-demo/new_file.txt  

/tmp/restore_data/backup-demo/hello.txt  

/tmp/restore_data/backup-demo/hello2.txt

A single file can also be restored with the dump subcommand introduced earlier:

restic dump f18cccc5 /backup-demo/hello2.txt >hello2.txt

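The forget subcommand removes a snapshot by ID. In practice, snapshots are more often retained or removed by age or count, for example keeping the snapshots of the last 7 days or only the 3 most recent ones. Note that forget only removes the snapshot record; the data it referenced is freed by restic prune (or forget --prune).

The --dry-run flag shows which snapshots a policy would remove without actually removing them, which is useful for checking that the parameters behave as expected.

The following keeps only the 3 most recent snapshots. restic applies the policy separately to each host and path group, which is why the single /recover snapshot is also kept: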

# restic forget --keep-last=3 --dry-run  

Applying Policy: keep 3 latest snapshots  

keep 3 snapshots:  

ID        Time                 Host                      Tags           Reasons        Paths  

--------------------------------------------------------------------------------------------------------  

7728a603  2021-10-11 14:23:47  int32bit-test-1                 last snapshot  /root/backup-demo  

f18cccc5  2021-10-11 14:40:36  int32bit-test-1                 last snapshot  /root/backup-demo  

56e7b24f  2021-10-11 15:37:50  int32bit-test-1  app_name=test  last snapshot  /root/backup-demo  

--------------------------------------------------------------------------------------------------------  

3 snapshots  

remove 2 snapshots:  

ID        Time                 Host                      Tags        Paths  

--------------------------------------------------------------------------------------  

55572d0c  2021-10-11 14:09:11  int32bit-test-1              /root/backup-demo  

f7d5b7c5  2021-10-11 14:12:39  int32bit-test-1              /root/backup-demo  

--------------------------------------------------------------------------------------  

2 snapshots  

keep 1 snapshots:  

ID        Time                 Host                      Tags        Reasons        Paths  

--------------------------------------------------------------------------------------------  

112668f0  2021-10-11 15:28:22  int32bit-test-1              last snapshot  /recover  

--------------------------------------------------------------------------------------------  

1 snapshots  

Would have removed the following snapshots:  

{55572d0c f7d5b7c5}

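Other retention policies are available, for example keeping the latest snapshot from each of the last 2 hours in which snapshots were taken: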

# restic  forget --dry-run --keep-hourly 2  

Applying Policy: keep 2 hourly snapshots  

keep 2 snapshots:  

ID        Time                 Host                      Tags           Reasons          Paths  

----------------------------------------------------------------------------------------------------------  

56e7b24f  2021-10-11 15:37:50  int32bit-test-1  app_name=test  hourly snapshot  /root/backup-demo  

f949e14b  2021-10-11 16:11:44  int32bit-test-1                 hourly snapshot  /root/backup-demo  

----------------------------------------------------------------------------------------------------------  

2 snapshots  

remove 5 snapshots:  

ID        Time                 Host                      Tags        Paths  

--------------------------------------------------------------------------------------  

55572d0c  2021-10-11 14:09:11  int32bit-test-1              /root/backup-demo  

f7d5b7c5  2021-10-11 14:12:39  int32bit-test-1              /root/backup-demo  

7728a603  2021-10-11 14:23:47  int32bit-test-1              /root/backup-demo  

f18cccc5  2021-10-11 14:40:36  int32bit-test-1              /root/backup-demo  

ab268923  2021-10-11 16:01:04  int32bit-test-1              /root/backup-demo  

--------------------------------------------------------------------------------------  

5 snapshots  

Would have removed the following snapshots:  

{55572d0c 7728a603 ab268923 f18cccc5 f7d5b7c5}

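The policy above keeps hourly snapshots for the last 2 hours. Because several backups were taken within the same hour (at 14:00 and at 16:00), only the most recent snapshot of each retained hour is kept.

restic is a command-line tool and does not run as a background service, so it has no built-in backup scheduling. Scheduling is easy to set up with the crontab tool that ships with Linux.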
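A crontab-driven backup could look like the sketch below. The script name restic-backup.sh, the /etc/restic.env file, the log path, and the 7-day retention are all assumptions made for illustration; only the restic subcommands and flags (backup --tag, forget --keep-daily, --prune) are real.

```shell
#!/bin/sh
# restic-backup.sh (hypothetical name): back up one directory, then apply
# a retention policy. Assumes RESTIC_REPOSITORY, the repository password
# and the S3 credentials are already present in the environment.

run_backup() {
    # $1 = directory to back up
    restic backup "$1" --tag cron || return 1
    # Keep one snapshot per day for the last 7 days and free unused data.
    restic forget --keep-daily 7 --prune
}

# Run only when a directory is passed on the command line.
if [ "$#" -gt 0 ]; then
    run_backup "$@"
fi

# Example crontab entry (crontab -e), every day at 02:00; paths are hypothetical:
# 0 2 * * * . /etc/restic.env && /usr/local/bin/restic-backup.sh /root/backup-demo >>/var/log/restic-backup.log 2>&1
```

Running the retention step in the same job as the backup keeps the repository size bounded without a second cron entry; on large repositories the prune step can be slow, so it may be worth moving it to a weekly job.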

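Application backup and disaster recovery for Kubernetes with Velero

The previous sections introduced restic and mentioned Velero, a cloud-native disaster recovery and migration tool for Kubernetes. Velero was formerly Heptio Ark; Heptio was later acquired by VMware. Under the hood, Velero uses restic for volume backup.

Besides Velero, Kubernetes backup tools include Kasten, which has been acquired by Veeam, and Stash, which focuses on PV backup and also uses restic underneath.

Velero configuration

For detailed configuration and installation instructions, see the official Velero documentation; only a brief description is given here.

Taking MinIO object storage as the backup target, generate the YAML manifests with the Velero client: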

./velero install \
  --provider aws \
  --plugins xxx/velero-plugin-for-aws:v1.0.0 \
  --bucket velero \
  --secret-file ./aws-iam-creds \
  --backup-location-config region=test,s3Url=http://192.168.0.1,s3ForcePathStyle="true" \
  --snapshot-location-config region=test \
  --image xxx/velero:v1.6.3 \
  --features=EnableCSI \
  --use-restic \
  --dry-run -o yaml

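Where:

--plugins and --image specify the plugin and Velero images; pointing them at another registry (xxx above) is only needed when a private image registry is used.
--use-restic enables PV volume backup with restic.
Early Kubernetes volumes did not support snapshots, so backing up PVs required a plugin specific to the storage backend. CSI volume snapshots were introduced in Kubernetes v1.12 (as an alpha feature), and Velero can use them for backup when --features=EnableCSI is set. In this mode the underlying storage must support snapshots, and the snapshot CRDs and a VolumeSnapshotClass (analogous to a StorageClass) must be configured.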
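Before enabling --features=EnableCSI, it can help to check that the prerequisites listed above are actually in place. The following is a rough sketch, not part of Velero: csi_snapshot_ready is a hypothetical helper that only uses kubectl get to look for the VolumeSnapshotClass CRD and for at least one VolumeSnapshotClass object.

```shell
#!/bin/sh
# Sketch: check that the CSI snapshot CRDs and a VolumeSnapshotClass exist.

csi_snapshot_ready() {
    # The CRD name below is the one registered by the CSI external-snapshotter.
    if ! kubectl get crd volumesnapshotclasses.snapshot.storage.k8s.io >/dev/null 2>&1; then
        echo "VolumeSnapshot CRDs are not installed"
        return 1
    fi
    classes=$(kubectl get volumesnapshotclass -o name 2>/dev/null)
    if [ -z "$classes" ]; then
        echo "no VolumeSnapshotClass is defined"
        return 1
    fi
    echo "CSI snapshot prerequisites found:"
    echo "$classes"
}
```

The function returns a non-zero status when either prerequisite is missing, so it can gate an installation script, for example csi_snapshot_ready && ./velero install ... with the flags shown above.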

Restoring data relies on the velero-restic-restore-helper image. When a private registry is used, its address can be set through a restic ConfigMap:

apiVersion: v1
kind: ConfigMap
metadata:
  name: restic-config
  namespace: velero
  labels:
    velero.io/plugin-config: ""
    velero.io/restic: RestoreItemAction
data:
  image: xxx/velero-restic-restore-helper:v1.6.3

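Running a backup with Velero

For Velero usage in general, see other references. Here an Nginx service with a PV is used to illustrate the backup process and how restore works. The Nginx YAML manifest is as follows: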

# nginx-app-demo.yaml
---
apiVersion: v1
kind: Namespace
metadata:
  name: nginx-app
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: pvc-demo
  namespace: nginx-app
spec:
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi
  storageClassName: ceph-rbd-sata
---
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: nginx
  name: nginx
  namespace: nginx-app
spec:
  replicas: 1
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
      annotations:
        backup.velero.io/backup-volumes: mypvc
    spec:
      containers:
      - image: nginx
        name: nginx
        volumeMounts:
        - name: mypvc
          mountPath: /usr/share/nginx/html
      volumes:
      - name: mypvc
        persistentVolumeClaim:
          claimName: pvc-demo
          readOnly: false
---
apiVersion: v1
kind: Service
metadata:
  labels:
    app: nginx
  name: nginx
  namespace: nginx-app
spec:
  ports:
  - port: 80
    protocol: TCP
    targetPort: 80
  selector:
    app: nginx

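Only two points in the YAML above need attention:

A PVC is declared and mounted at /usr/share/nginx/html in the Nginx Pod.
The Pod carries the annotation backup.velero.io/backup-volumes: mypvc, which names the volume to back up. Not every volume has to be backed up; in production, set the backup policy according to how important the data is. For that reason the --default-volumes-to-restic option, which backs up all volumes by default, is not recommended.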
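For a Deployment that is already running without the annotation, the annotation can be added to the Pod template after the fact. The sketch below uses a kubectl merge patch; annotate_backup_volumes is a hypothetical helper name, and the namespace, Deployment and volume names in the usage line are the ones from the manifest above.

```shell
#!/bin/sh
# Sketch: add the Velero backup-volumes annotation to a Deployment's Pod template.

annotate_backup_volumes() {
    # $1 = namespace, $2 = deployment name, $3 = comma-separated volume names
    kubectl -n "$1" patch deployment "$2" --type merge -p \
        "{\"spec\":{\"template\":{\"metadata\":{\"annotations\":{\"backup.velero.io/backup-volumes\":\"$3\"}}}}}"
}

# Usage with the names from nginx-app-demo.yaml:
#   annotate_backup_volumes nginx-app nginx mypvc
```

Patching the Pod template rather than a single Pod means the annotation survives Pod restarts, at the cost of triggering a rollout of the Deployment.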

Enter the Nginx container and write some test data:

# kubectl exec -t -i -n nginx-app nginx-86f99c968-sj8ds -- /bin/bash  

cd /usr/share/nginx/html/  

echo "HelloWorld" >index.html  

echo "hello1" >hello1.html  

echo "hello2" >hello2.html

Accessing the Nginx Service now returns HelloWorld.

Create a backup with the velero backup command:

velero backup create nginx-backup-1 --include-namespaces nginx-app

View the backup details:

# velero describe backups nginx-backup-1  

Name:         nginx-backup-1  

Namespace:    velero  

Labels:       velero.io/storage-location=default  

Phase:  Completed  

Namespaces:  

Included:  nginx-app  

Storage Location:  default  

Velero-Native Snapshot PVs:  auto  

TTL:  720h0m0s  

Backup Format Version:  1.1.0  

Started:    2021-12-18 09:35:35 +0800 CST  

Completed:  2021-12-18 09:35:47 +0800 CST  

Expiration:  2022-01-17 09:35:35 +0800 CST  

Total items to be backed up:  22  

Items backed up:              22  

Restic Backups :  

Completed:  1

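Several points in the description are worth noting:

The phase is Completed, meaning the backup has finished; the record includes the start and completion times.
The number of items to be backed up and the number actually backed up.
The number of volumes backed up (Restic Backups).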
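In automation, the phase can be polled instead of read by eye. The following sketch assumes Velero is installed in the velero namespace, as in the output above, and reads the phase from the Backup custom resource with kubectl; wait_for_backup is a hypothetical helper name, and the retry count and delay are arbitrary defaults.

```shell
#!/bin/sh
# Sketch: poll a Velero Backup until it reaches a terminal phase.

wait_for_backup() {
    # $1 = backup name, $2 = max polls (default 60), $3 = seconds between polls (default 5)
    name=$1
    tries=${2:-60}
    delay=${3:-5}
    while [ "$tries" -gt 0 ]; do
        phase=$(kubectl -n velero get backups.velero.io "$name" -o jsonpath='{.status.phase}')
        case "$phase" in
            Completed)
                echo "backup $name completed"
                return 0 ;;
            Failed|PartiallyFailed|FailedValidation)
                echo "backup $name ended in phase $phase" >&2
                return 1 ;;
        esac
        tries=$((tries - 1))
        sleep "$delay"
    done
    echo "timed out waiting for backup $name" >&2
    return 1
}
```

A script could then run velero backup create nginx-backup-1 --include-namespaces nginx-app followed by wait_for_backup nginx-backup-1, and treat a non-zero exit status as a failed backup.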

Backup data management and migration

The backup contents can be viewed in S3:

# aws s3 ls velero/backups/nginx-backup-1/  

2021-12-18 09:35:47         29 nginx-backup-1-csi-volumesnapshotcontents.json.gz  

2021-12-18 09:35:47         29 nginx-backup-1-csi-volumesnapshots.json.gz  

2021-12-18 09:35:47       4730 nginx-backup-1-logs.gz  

2021-12-18 09:35:47        936 nginx-backup-1-podvolumebackups.json.gz  

2021-12-18 09:35:47        372 nginx-backup-1-resource-list.json.gz  

2021-12-18 09:35:47         29 nginx-backup-1-volumesnapshots.json.gz  

2021-12-18 09:35:47      10391 nginx-backup-1.tar.gz  

2021-12-18 09:35:47       2171 velero-backup.json

The backup can also be downloaded to the local machine with the download subcommand:

# velero backup download nginx-backup-1  

Backup nginx-backup-1 has been successfully downloaded to /tmp/nginx-backup-1-data.tar.gz  

# mkdir -p nginx-backup-1  

# tar xvzf nginx-backup-1-data.tar.gz -C nginx-backup-1/  

# ls -l nginx-backup-1/resources/  

total 48  

drwxr-xr-x 4 root root 4096 Dec 18 09:46 deployments.apps  

drwxr-xr-x 4 root root 4096 Dec 18 09:46 endpoints  

drwxr-xr-x 4 root root 4096 Dec 18 09:46 endpointslices.discovery.k8s.io  

drwxr-xr-x 4 root root 4096 Dec 18 09:46 events  

drwxr-xr-x 4 root root 4096 Dec 18 09:46 namespaces  

drwxr-xr-x 4 root root 4096 Dec 18 09:46 persistentvolumeclaims  

drwxr-xr-x 4 root root 4096 Dec 18 09:46 persistentvolumes  

drwxr-xr-x 4 root root 4096 Dec 18 09:46 pods  

drwxr-xr-x 4 root root 4096 Dec 18 09:46 replicasets.apps  

drwxr-xr-x 4 root root 4096 Dec 18 09:46 secrets  

drwxr-xr-x 4 root root 4096 Dec 18 09:46 serviceaccounts  

drwxr-xr-x 4 root root 4096 Dec 18 09:46 services

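Both methods export the backup data for migration. Note, however, that this data contains only the YAML of the Kubernetes resources, not the most important part, the business data in the volumes. That data is stored under the velero/restic/nginx-app/ path in S3, and it is encrypted.

From a security standpoint this is sound. For operators, though, it is a black box: how do we confirm that the volume data was backed up completely (the Verifiable principle)? Or, in an extreme case, if the workload no longer runs in containers and moves to physical machines, how can the business data be migrated quickly and efficiently?

There is always a way: restore into a container with Velero and copy the data out from there. But that is cumbersome and depends on Velero. Is there a way to manage the backup data directly with restic?

As described in the restic introduction, the data is encrypted, so reading it requires the restic repository password.

The password is stored in the velero-restic-credentials Secret (in the namespace where Velero is installed, velero here). Any administrator with sufficient permissions can read it, so access to Velero must be tightly controlled.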

# kubectl -n velero get secrets velero-restic-credentials \
    -o jsonpath='{.data.repository-password}' | base64 -d

With the repository password in hand, the native restic tool can be used to manage the backup data.

First list the snapshots:

# restic -r s3:http://192.168.0.1/velero/restic/nginx-app snapshots  

14fc2081  2021-12-18 09:35:45 ... # the output is long; the rest is omitted

List the files in the backup:

# restic -r s3:http://192.168.0.1/velero/restic/nginx-app ls 14fc2081  

snapshot 14fc2081:  

/hello1.html  

/hello2.html  

/index.html

View the contents of a specific backed-up file:

# restic -r s3:http://192.168.0.1/velero/restic/nginx-app dump 14fc2081 /hello2.html  

hello2
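The steps above (read the password from the Secret, then point restic at the per-namespace repository) can be wrapped into one helper. This is a sketch under the assumptions already visible in this article: Velero lives in the velero namespace, and the repository path is BUCKET/restic/NAMESPACE. The function name velero_restic_env is hypothetical, and the S3 credentials (AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY) are assumed to be exported separately.

```shell
#!/bin/sh
# Sketch: export the restic settings for the repository Velero created
# for one namespace, so plain restic commands work against it.

velero_restic_env() {
    # $1 = S3 endpoint plus bucket, e.g. http://192.168.0.1/velero
    # $2 = namespace that was backed up, e.g. nginx-app
    RESTIC_REPOSITORY="s3:$1/restic/$2"
    RESTIC_PASSWORD=$(kubectl -n velero get secret velero-restic-credentials \
        -o jsonpath='{.data.repository-password}' | base64 -d)
    export RESTIC_REPOSITORY RESTIC_PASSWORD
}

# Usage (snapshot ID taken from the listing above; the target path is hypothetical):
#   velero_restic_env http://192.168.0.1/velero nginx-app
#   restic snapshots
#   restic restore 14fc2081 -t /data/nginx-html
```

The last usage line is also how the data could be moved onto a virtual machine or physical host without going through Velero at all.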

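With restic, managing and migrating the backup data is straightforward.

Earlier we used Velero to back up all resources in the nginx-app namespace, including the volume data.

Now delete the whole nginx-app application: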

kubectl delete -f nginx-app-demo.yaml

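This command deletes every resource in the namespace, including the PVC. With a reclaim policy of Delete (the default for dynamically provisioned volumes), the volume and its files are also removed from the underlying storage.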

# kubectl get all -n nginx-app  

No resources found in nginx-app namespace.  

# kubectl get ns nginx-app  

Error from server (NotFound): namespaces "nginx-app" not found

The output shows that everything has been deleted.

Next, restore the data with Velero:

# velero restore create --from-backup nginx-backup-1   

Restore request "nginx-backup-1-20211218102506" submitted successfully.  

# velero restore get  

NAME                            BACKUP           STATUS  

nginx-backup-1-20211218102506   nginx-backup-1   InProgress  

# velero restore get  

NAME                            BACKUP           STATUS  

nginx-backup-1-20211218102506   nginx-backup-1   Completed

Once the restore completes, verify that the Nginx application is fully back. First check the Pod and Service:

# kubectl get pod -n nginx-app  

NAME                    READY   STATUS    RESTARTS   AGE  

nginx-86f99c968-8zh6m   1/1     Running   0          103s  

# kubectl get svc -n nginx-app  

NAME    TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)   AGE  

nginx   ClusterIP   10.106.140.195   <none>        80/TCP    2m37

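The output shows that the resources of the original nginx-app namespace have all been restored and are running. What remains is to check whether the business data was restored: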

# kubectl exec -t -i -n nginx-app nginx-86f99c968-8zh6m -- ls /usr/share/nginx/html/  

hello1.html  hello2.html  index.html  lost+found  

# kubectl get svc -n nginx-app  

NAME    TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)   AGE  

nginx   ClusterIP   10.106.140.195   <none>        80/TCP    2m37s  

# curl  10.106.140.195  

HelloWorld

The check confirms that the business data is intact and the service is back to normal.

Now look at the Pod manifest:

apiVersion: v1
kind: Pod
metadata:
  annotations:
    backup.velero.io/backup-volumes: mypvc
  labels:
    app: nginx
    pod-template-hash: 86f99c968
    velero.io/backup-name: nginx-backup-1
    velero.io/restore-name: nginx-backup-1-20211218102506
  name: nginx-86f99c968-8zh6m
  namespace: nginx-app
spec:
  containers:
  - image: nginx
    name: nginx
    volumeMounts:
    - mountPath: /usr/share/nginx/html
      name: mypvc
  initContainers:
  - args:
    - ead72033-f495-4223-9358-6f97c920e9ae
    command:
    - /velero-restic-restore-helper
    env:
    - name: POD_NAMESPACE
      valueFrom:
        fieldRef:
          apiVersion: v1
          fieldPath: metadata.namespace
    - name: POD_NAME
      valueFrom:
        fieldRef:
          apiVersion: v1
          fieldPath: metadata.name
    image: velero-restic-restore-helper:v1.6.3
    imagePullPolicy: IfNotPresent
    name: restic-wait
    volumeMounts:
    - mountPath: /restores/mypvc
      name: mypvc
  volumes:
  - name: mypvc
    persistentVolumeClaim:
      claimName: rbd-pvc-demo

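The manifest is largely the same as the one originally applied. Only two points deserve attention:

The Pod has gained labels related to the Velero backup and restore.
An init container named restic-wait has been injected. It runs velero-restic-restore-helper, which holds back the application container until Velero's restic DaemonSet has finished restoring the volume data with restic.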

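Backup policies and schedules

As noted earlier, restic is a command-line tool without scheduling support. Velero, however, supports backup schedules, which can be configured with:

Backup time, in crontab syntax.
Retention period, set with --ttl; the default is 30 days.
Backup contents, selected by namespace or, for specific resources, by label.

Schedule management is not covered in detail here. Interested readers can consult the official documentation or run velero create schedule -h for help and examples: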

# Create a backup every 6 hours.  

velero create schedule NAME --schedule="0 */6 * * *"  

# Create a backup every 6 hours with the @every notation.  

velero create schedule NAME --schedule="@every 6h"  

# Create a daily backup of the web namespace.  

velero create schedule NAME --schedule="@every 24h" --include-namespaces web  

# Create a weekly backup, each living for 90 days (2160 hours).  

velero create schedule NAME --schedule="@every 168h" --ttl 2160h0m0s

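An enterprise backup and disaster recovery design for Kubernetes

Based on the above, a rough backup and disaster recovery design for Kubernetes can be drawn up:

[Figure: Kubernetes backup and disaster recovery architecture]

Where:

MinIO is an open-source object store that serves as the backup storage backend for Velero and restic; in production, replace it with the enterprise object storage system.
The remote storage is an off-site storage system, such as an off-site tape library, NBU, or an object store in another region.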

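Backup flow:

All Kubernetes resources, including Pods, Deployments, ConfigMaps, Secrets, and PV volume data, are backed up to object storage by Velero.
minio-sync then replicates the data in real time to remote storage in the same city or at an off-site location.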

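Recovery flow:

Scenario 1: the cluster is healthy, but data was deleted by human error.

Restore the data from the desired point in time directly with Velero.

Scenario 2: the storage system behind the PVs crashes and data is lost.

Recover the storage cluster or, in the extreme case, rebuild it, then use Velero to restore the data from object storage.

Scenario 3: in the extreme case, an entire data center or region fails and data is lost.

Rebuild the environment, copy the business data from the off-site storage into a newly built local object store, then restore from that object store with Velero.

Scenario 4: migrating the Kubernetes environment.

Create a new Kubernetes cluster and use Velero to migrate the data from a chosen backup into the new environment.

Scenario 5: moving a workload from Kubernetes to virtual machines or physical machines.

Use restic to export the business data from object storage onto the data volume of the virtual machine.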

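Kubernetes cluster backup

The previous sections covered application-level backup for Kubernetes. Backing up the cluster itself is just as important. Almost all Kubernetes metadata is stored in etcd, so etcd backup is the core of cluster backup. The Kubernetes certificates, the kubeadm configuration, and so on also belong in the backup strategy and are indispensable for cluster recovery.

The Kubernetes certificates and kubeadm configuration can be backed up by running restic, introduced above, on the whole /etc/kubernetes directory. For etcd itself, the official documentation describes the procedure (backing-up-an-etcd-cluster):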

ETCDCTL_API=3 etcdctl --endpoints $ENDPOINT snapshot save snapshotdb

For example, to back up etcd to MinIO object storage, a reference script is:

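#!/bin/sh
# bootstrap.sh
set -e
export ETCDCTL_API=3

# Find the etcd leader: with the etcdctl version used here, the fifth
# comma-separated field of "endpoint status" is the "is leader" flag.
MASTER_ENDPOINT=$(etcdctl --endpoints="$ETCD_ENDPOINTS" \
    --cacert=/etc/ssl/etcd/ca.crt \
    --cert=/etc/ssl/etcd/etcd.crt \
    --key=/etc/ssl/etcd/etcd.key \
    endpoint status \
    | awk -F ',' '{printf("%s %s\n", $1,$5)}' \
    | tr -s ' ' | awk '/true/{print $1}')
echo "etcd master endpoint is ${MASTER_ENDPOINT}"

# Take a snapshot from the leader into a timestamped file.
BACKUP_FILE=etcd-backup-$(date +%Y%m%d%H%M%S).db
etcdctl --endpoints="$MASTER_ENDPOINT" \
    --cacert=/etc/ssl/etcd/ca.crt \
    --cert=/etc/ssl/etcd/etcd.crt \
    --key=/etc/ssl/etcd/etcd.key \
    snapshot save "$BACKUP_FILE"

# Upload the snapshot to the object store.
aws --endpoint-url "$S3_ENDPOINT" s3 cp "$BACKUP_FILE" "s3://$BUCKET_NAME"

# Delete everything except the newest KEEP_LAST_BACKUP_COUNT backups
# (the listing is sorted by name, so the oldest files come first).
for f in $(aws --endpoint-url "$S3_ENDPOINT" \
    s3 ls "$BUCKET_NAME" | head -n "-${KEEP_LAST_BACKUP_COUNT}" \
    | awk '{print $4}'); do
    aws --endpoint-url "$S3_ENDPOINT" s3 rm "s3://$BUCKET_NAME/$f"
done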

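The script first finds the endpoint of the etcd leader (called master in the script), then creates an etcd snapshot through that endpoint. Once the snapshot is written, it is copied to the remote object store with the AWS S3 CLI. Finally, older backups are deleted so that only the configured number of backups is kept.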
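A backup is only useful if it can be read back, so it is worth checking a downloaded snapshot file and knowing the restore command. The following is a minimal sketch using etcdctl's snapshot status and snapshot restore subcommands; the function names are hypothetical, and the restore shown covers only the simplest case. Restoring a multi-member cluster needs additional flags (member name, initial cluster, peer URLs), as described in the etcd documentation.

```shell
#!/bin/sh
# Sketch: sanity-check an etcd snapshot file and restore it into a new data directory.
export ETCDCTL_API=3

verify_snapshot() {
    # $1 = snapshot file; prints hash, revision, key count and size as a table.
    if [ ! -s "$1" ]; then
        echo "snapshot $1 is missing or empty" >&2
        return 1
    fi
    etcdctl snapshot status "$1" -w table
}

restore_snapshot() {
    # $1 = snapshot file, $2 = new, empty data directory for the restored member
    etcdctl snapshot restore "$1" --data-dir "$2"
}

# Usage (file name and directory are hypothetical):
#   verify_snapshot etcd-backup-20211218000000.db
#   restore_snapshot etcd-backup-20211218000000.db /var/lib/etcd-restore
```

After a restore, etcd has to be started against the new data directory; for a kubeadm cluster that usually means updating the hostPath in the etcd static Pod manifest.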

The bootstrap.sh script above can be packaged into a Docker image:

FROM python:alpine
ARG ETCD_VERSION=v3.4.3
RUN apk add --update --no-cache ca-certificates tzdata openssl
RUN wget https://github.com/etcd-io/etcd/releases/download/${ETCD_VERSION}/etcd-${ETCD_VERSION}-linux-amd64.tar.gz \
    && tar xzf etcd-${ETCD_VERSION}-linux-amd64.tar.gz \
    && mv etcd-${ETCD_VERSION}-linux-amd64/etcdctl /usr/local/bin/etcdctl \
    && rm -rf etcd-${ETCD_VERSION}-linux-amd64*
RUN pip3 install awscli
ENV ETCDCTL_API=3
COPY bootstrap.sh /
RUN chmod +x /bootstrap.sh
CMD ["/bootstrap.sh"]

Store the etcd certificates and the MinIO access key and secret key in Kubernetes Secrets:

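#!/bin/bash
# Create both Secrets in the namespace used by the CronJob below.
# The key=path form stores server.crt/server.key under the names
# etcd.crt/etcd.key that bootstrap.sh expects in /etc/ssl/etcd.
kubectl -n etcd-backup create secret generic etcd-tls \
    --from-file=ca.crt=/etc/kubernetes/pki/etcd/ca.crt \
    --from-file=etcd.crt=/etc/kubernetes/pki/etcd/server.crt \
    --from-file=etcd.key=/etc/kubernetes/pki/etcd/server.key

kubectl -n etcd-backup create secret generic s3-credentials \
    --from-file ~/.aws/credentials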

Use the built-in Kubernetes CronJob to run the backup on a schedule:

# batch/v1 is available from Kubernetes v1.21; older clusters use batch/v1beta1.
apiVersion: batch/v1
kind: CronJob
metadata:
  name: etcd-backup
  namespace: etcd-backup
spec:
  jobTemplate:
    metadata:
      name: etcd-backup
    spec:
      template:
        spec:
          containers:
          - image: etcd-backup:v3.4.3
            imagePullPolicy: IfNotPresent
            name: etcd-backup
            volumeMounts:
            - name: s3-credentials
              mountPath: /root/.aws
            - name: etcd-tls
              mountPath: /etc/ssl/etcd
            - name: localtime
              mountPath: /etc/localtime
              readOnly: true
            env:
            - name: ETCD_ENDPOINTS
              value: "192.168.1.1:2379,192.168.1.2:2379,192.168.1.3:2379"
            - name: BUCKET_NAME
              value: etcd-backup
            - name: S3_ENDPOINT
              value: "http://192.168.1.53"
            - name: KEEP_LAST_BACKUP_COUNT
              value: "7"
          volumes:
          - name: s3-credentials
            secret:
              secretName: s3-credentials
          - name: etcd-tls
            secret:
              secretName: etcd-tls
          - name: localtime
            hostPath:
              path: /etc/localtime
          restartPolicy: OnFailure
  schedule: '0 0 * * *'

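The CronJob above backs up etcd to MinIO object storage every day at midnight.

This article first outlined the approach to Kubernetes backup and introduced the open-source restic tool. It then showed how to use the open-source Velero to implement application-level backup and disaster recovery for Kubernetes, focusing on the backup and restore of business data in PV volumes. Finally, it covered cluster-level backup and disaster recovery for Kubernetes through etcd backup.

Original article: https://mp.weixin.qq.com/s/nHkXzzAR8Rilf-eRSTJxoQ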
