Backups are something every Internet company's ops engineers have to deal with, and we are no exception. In this post I lay out the backup strategy I use for our production Kubernetes cluster and share it here.

The purpose of backing up Kubernetes here is mainly to guard against the following situations:

  • Accidental deletion of a namespace in the cluster
  • An accidental operation breaking a resource in the cluster, such as a Deployment or ConfigMap
  • Accidental deletion of individual resource objects in the cluster
  • Loss of etcd data

## Back up etcd

We back up etcd to guard against a cluster-level failure or etcd data loss that would leave the entire cluster unusable; in that case the business can only be brought back by restoring the cluster from an etcd snapshot.

The etcd backup script is as follows:

```shell
#!/bin/bash
#ENDPOINTS="https://192.168.1.207:2379,https://192.168.1.208:2379,https://192.168.1.209:2379"
ENDPOINTS="127.0.0.1:2379"
CACERT="/etc/kubernetes/pki/etcd/ca.crt"
CERT="/etc/kubernetes/pki/etcd/server.crt"
KEY="/etc/kubernetes/pki/etcd/server.key"
DATE=`date +%Y%m%d-%H%M%S`
BACKUP_DIR="/home/centos/hostpath/backups/k8s/etcd"

# Take an etcd snapshot
ETCDCTL_API=3 /usr/local/bin/etcdctl --cacert=${CACERT} --cert=${CERT} --key=${KEY} \
  --endpoints="${ENDPOINTS}" snapshot save ${BACKUP_DIR}/k8s-snapshot-${DATE}.db

# Remove snapshots older than 20 days
find $BACKUP_DIR/ -type f -mtime +20 -exec rm -f {} \;
```

Cron job schedule:

```shell
50 21 * * * /bin/bash /home/centos/hostpath/backups/k8s/etcdv3-bak.sh
```
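For completeness, a minimal sketch of how such a snapshot could later be restored on a control-plane node. This is not part of the original setup: the snapshot filename and the restored data directory are examples, and it assumes a stacked-etcd kubeadm cluster where the etcd and kube-apiserver static pods are stopped before restoring.

```shell
# Hypothetical restore from a snapshot produced by the script above;
# adjust the snapshot name, data directory, and manifests to your environment.
ETCDCTL_API=3 /usr/local/bin/etcdctl snapshot restore \
  /home/centos/hostpath/backups/k8s/etcd/k8s-snapshot-20210907-215000.db \
  --data-dir /var/lib/etcd-restored

# Then point etcd at the restored data directory (e.g. edit the hostPath in
# /etc/kubernetes/manifests/etcd.yaml) and let the static pods come back up.
```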

## Minio object storage service setup

Since our storage cluster is built on GlusterFS, I can only use MinIO to provide object storage here, with GlusterFS as the underlying file system. If you use Alibaba Cloud OSS to back up your cluster resources, skip this step and refer to: https://github.com/AliyunContainerService/velero-plugin

Running MinIO inside the k8s cluster would require provisioning a PV/PVC for its storage, so for simplicity we start it directly with Docker Compose:

```yaml
version: '2.0'
services:
  minio:
    image: minio/minio:latest
    container_name: minio
    ports:
      - "39000:9000"
      - "39001:9001"
    restart: always
    command: server --console-address ':9001' /data
    environment:
      MINIO_ACCESS_KEY: admin
      MINIO_SECRET_KEY: adminSD#123
    logging:
      options:
        max-size: "1000M"   # maximum size of a single log file before rotation
        max-file: "100"
      driver: json-file
    volumes:
      - /home/centos/hostpath/backups/k8s/velero:/data   # map the backup directory into the container
    networks:
      - minio

networks:
  minio:
    ipam:
      config:
        - subnet: 10.210.1.0/24
          gateway: 10.210.1.1
```
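Save this as docker-compose.yml (the filename is my own choice) and bring the service up:

```shell
docker-compose up -d
docker ps | grep minio   # confirm the container is running
```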

Open a browser, enter the following address and account information, and you can manage MinIO object storage through the web console:

```shell
minio web:          http://192.168.1.214:39001
minio admin:        admin
minio admin passwd: adminSD#123
```
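Velero will expect the bucket (k8s-jf below) to exist before it is installed. Besides creating it in the web console, a minimal sketch using the MinIO `mc` client could look like this (the alias name `myminio` is my own choice):

```shell
# Register the MinIO endpoint under a local alias, then create the bucket
mc alias set myminio http://192.168.1.214:39000 admin 'adminSD#123'
mc mb myminio/k8s-jf
mc ls myminio   # confirm the bucket exists
```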

## Install the velero backup client

```shell
brew install velero
```
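Homebrew works on a Mac workstation; on a Linux host without Homebrew, one alternative (the version below is only an example, pick whichever release suits your cluster) is to grab the binary from the GitHub releases page:

```shell
# Download and install the velero CLI binary
wget https://github.com/vmware-tanzu/velero/releases/download/v1.6.3/velero-v1.6.3-linux-amd64.tar.gz
tar -xzf velero-v1.6.3-linux-amd64.tar.gz
mv velero-v1.6.3-linux-amd64/velero /usr/local/bin/
velero version --client-only
```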

Create a credentials-velero file with the following content; the velero server will use it later to connect to the object storage:

```ini
[default]
aws_access_key_id = admin
aws_secret_access_key = adminSD#123
```

## Deploy velero into the k8s cluster

```shell
# velero install \
    --provider aws \
    --plugins velero/velero-plugin-for-aws:v1.2.0 \
    --bucket k8s-jf \
    --secret-file ./credentials-velero \
    --use-volume-snapshots=false \
    --backup-location-config region=minio,s3ForcePathStyle="true",s3Url=http://192.168.1.214:39000
```
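Once the install command finishes, it is worth confirming that the server side actually came up before creating any backups:

```shell
# The velero deployment should be Running, and the client should be able to reach the server
kubectl -n velero get pods
velero version
```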

## Velero backup commands

Backup, view, and delete operations:

```shell
# Back up the resources in the ingress-nginx namespace:
velero backup create ingress-nginx-backup --include-namespaces ingress-nginx

# View the backup result
velero backup describe ingress-nginx-backup
velero backup logs ingress-nginx-backup

# Delete a backup
velero delete backup ingress-nginx-backup

# Back up everything except the ingress-nginx and test namespaces:
velero backup create k8s-full-test-backup --exclude-namespaces ingress-nginx,test

# Back up only specific resource types
velero backup create kube-system-backup --include-resources pod,secret

# --confirm deletes the backup directly without prompting for confirmation:
velero backup delete kube-system-backup --confirm

# Back up a namespace together with its pods and PVs
velero backup create pvc-backup --snapshot-volumes --include-namespaces test-velero
```

Note: --include-resources specifies the resource types to back up, and --exclude-resources excludes certain resource types.

Scheduled backup:

```shell
# Back up every six hours
velero create schedule ${SCHEDULE_NAME} --schedule="0 */6 * * *"

# Back up every six hours using the @every syntax
velero create schedule ${SCHEDULE_NAME} --schedule="@every 6h"

# Create a daily backup of the web namespace
velero create schedule ${SCHEDULE_NAME} --schedule="@every 24h" --include-namespaces web

# Create a weekly backup and keep it for 90 days (2160h)
velero create schedule ${SCHEDULE_NAME} --schedule="@every 168h" --ttl 2160h0m0s
```

Note: --ttl specifies the lifetime of a backup; once the TTL expires, the backup is cleaned up automatically. The default TTL is 30 days.

Restoring from backups:

```shell
# Create a restore from a backup
velero restore create ${RESTORE_NAME} --from-backup ${BACKUP_NAME}

# Create a restore from a backup; the default restore name is ${BACKUP_NAME}-<timestamp>
velero restore create --from-backup ${BACKUP_NAME}

# Create a restore from the latest backup of a schedule
velero restore create --from-schedule ${SCHEDULE_NAME}

# Restore only selected resources from a backup
velero restore create --from-backup backup-2 --include-resources pod,secret

# Restore all backed-up namespaces of the cluster (existing resources are not overwritten)
velero restore create --from-backup all-ns-backup

# Restore only the default and nginx-example namespaces
velero restore create --from-backup all-ns-backup --include-namespaces default,nginx-example

# Restore the test-velero namespace resources into test-velero-1
velero restore create restore-for-test --from-backup everyday-1-20210203131802 --namespace-mappings test-velero:test-velero-1
```

View backups:

```shell
velero get backup    # list backups
velero get schedule  # list scheduled backups
velero get restore   # list existing restores
velero get plugins   # list plugins
```

Note:

`velero restore create RESTORE_NAME --from-backup BACKUP_NAME --namespace-mappings old-ns-1:new-ns-1,old-ns-2:new-ns-2`

Velero can restore resources into a namespace different from the one they were backed up from; to do this, use the --namespace-mappings flag as shown above.

## Velero backup in action

Velero performs a full backup of the cluster:

```shell
velero backup create k8s-jf-test-all
```

Set a scheduled backup every 4 hours and keep the backup for 2 months:

```shell
# velero create schedule k8s-jf-cron-4h --exclude-namespaces test,tt --schedule="@every 4h" --ttl 1440h
Schedule "k8s-jf-cron-4h" created successfully.
```

Manually restore one namespace from the full backup into a different, specified namespace in the same cluster:

```shell
# velero restore create k8s-jf-test-all-restore --from-backup k8s-jf-test-all --include-namespaces test --namespace-mappings test:test10000

# velero restore describe k8s-jf-test-all-restore    # view restore status
Name:         k8s-jf-test-all-restore
Namespace:    velero
Labels:       <none>
Annotations:  <none>

Phase:                                 InProgress
Estimated total items to be restored:  141
Items restored so far:                 123

Started:    2021-09-07 10:47:44 +0800 CST
Completed:  <n/a>

Backup:  k8s-jf-test-all

Namespaces:
  Included:  all namespaces found in the backup
  Excluded:  <none>

Resources:
  Included:        *
  Excluded:        nodes, events, events.events.k8s.io, backups.velero.io, restores.velero.io, resticrepositories.velero.io
  Cluster-scoped:  auto

Namespace mappings:  test=test10000

Label selector:  <none>

Restore PVs:  auto

Preserve Service NodePorts:  auto

# velero restore get
NAME                      BACKUP            STATUS       STARTED                         COMPLETED   ERRORS   WARNINGS   CREATED                         SELECTOR
k8s-jf-test-all-restore   k8s-jf-test-all   InProgress   2021-09-07 10:47:44 +0800 CST   <nil>       0        0          2021-09-07 10:47:44 +0800 CST   <none>
```

## Cross-cluster scheduled backup and restore

For velero, cross-cluster restore requires both clusters to use the same cloud vendor's persistent volume solution. Here both clusters use the same MinIO service, and both use the bucket k8s-jf, so velero is installed in the second cluster with the same configuration:

```shell
velero install \
    --provider aws \
    --plugins velero/velero-plugin-for-aws:v1.2.0 \
    --bucket k8s-jf \
    --secret-file ./credentials-velero \
    --use-volume-snapshots=false \
    --backup-location-config region=minio,s3ForcePathStyle="true",s3Url=http://192.168.1.214:39000
```

### View backed up data:

```shell
# velero backup get
NAME                                STATUS      ERRORS   WARNINGS   CREATED                         EXPIRES   STORAGE LOCATION   SELECTOR
k8s-jf-all-cron-4h-20210907061716   Completed   0        0          2021-09-07 14:17:16 +0800 CST   59d       default            <none>
k8s-jf-all-cron-4h-20210907021627   Completed   0        0          2021-09-07 10:16:27 +0800 CST   59d       default            <none>
k8s-jf-test-all                     Completed   0        0          2021-09-07 10:19:45 +0800 CST   29d       default            <none>
```
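One precaution worth considering before restoring on this cluster (taken from Velero's cluster-migration guidance rather than the original setup): on the cluster that only restores from the shared bucket, mark the backup storage location read-only so it never writes into it:

```shell
# Make the shared backup storage location read-only on the restore-only cluster
kubectl -n velero patch backupstoragelocation default --type merge \
  --patch '{"spec":{"accessMode":"ReadOnly"}}'
```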

### Restore the specified namespace data

Here we restore the argocd namespace data from the backup k8s-jf-all-cron-4h-20210907061716 into the argocd-dev namespace of this cluster:
```shell
# velero restore create --from-backup k8s-jf-all-cron-4h-20210907061716 --include-namespaces argocd --namespace-mappings argocd:argocd-dev
Restore request "k8s-jf-all-cron-4h-20210907061716-20210907155450" submitted successfully.
Run `velero restore describe k8s-jf-all-cron-4h-20210907061716-20210907155450` or `velero restore logs k8s-jf-all-cron-4h-20210907061716-20210907155450` for more details.

# velero restore get
NAME                                                BACKUP                              STATUS       STARTED                         COMPLETED   ERRORS   WARNINGS   CREATED                         SELECTOR
k8s-jf-all-cron-4h-20210907061716-20210907155450   k8s-jf-all-cron-4h-20210907061716   InProgress   2021-09-07 15:54:51 +0800 CST   <nil>       0        0          2021-09-07 15:54:51 +0800 CST   <none>

# velero restore logs k8s-jf-all-cron-4h-20210907061716-20210907161119
time="2021-09-07T08:11:23Z" level=info msg="Attempting to restore Secret: argocd-application-controller-token-wv62v" logSource="pkg/restore/restore.go:1238" restore=velero/k8s-jf-all-cron-4h-20210907061716-20210907161119
time="2021-09-07T08:11:23Z" level=info msg="Restored 2 items out of an estimated total of 61 (estimate will change throughout the restore)" logSource="pkg/restore/restore.go:664" name=argocd-application-controller-token-wv62v namespace=argocd-dev progress= resource=secrets restore=velero/k8s-jf-all-cron-4h-20210907061716-20210907161119
...
time="2021-09-07T08:11:30Z" level=info msg="Restored 61 items out of an estimated total of 61 (estimate will change throughout the restore)" logSource="pkg/restore/restore.go:664" name=argocd-server namespace=argocd-dev progress= resource=services restore=velero/k8s-jf-all-cron-4h-20210907061716-20210907161119
time="2021-09-07T08:11:30Z" level=info msg="Waiting for all restic restores to complete" logSource="pkg/restore/restore.go:546" restore=velero/k8s-jf-all-cron-4h-20210907061716-20210907161119
time="2021-09-07T08:11:30Z" level=info msg="Done waiting for all restic restores to complete" logSource="pkg/restore/restore.go:562" restore=velero/k8s-jf-all-cron-4h-20210907061716-20210907161119
time="2021-09-07T08:11:30Z" level=info msg="Waiting for all post-restore-exec hooks to complete" logSource="pkg/restore/restore.go:566" restore=velero/k8s-jf-all-cron-4h-20210907061716-20210907161119
time="2021-09-07T08:11:30Z" level=info msg="Done waiting for all post-restore exec hooks to complete" logSource="pkg/restore/restore.go:574" restore=velero/k8s-jf-all-cron-4h-20210907061716-20210907161119
time="2021-09-07T08:11:30Z" level=info msg="restore completed" logSource="pkg/controller/restore_controller.go:480" restore=velero/k8s-jf-all-cron-4h-20210907061716-20210907161119
```

From the logs you can see that the data from the original cluster's argocd namespace has been restored into this cluster's argocd-dev namespace.
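A quick sanity check is simply to list what landed in the mapped namespace:

```shell
kubectl -n argocd-dev get pods
```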

## Uninstall velero

Uninstall velero. Note that `velero uninstall` by itself does not delete the namespace, so clean that up with kubectl afterwards:

```shell
# velero uninstall
You are about to uninstall Velero.
Are you sure you want to continue (Y/N)? y
Velero uninstalled ⛵
# kubectl delete namespace/velero clusterrolebinding/velero
# kubectl delete crds -l component=velero
```
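To confirm that everything is gone, a couple of quick checks (both should return NotFound or nothing once cleanup has finished):

```shell
kubectl get namespace velero
kubectl get crds | grep velero.io
```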

## Exception encountered during backup

time="2021-09-07T07:22:35Z" level=info msg="Validating backup storage location" backup-storage-location=default controller=backup-storage-location logSource="pkg/controller/backup_storage_location_controller.go:114" time="2021-09-07T07:22:36Z" level=info msg="Backup storage location is invalid, marking as unavailable" backup-storage-location=default controller=backup-storage-location logSource="pkg/controller/backup_storage_location_controller.go:117" time="202 1-09-07T07:22:36Z" level=error msg="Error listing backups in backup store" backupLocation=default controller=backup-sync error="rpc error: code = Unknown desc = RequestError: send request failed\ncaused by: Get http://minio.velero.svc:9000/velero?delimiter=%2F&list-type=2&prefix=backups%2F: dial tcp: look up minio.velero.svc on 10.96.0.10:53: no such host" error.file="/go/src/velero-plugin-for-aws/velero-plugin-for-aws/object_store.go:361" error.function="main.(*ObjectStore).ListCommonPrefixes" logSource="pkg/controller/backup_sync_controller.go:182" time="2021- 09-07T07:22:36Z" level=error msg="Current backup storage locations available/unavailable/unknown: 0/1/0, Backup storage location \"default\" is unavailable: rpc error: code = Unknown desc = RequestError: send request failed\ncaused by: Get http://minio.velero.svc:9000/velero?delimiter=%2F&list-type=2&prefix= : dial tcp: lookup minio.velero.svc on 10.96.0.10:53: no such host)" controller=backup-storage-location logSource="pkg/controller/backup_storage_location_controller.go:164"
time="2021-09-07T07:22:36Z" level=error msg="Current backup storage locations available/unavailable/unknown: 0/1/0)" controller=backup-storage-location logSource="pkg/controller/backup_storage_location_controller.go:166"

This indicates that the BackupStorageLocation resource (and the object storage credentials) do not match your current settings; the old resources were left over from a previous installation and were never cleaned up before reinstalling.
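One way to confirm this before reinstalling is to look at the s3Url of the stale BackupStorageLocation; in the log above it still points at an in-cluster address (minio.velero.svc) rather than the external MinIO endpoint:

```shell
# Print the s3Url of the leftover backup storage location
kubectl -n velero get backupstoragelocation default -o jsonpath='{.spec.config.s3Url}'
```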

So reinstall velero from a clean slate:

```shell
# velero uninstall
You are about to uninstall Velero.
Are you sure you want to continue (Y/N)? y
Velero uninstalled ⛵
# kubectl delete namespace/velero clusterrolebinding/velero
# kubectl delete crds -l component=velero
# velero install \
    --provider aws \
    --plugins velero/velero-plugin-for-aws:v1.2.0 \
    --bucket k8s-jf \
    --secret-file ./credentials-velero \
    --use-volume-snapshots=false \
    --backup-location-config region=minio,s3ForcePathStyle="true",s3Url=http://192.168.1.214:39000

# kubectl -n velero get backupstoragelocation default -o yaml    # confirm the resource now matches what you configured
apiVersion: velero.io/v1
kind: BackupStorageLocation
metadata:
  creationTimestamp: "2021-09-07T07:47:44Z"
  generation: 1
  labels:
    component: velero
  name: default
  namespace: velero
  resourceVersion: "1184696"
  selfLink: /apis/velero.io/v1/namespaces/velero/backupstoragelocations/default
  uid: 39502e43-272e-461f-a114-a9ec955f0510
spec:
  config:
    region: minio
    s3ForcePathStyle: "true"
    s3Url: http://192.168.1.214:39000
  default: true
  objectStorage:
    bucket: k8s-jf
  provider: aws
status:
  lastSyncedTime: "2021-09-07T07:50:00Z"
  lastValidationTime: "2021-09-07T07:50:00Z"
  phase: Available
```