Backups are something every ops engineer has to deal with, and we are no exception. In this post I lay out the backup strategies I use for our production Kubernetes cluster and share them here.
The Kubernetes backups here are mainly meant to guard against the following situations:

- Accidental deletion of a namespace in the cluster
- An accidental operation corrupting a resource in the cluster, such as a deployment or configmap
- Accidental deletion of individual resource objects in the cluster
- Loss of etcd data
## Back up etcd
Back up etcd to guard against a cluster-level failure or etcd data loss that would make the entire cluster unavailable; in that case the business can only be recovered by restoring the cluster from an etcd snapshot.
The etcd backup script is as follows:

```shell
#!/bin/bash
#ENDPOINTS="https://192.168.1.207:2379,https://192.168.1.208:2379,https://192.168.1.209:2379"
ENDPOINTS="127.0.0.1:2379"
CACERT="/etc/kubernetes/pki/etcd/ca.crt"
CERT="/etc/kubernetes/pki/etcd/server.crt"
KEY="/etc/kubernetes/pki/etcd/server.key"
DATE=`date +%Y%m%d-%H%M%S`
BACKUP_DIR="/home/centos/hostpath/backups/k8s/etcd"

# Take an etcd v3 snapshot
ETCDCTL_API=3 /usr/local/bin/etcdctl --cacert=${CACERT} --cert=${CERT} --key=${KEY} \
  --endpoints="${ENDPOINTS}" snapshot save ${BACKUP_DIR}/k8s-snapshot-${DATE}.db

# Expire snapshots older than 20 days
find $BACKUP_DIR/ -type f -mtime +20 -exec rm -f {} \;
```

Cron task plan:

```shell
50 21 * * * /bin/bash /home/centos/hostpath/backups/k8s/etcdv3-bak.sh
```
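To recover from one of these snapshots, etcd provides `etcdctl snapshot restore`. A minimal sketch for a single-node etcd follows; the snapshot filename and restore data directory are illustrative, not values from this article:

```shell
# Restore the snapshot into a fresh data directory (paths are examples)
ETCDCTL_API=3 /usr/local/bin/etcdctl snapshot restore \
  /home/centos/hostpath/backups/k8s/etcd/k8s-snapshot-20210907-215000.db \
  --data-dir /var/lib/etcd-restored

# Then point the etcd static pod's hostPath volume at the restored
# data directory and restart etcd.
```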
## Minio object storage service setup

Since our storage cluster is built on GlusterFS, I use minio to provide object storage here, with GlusterFS as the underlying file system. If you use Alibaba Cloud OSS to back up your cluster resources, skip this step and see: https://github.com/AliyunContainerService/velero-plugin
Running minio inside the k8s cluster would require provisioning a corresponding PV/PVC for it, so for simplicity we start it directly with docker:
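The original command is not shown here, but a minimal sketch looks like the following; the host port 39000 matches the s3Url used in the velero install step later in this post, while the credentials, container name, and data path are assumptions:

```shell
# Run minio with its data directory on the GlusterFS mount (path is an example)
docker run -d --name minio \
  --restart=always \
  -p 39000:9000 \
  -e MINIO_ROOT_USER=minio \
  -e MINIO_ROOT_PASSWORD=minio123 \
  -v /data/glusterfs/minio:/data \
  minio/minio server /data
```

After minio is up, create the bucket used for backups (k8s-jf in this article) via the minio console or `mc`. With the object storage ready, velero backup schedules can be created: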
```shell
# Back up every six hours
velero create schedule ${SCHEDULE_NAME} --schedule="0 */6 * * *"

# Back up every six hours using @every
velero create schedule ${SCHEDULE_NAME} --schedule="@every 6h"

# Create a daily backup of the web namespace
velero create schedule ${SCHEDULE_NAME} --schedule="@every 24h" --include-namespaces web

# Create a weekly backup that is kept for 90 days
velero create schedule ${SCHEDULE_NAME} --schedule="@every 168h" --ttl 2160h0m0s
```
Description: --ttl specifies the lifetime of a backup; once the TTL expires, the backup is cleaned up automatically. The default TTL is 30 days (720h).
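To confirm the schedules and trigger an ad-hoc backup from one of them (${SCHEDULE_NAME} is the same placeholder as above):

```shell
# List the configured schedules and their status
velero schedule get

# Immediately create a backup using a schedule's template
velero backup create --from-schedule ${SCHEDULE_NAME}
```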
```shell
# Create a restore from a backup
velero restore create ${RESTORE_NAME} --from-backup ${BACKUP_NAME}

# Create a restore from a backup; the default restore name is ${BACKUP_NAME}-<timestamp>
velero restore create --from-backup ${BACKUP_NAME}

# Create a restore from the latest backup of a schedule
velero restore create --from-schedule ${SCHEDULE_NAME}

# Restore only selected resources from a backup
velero restore create --from-backup backup-2 --include-resources pod,secret

# Restore all backups of the cluster (existing resources will not be overwritten)
velero restore create --from-backup all-ns-backup

# Restore only the default and nginx-example namespaces
velero restore create --from-backup all-ns-backup --include-namespaces default,nginx-example

# Restore the test-velero namespace's resources into test-velero-1
velero restore create restore-for-test --from-backup everyday-1-20210203131802 --namespace-mappings test-velero:test-velero-1
```
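Before restoring, it can help to check what a backup actually contains; `velero backup describe` with `--details` lists the resources inside it (${BACKUP_NAME} as above):

```shell
# Show backup metadata and the full resource list it contains
velero backup describe ${BACKUP_NAME} --details
```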
```shell
# velero restore create k8s-jf-test-all-restore --from-backup k8s-jf-test-all --include-namespaces test --namespace-mappings test:test10000

# velero restore describe k8s-jf-test-all-restore    # View recovery status
Name:         k8s-jf-test-all-restore
Namespace:    velero
Labels:       <none>
Annotations:  <none>

Phase:  InProgress
Estimated total items to be restored:  141
Items restored so far:                 123

Started:    2021-09-07 10:47:44 +0800 CST
Completed:  <n/a>

Backup:  k8s-jf-test-all

Namespaces:
  Included:  all namespaces found in the backup
  Excluded:  <none>

Resources:
  Included:        *
  Excluded:        nodes, events, events.events.k8s.io, backups.velero.io, restores.velero.io, resticrepositories.velero.io
  Cluster-scoped:  auto

Namespace mappings:  test=test10000

Label selector:  <none>

Restore PVs:  auto

Preserve Service NodePorts:  auto

# velero restore get
NAME                      BACKUP            STATUS       STARTED                         COMPLETED   ERRORS   WARNINGS   CREATED                         SELECTOR
k8s-jf-test-all-restore   k8s-jf-test-all   InProgress   2021-09-07 10:47:44 +0800 CST   <nil>       0        0          2021-09-07 10:47:44 +0800 CST   <none>
```
## Cross-cluster scheduled backup and restore
For velero, cross-cluster restore requires that both clusters use the same persistent volume / object storage solution. Here both clusters use minio, and all backups go to the k8s-jf bucket.
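The install command below reads object storage credentials from ./credentials-velero, in the AWS-style format the velero AWS plugin expects. A sketch of its contents (the key values are whatever the minio instance was started with; minio/minio123 are placeholders):

```shell
cat > credentials-velero <<EOF
[default]
aws_access_key_id = minio
aws_secret_access_key = minio123
EOF
```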
```shell
velero install \
  --provider aws \
  --plugins velero/velero-plugin-for-aws:v1.2.0 \
  --bucket k8s-jf \
  --secret-file ./credentials-velero \
  --use-volume-snapshots=false \
  --backup-location-config region=minio,s3ForcePathStyle="true",s3Url=http://192.168.1.214:39000
```

### View the backed-up data

```shell
# velero backup get
NAME                                STATUS      ERRORS   WARNINGS   CREATED                         EXPIRES   STORAGE LOCATION   SELECTOR
k8s-jf-all-cron-4h-20210907061716   Completed   0        0          2021-09-07 14:17:16 +0800 CST   59d       default            <none>
k8s-jf-all-cron-4h-20210907021627   Completed   0        0          2021-09-07 10:16:27 +0800 CST   59d       default            <none>
k8s-jf-test-all                     Completed   0        0          2021-09-07 10:19:45 +0800 CST   29d       default            <none>
```

### Restore the specified namespace's data

Here we restore the argocd namespace's data from the backup k8s-jf-all-cron-4h-20210907061716 into this cluster's argocd-dev namespace:
```shell
# velero restore create --from-backup k8s-jf-all-cron-4h-20210907061716 --include-namespaces argocd --namespace-mappings argocd:argocd-dev
Restore request "k8s-jf-all-cron-4h-20210907061716-20210907155450" submitted successfully.
Run `velero restore describe k8s-jf-all-cron-4h-20210907061716-20210907155450` or `velero restore logs k8s-jf-all-cron-4h-20210907061716-20210907155450` for more details.

# velero restore get
NAME                                                BACKUP                              STATUS       STARTED                         COMPLETED   ERRORS   WARNINGS   CREATED                         SELECTOR
k8s-jf-all-cron-4h-20210907061716-20210907155450   k8s-jf-all-cron-4h-20210907061716   InProgress   2021-09-07 15:54:51 +0800 CST   <nil>       0        0          2021-09-07 15:54:51 +0800 CST   <none>

# velero restore logs k8s-jf-all-cron-4h-20210907061716-20210907161119
time="2021-09-07T08:11:23Z" level=info msg="Attempting to restore Secret: argocd-application-controller-token-wv62v" logSource="pkg/restore/restore.go:1238" restore=velero/k8s-jf-all-cron-4h-20210907061716-20210907161119
time="2021-09-07T08:11:23Z" level=info msg="Restored 2 items out of an estimated total of 61 (estimate will change throughout the restore)" logSource="pkg/restore/restore.go:664" name=argocd-application-controller-token-wv62v namespace=argocd-dev progress= resource=secrets restore=velero/k8s-jf-all-cron-4h-20210907061716-20210907161119
time="2021-09-07T08:11:23Z" level=info msg="Attempting to restore Secret: argocd-dex-server-token-9n4rs" logSource="pkg/restore/restore.go:1238" restore=velero/k8s-jf-all-cron-4h-20210907061716-20210907161119
time="2021-09-07T08:11:23Z" level=info msg="Restored 3 items out of an estimated total of 61 (estimate will change throughout the restore)" logSource="pkg/restore/restore.go:664" name=argocd-dex-server-token-9n4rs namespace=argocd-dev progress= resource=secrets restore=velero/k8s-jf-all-cron-4h-20210907061716-20210907161119
time="2021-09-07T08:11:23Z" level=info msg="Attempting to restore Secret: argocd-secret" logSource="pkg/restore/restore.go:1238" restore=velero/k8s-jf-all-cron-4h-20210907061716-20210907161119
time="2021-09-07T08:11:23Z" level=info msg="Restored 4 items out of an estimated total of 61 (estimate will change throughout the restore)" logSource="pkg/restore/restore.go:664" name=argocd-secret namespace=argocd-dev progress= resource=secrets restore=velero/k8s-jf-all-cron-4h-20210907061716-20210907161119
time="2021-09-07T08:11:23Z" level=info msg="Attempting to restore Secret: argocd-server-token-48vjd" logSource="pkg/restore/restore.go:1238" restore=velero/k8s-jf-all-cron-4h-20210907061716-20210907161119
time="2021-09-07T08:11:23Z" level=info msg="Restored 5 items out of an estimated total of 61 (estimate will change throughout the restore)" logSource="pkg/restore/restore.go:664" name=argocd-server-token-48vjd namespace=argocd-dev progress= resource=secrets restore=velero/k8s-jf-all-cron-4h-20210907061716-20210907161119
time="2021-09-07T08:11:23Z" level=info msg="Attempting to restore Secret: cluster-192.168.1.210-3497337724" logSource="pkg/restore/restore.go:1238" restore=velero/k8s-jf-all-cron-4h-20210907061716-20210907161119
time="2021-09-07T08:11:23Z" level=info msg="Restored 6 items out of an estimated total of 61 (estimate will change throughout the restore)" logSource="pkg/restore/restore.go:664" name=cluster-192.168.1.210-3497337724 namespace=argocd-dev progress= resource=secrets restore=velero/k8s-jf-all-cron-4h-20210907061716-20210907161119
time="2021-09-07T08:11:23Z" level=info msg="Attempting to restore Secret: cluster-192.168.1.214-1096681010" logSource="pkg/restore/restore.go:1238" restore=velero/k8s-jf-all-cron-4h-20210907061716-20210907161119
...
time="2021-09-07T08:11:30Z" level=info msg="Restored 61 items out of an estimated total of 61 (estimate will change throughout the restore)" logSource="pkg/restore/restore.go:664" name=argocd-server namespace=argocd-dev progress= resource=services restore=velero/k8s-jf-all-cron-4h-20210907061716-20210907161119
time="2021-09-07T08:11:30Z" level=info msg="Waiting for all restic restores to complete" logSource="pkg/restore/restore.go:546" restore=velero/k8s-jf-all-cron-4h-20210907061716-20210907161119
time="2021-09-07T08:11:30Z" level=info msg="Done waiting for all restic restores to complete" logSource="pkg/restore/restore.go:562" restore=velero/k8s-jf-all-cron-4h-20210907061716-20210907161119
time="2021-09-07T08:11:30Z" level=info msg="Waiting for all post-restore-exec hooks to complete" logSource="pkg/restore/restore.go:566" restore=velero/k8s-jf-all-cron-4h-20210907061716-20210907161119
time="2021-09-07T08:11:30Z" level=info msg="Done waiting for all post-restore exec hooks to complete" logSource="pkg/restore/restore.go:574" restore=velero/k8s-jf-all-cron-4h-20210907061716-20210907161119
time="2021-09-07T08:11:30Z" level=info msg="restore completed" logSource="pkg/controller/restore_controller.go:480" restore=velero/k8s-jf-all-cron-4h-20210907061716-20210907161119
```
From the logs you can see that the data from the source cluster's argocd namespace has been restored into this cluster's argocd-dev namespace.
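To double-check the result, list what now exists in the target namespace (a generic verification step, not from the restore output above):

```shell
# Confirm the restored workloads, services, and secrets
kubectl get all,secrets -n argocd-dev
```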
## Uninstall velero
Uninstall velero. Note that the uninstall does not delete the velero namespace, so clean it up manually afterwards:
```shell
# velero uninstall
You are about to uninstall Velero.
Are you sure you want to continue (Y/N)? y
Velero uninstalled ⛵

# kubectl delete namespace/velero clusterrolebinding/velero
# kubectl delete crds -l component=velero
```
time="2021-09-07T07:22:35Z" level=info msg="Validating backup storage location" backup-storage-location=default controller=backup-storage-location logSource="pkg/controller/backup_storage_location_controller.go:114" time="2021-09-07T07:22:36Z" level=info msg="Backup storage location is invalid, marking as unavailable" backup-storage-location=default controller=backup-storage-location logSource="pkg/controller/backup_storage_location_controller.go:117" time="202 1-09-07T07:22:36Z" level=error msg="Error listing backups in backup store" backupLocation=default controller=backup-sync error="rpc error: code = Unknown desc = RequestError: send request failed\ncaused by: Get http://minio.velero.svc:9000/velero?delimiter=%2F&list-type=2&prefix=backups%2F: dial tcp: look up minio.velero.svc on 10.96.0.10:53: no such host" error.file="/go/src/velero-plugin-for-aws/velero-plugin-for-aws/object_store.go:361" error.function="main.(*ObjectStore).ListCommonPrefixes" logSource="pkg/controller/backup_sync_controller.go:182" time="2021- 09-07T07:22:36Z" level=error msg="Current backup storage locations available/unavailable/unknown: 0/1/0, Backup storage location \"default\" is unavailable: rpc error: code = Unknown desc = RequestError: send request failed\ncaused by: Get http://minio.velero.svc:9000/velero?delimiter=%2F&list-type=2&prefix= : dial tcp: lookup minio.velero.svc on 10.96.0.10:53: no such host)" controller=backup-storage-location logSource="pkg/controller/backup_storage_location_controller.go:164"time="2021-09-07T07:22:36Z" level=error msg="Current backup storage locations available/unavailable/unknown: 0/1/0)" controller=backup-storage-location logSource="pkg/controller/backup_storage_location_controller.go:166"
This indicates that the BackupStorageLocation CRD resource still carries the object storage settings from the previous install (here the old in-cluster minio address), which no longer match your current setup; the stale resources were not cleaned up before reinstalling.
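A hedged fix, reusing the cleanup commands from the uninstall section above: delete the stale velero CRD instances, then rerun `velero install` with the correct bucket and credentials so the BackupStorageLocation is recreated from the new flags:

```shell
# Delete the stale velero CRDs (and the resources stored in them)
kubectl delete crds -l component=velero

# Reinstall with the current object storage settings (flags as in the install step above)
velero install --provider aws --plugins velero/velero-plugin-for-aws:v1.2.0 \
  --bucket k8s-jf --secret-file ./credentials-velero --use-volume-snapshots=false \
  --backup-location-config region=minio,s3ForcePathStyle="true",s3Url=http://192.168.1.214:39000
```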