Kubernetes etcd Backup and Restore
All Kubernetes objects are stored in etcd. Periodically backing up the etcd cluster data is important for recovering a Kubernetes cluster in disaster scenarios, such as losing all control plane nodes. The snapshot file contains all Kubernetes state and critical information.
Taking a snapshot of etcd captures a baseline backup of its data: by periodically snapshotting the backend database of an etcd node, etcd can later be restored to a known good point in time. For a Kubernetes cluster running on virtual machines, a sudden power loss can corrupt files so that etcd and the apiserver fail to start, leaving the entire cluster unusable, which is why backing up etcd matters so much. The following walks through two scenarios: a single-master cluster and a multi-master cluster.
Single-master cluster
Environment: a one-master, three-worker cluster installed with kubeadm
- [root@k8s-01 ~]# kubectl get nodes
- NAME     STATUS   ROLES                  AGE   VERSION
- k8s-01   Ready    control-plane,master   22h   v1.22.3
- k8s-02   Ready    <none>                 22h   v1.22.3
- k8s-03   Ready    <none>                 22h   v1.22.3
- k8s-04   Ready    <none>                 22h   v1.22.3
- [root@k8s-01 ~]#
- First, back up the etcd data:
[root@k8s-01 kubernetes]# ETCDCTL_API=3 etcdctl snapshot save /opt/etcd-back/snap.db --endpoints=https://127.0.0.1:2379 --cacert=/etc/kubernetes/pki/etcd/ca.crt --cert=/etc/kubernetes/pki/etcd/peer.crt --key=/etc/kubernetes/pki/etcd/peer.key
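For recurring backups, the same command can be wrapped in a small script and run from cron. This is a minimal sketch under assumptions not in the walkthrough above: the /opt/etcd-back directory and the seven-snapshot retention are illustrative choices.

```bash
#!/usr/bin/env bash
# etcd-backup.sh -- take a timestamped etcd snapshot and prune old ones.
# Assumption: backup dir /opt/etcd-back and 7-snapshot retention are illustrative.
set -euo pipefail

BACKUP_DIR=/opt/etcd-back
mkdir -p "${BACKUP_DIR}"

# Same endpoint and certificates as the manual snapshot above.
ETCDCTL_API=3 etcdctl snapshot save \
  "${BACKUP_DIR}/snap-$(date +%Y%m%d%H%M).db" \
  --endpoints=https://127.0.0.1:2379 \
  --cacert=/etc/kubernetes/pki/etcd/ca.crt \
  --cert=/etc/kubernetes/pki/etcd/peer.crt \
  --key=/etc/kubernetes/pki/etcd/peer.key

# Keep only the 7 newest snapshots.
ls -1t "${BACKUP_DIR}"/snap-*.db | tail -n +8 | xargs -r rm -f
```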
- Create test pods:
- [root@k8s-01 ~]# kubectl get pods
- NAME READY STATUS RESTARTS AGE
- nfs-client-provisioner-69b76b8dc6-6l8xs 1/1 Running 7 (3h55m ago) 4h43m
- nginx-6799fc88d8-5rqg8 1/1 Running 0 48s
- nginx-6799fc88d8-phvkx 1/1 Running 0 48s
- nginx-6799fc88d8-rwjc6 1/1 Running 0 48s
- [root@k8s-01 ~]#
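The walkthrough does not show how these nginx pods were created; a plausible one-liner, inferred from the Deployment-style pod names in the listing (the image and replica count are assumptions):

```bash
# Creates nginx-6799fc88d8-* pods like those seen above (hash suffixes will differ).
kubectl create deployment nginx --image=nginx --replicas=3
```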
- Stop etcd and the apiserver:
kubeadm's control-plane phase generates static Pod manifests for the API Server, Controller Manager, and Scheduler, while its etcd phase generates the static Pod manifest for the local etcd store; all of them are saved under /etc/kubernetes/manifests. The kubelet on the host watches that directory for manifest creation, change, and deletion, and creates, updates, or deletes the corresponding Pods in response. The manifests produced by these two phases are therefore what start the master component Pods, so moving them out of the directory stops those Pods.
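The directory the kubelet watches is configurable: on a kubeadm cluster it is set by staticPodPath in the kubelet's config file. A quick way to confirm it (the config path below is the kubeadm default):

```bash
# Show which directory this kubelet watches for static Pod manifests.
grep staticPodPath /var/lib/kubelet/config.yaml
# Expected on kubeadm clusters: staticPodPath: /etc/kubernetes/manifests
```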
- [root@k8s-01 kubernetes]# mv /etc/kubernetes/manifests/ /etc/kubernetes/manifests-backup/
- [root@k8s-01 kubernetes]# kubectl get pods -A
- The connection to the server 192.168.1.128:6443 was refused - did you specify the right host or port?
- [root@k8s-01 kubernetes]#
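kubectl no longer works at this point, so verification has to go through the container runtime. A sketch assuming a containerd runtime with crictl installed; on a Docker-based node, docker ps can be used the same way:

```bash
# After the kubelet notices the manifests are gone (allow ~20-30s),
# no etcd or kube-apiserver containers should remain.
crictl ps | grep -E 'etcd|kube-apiserver' || echo "control plane containers stopped"
```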
- Move /var/lib/etcd out of the way:
- [root@k8s-01 kubernetes]# mv /var/lib/etcd /var/lib/etcd.bak
- [root@k8s-01 kubernetes]#
- Restore the etcd data:
[root@k8s-01 lib]# ETCDCTL_API=3 etcdctl --endpoints="https://127.0.0.1:2379" --cert="/etc/kubernetes/pki/etcd/server.crt" --key="/etc/kubernetes/pki/etcd/server.key" --cacert="/etc/kubernetes/pki/etcd/ca.crt" snapshot restore /opt/etcd-back/snap.db --data-dir=/var/lib/etcd/
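Before restoring, it is worth sanity-checking the snapshot file. snapshot status reads the file locally, so no endpoints or certificates are required:

```bash
# Print the snapshot's hash, revision, total key count, and size.
ETCDCTL_API=3 etcdctl snapshot status /opt/etcd-back/snap.db --write-out=table
```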
- Start etcd and the apiserver again, and check the pods:
- [root@k8s-01 lib]# cd /etc/kubernetes/
- [root@k8s-01 kubernetes]# mv manifests-backup manifests
- [root@k8s-01 kubernetes]# kubectl get pods
- NAME READY STATUS RESTARTS AGE
- nfs-client-provisioner-69b76b8dc6-6l8xs 1/1 Running 12 (2m25s ago) 4h48m
- [root@k8s-01 ~]# kubectl get pods -n kube-system
- NAME READY STATUS RESTARTS AGE
- calico-kube-controllers-65898446b5-t2mqq 1/1 Running 11 (16h ago) 21h
- calico-node-8md6b 1/1 Running 0 21h
- calico-node-9457b 1/1 Running 0 21h
- calico-node-nxs2w 1/1 Running 0 21h
- calico-node-p7d52 1/1 Running 0 21h
- coredns-7f6cbbb7b8-g84gl 1/1 Running 0 22h
- coredns-7f6cbbb7b8-j9q4q 1/1 Running 0 22h
- etcd-k8s-01 1/1 Running 0 22h
- kube-apiserver-k8s-01 1/1 Running 0 22h
- kube-controller-manager-k8s-01 1/1 Running 0 22h
- kube-proxy-49b8g 1/1 Running 0 22h
- kube-proxy-8wh5l 1/1 Running 0 22h
- kube-proxy-b6lqq 1/1 Running 0 22h
- kube-proxy-tldpv 1/1 Running 0 22h
- kube-scheduler-k8s-01 1/1 Running 0 22h
- [root@k8s-01 ~]#
Because the three nginx pods were started after the backup was taken, they no longer exist after the restore.
Multi-master cluster
Environment: a two-master, two-worker cluster installed with kubeadm
- [root@k8s-01 ~]# kubectl get nodes
- NAME     STATUS   ROLES                  AGE   VERSION
- k8s-01   Ready    control-plane,master   16h   v1.22.3
- k8s-02   Ready    control-plane,master   16h   v1.22.3
- k8s-03   Ready    <none>                 16h   v1.22.3
- k8s-04   Ready    <none>                 16h   v1.22.3
- [root@k8s-01 ~]#
- Check the etcd cluster members:
- [root@k8s-01 etcd-v3.5.4-linux-amd64]# ETCDCTL_API=3 etcdctl --endpoints=https://192.168.1.123:2379,https://192.168.1.124:2379 --cacert=/etc/kubernetes/pki/etcd/ca.crt --cert=/etc/kubernetes/pki/etcd/peer.crt --key=/etc/kubernetes/pki/etcd/peer.key member list
- 58915ab47aed1957, started, k8s-02, https://192.168.1.124:2380, https://192.168.1.124:2379, false
- c48307bcc0ac155e, started, k8s-01, https://192.168.1.123:2380, https://192.168.1.123:2379, false
- [root@k8s-01 etcd-v3.5.4-linux-amd64]#
- Both masters need to be backed up:
- [root@k8s-01 ~]# ETCDCTL_API=3 etcdctl --endpoints="https://127.0.0.1:2379" --cacert=/etc/kubernetes/pki/etcd/ca.crt --cert=/etc/kubernetes/pki/etcd/peer.crt --key=/etc/kubernetes/pki/etcd/peer.key snapshot save /snap-$(date +%Y%m%d%H%M).db
- [root@k8s-02 ~]# ETCDCTL_API=3 etcdctl --endpoints="https://127.0.0.1:2379" --cacert=/etc/kubernetes/pki/etcd/ca.crt --cert=/etc/kubernetes/pki/etcd/peer.crt --key=/etc/kubernetes/pki/etcd/peer.key snapshot save /snap-$(date +%Y%m%d%H%M).db
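Before trusting either snapshot, it can help to confirm that both members are healthy; a sketch reusing the endpoints from the member list above:

```bash
# Check that both etcd members answer health probes.
ETCDCTL_API=3 etcdctl endpoint health \
  --endpoints=https://192.168.1.123:2379,https://192.168.1.124:2379 \
  --cacert=/etc/kubernetes/pki/etcd/ca.crt \
  --cert=/etc/kubernetes/pki/etcd/peer.crt \
  --key=/etc/kubernetes/pki/etcd/peer.key
```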
- Create test pods:
- [root@k8s-01 ~]# kubectl get pods
- NAME READY STATUS RESTARTS AGE
- nginx-6799fc88d8-2x6gw 1/1 Running 0 4m22s
- nginx-6799fc88d8-82mjz 1/1 Running 0 4m22s
- nginx-6799fc88d8-sbb6n 1/1 Running 0 4m22s
- tomcat-7d987c7694-552v2 1/1 Running 0 2m8s
- [root@k8s-01 ~]#
- Stop kube-apiserver and etcd on both master machines:
- [root@k8s-01 kubernetes]# mv /etc/kubernetes/manifests/ /etc/kubernetes/manifests-backup/
- [root@k8s-02 kubernetes]# mv /etc/kubernetes/manifests/ /etc/kubernetes/manifests-backup/
- Move /var/lib/etcd out of the way on both masters:
- [root@k8s-01 kubernetes]# mv /var/lib/etcd /var/lib/etcd.bak
- [root@k8s-02 kubernetes]# mv /var/lib/etcd /var/lib/etcd.bak
- Restore the etcd data; every member of the etcd cluster is restored from the same snapshot. Note that snapshot restore is a purely local operation, so the --endpoints and certificate flags are not actually used; what matters is that --name, --initial-advertise-peer-urls, and --initial-cluster match the member being restored on each node, because the restore rewrites the cluster membership metadata in the new data directory.
- [root@k8s-01 /]# ETCDCTL_API=3 etcdctl snapshot restore /snap-202207182330.db --endpoints=192.168.1.123:2379 --name=k8s-01 --cacert=/etc/kubernetes/pki/etcd/ca.crt --cert=/etc/kubernetes/pki/etcd/peer.crt --key=/etc/kubernetes/pki/etcd/peer.key --initial-advertise-peer-urls=https://192.168.1.123:2380 --initial-cluster-token=etcd-cluster-0 --initial-cluster=k8s-01=https://192.168.1.123:2380,k8s-02=https://192.168.1.124:2380 --data-dir=/var/lib/etcd
- [root@k8s-01 /]# scp snap-202207182330.db root@192.168.1.124:/
- root@192.168.1.124's password:
- snap-202207182330.db 100% 4780KB 45.8MB/s 00:00
- [root@k8s-02 /]# ETCDCTL_API=3 etcdctl snapshot restore /snap-202207182330.db --endpoints=192.168.1.124:2379 --name=k8s-02 --cacert=/etc/kubernetes/pki/etcd/ca.crt --cert=/etc/kubernetes/pki/etcd/peer.crt --key=/etc/kubernetes/pki/etcd/peer.key --initial-advertise-peer-urls=https://192.168.1.124:2380 --initial-cluster-token=etcd-cluster-0 --initial-cluster=k8s-01=https://192.168.1.123:2380,k8s-02=https://192.168.1.124:2380 --data-dir=/var/lib/etcd
- Start etcd and the apiserver on both master nodes, and check the pods:
- [root@k8s-01 lib]# cd /etc/kubernetes/
- [root@k8s-01 kubernetes]# mv manifests-backup manifests
- [root@k8s-02 lib]# cd /etc/kubernetes/
- [root@k8s-02 kubernetes]# mv manifests-backup manifests
- [root@k8s-01 lib]# kubectl get pods
- NAME READY STATUS RESTARTS AGE
- nginx-6799fc88d8-2x6gw 1/1 Running 0 16m
- nginx-6799fc88d8-82mjz 1/1 Running 0 16m
- nginx-6799fc88d8-sbb6n 1/1 Running 0 16m
- [root@k8s-01 ~]# kubectl get pods -n kube-system
- NAME READY STATUS RESTARTS AGE
- calico-kube-controllers-65898446b5-drjjj 1/1 Running 10 (16h ago) 16h
- calico-node-9s7p2 1/1 Running 0 16h
- calico-node-fnbj4 1/1 Running 0 16h
- calico-node-nx6q6 1/1 Running 0 16h
- calico-node-qcffj 1/1 Running 0 16h
- coredns-7f6cbbb7b8-mn9hj 1/1 Running 0 16h
- coredns-7f6cbbb7b8-nrwbf 1/1 Running 0 16h
- etcd-k8s-01 1/1 Running 1 16h
- etcd-k8s-02 1/1 Running 0 16h
- kube-apiserver-k8s-01 1/1 Running 2 (16h ago) 16h
- kube-apiserver-k8s-02 1/1 Running 0 16h
- kube-controller-manager-k8s-01 1/1 Running 2 16h
- kube-controller-manager-k8s-02 1/1 Running 0 16h
- kube-proxy-d824j 1/1 Running 0 16h
- kube-proxy-k5gw4 1/1 Running 0 16h
- kube-proxy-mxmhp 1/1 Running 0 16h
- kube-proxy-nvpf4 1/1 Running 0 16h
- kube-scheduler-k8s-01 1/1 Running 1 16h
- kube-scheduler-k8s-02 1/1 Running 0 16h
- [root@k8s-01 ~]#
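To confirm the restored two-member cluster is consistent, the endpoint status of both members can be compared; a sketch with the same certificates as before (both members should report the same raft term and close revisions):

```bash
# Show leader, raft term, and revision for each member in a table.
ETCDCTL_API=3 etcdctl endpoint status \
  --endpoints=https://192.168.1.123:2379,https://192.168.1.124:2379 \
  --cacert=/etc/kubernetes/pki/etcd/ca.crt \
  --cert=/etc/kubernetes/pki/etcd/peer.crt \
  --key=/etc/kubernetes/pki/etcd/peer.key \
  --write-out=table
```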