Kubernetes etcd backup and restore

All Kubernetes objects are stored in etcd. Backing up the etcd cluster data regularly is important for recovering a Kubernetes cluster in disaster scenarios, such as losing all control plane nodes. The snapshot file contains all Kubernetes state and critical information.

Taking a snapshot of etcd at a known-good baseline is how the etcd data gets backed up: by periodically snapshotting the etcd backend database, etcd can later be restored to that known good point in time. For a cluster running on virtual machines, a sudden power loss can corrupt files and leave etcd and the API server unable to start, taking the whole cluster down, which is why backing up etcd matters. The walkthrough below covers two scenarios: a single-master cluster and a multi-master cluster.
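
As a minimal sketch of how such periodic snapshots might be automated, assuming a kubeadm layout where the etcd client certificates live under /etc/kubernetes/pki/etcd (the script path, backup directory, and retention policy are assumptions for illustration, not from the original article):

    #!/usr/bin/env bash
    # /usr/local/bin/etcd-backup.sh -- hypothetical nightly etcd snapshot script
    set -euo pipefail

    BACKUP_DIR=/opt/etcd-back            # assumed backup location
    mkdir -p "${BACKUP_DIR}"

    # Take a timestamped snapshot of the local etcd member.
    ETCDCTL_API=3 etcdctl snapshot save \
      "${BACKUP_DIR}/snap-$(date +%Y%m%d%H%M).db" \
      --endpoints=https://127.0.0.1:2379 \
      --cacert=/etc/kubernetes/pki/etcd/ca.crt \
      --cert=/etc/kubernetes/pki/etcd/peer.crt \
      --key=/etc/kubernetes/pki/etcd/peer.key

    # Keep only the 7 most recent snapshots (assumed retention policy).
    ls -1t "${BACKUP_DIR}"/snap-*.db | tail -n +8 | xargs -r rm -f

A cron entry such as "0 2 * * * /usr/local/bin/etcd-backup.sh" would then run it nightly at 02:00.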

Single-master cluster

Environment: a kubeadm-installed cluster with one master and three workers.

    [root@k8s-01 ~]# kubectl get nodes
    NAME     STATUS   ROLES                  AGE   VERSION
    k8s-01   Ready    control-plane,master   22h   v1.22.3
    k8s-02   Ready    <none>                 22h   v1.22.3
    k8s-03   Ready    <none>                 22h   v1.22.3
    k8s-04   Ready    <none>                 22h   v1.22.3
    [root@k8s-01 ~]#

1. First, back up the etcd data:
    [root@k8s-01 kubernetes]# ETCDCTL_API=3 etcdctl snapshot save /opt/etcd-back/snap.db \
      --endpoints=https://127.0.0.1:2379 \
      --cacert=/etc/kubernetes/pki/etcd/ca.crt \
      --cert=/etc/kubernetes/pki/etcd/peer.crt \
      --key=/etc/kubernetes/pki/etcd/peer.key
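
Before relying on a snapshot it is worth sanity-checking it; etcdctl can print the snapshot's hash, revision, total key count, and size. This verification step is an addition for illustration, not part of the original run:

    [root@k8s-01 kubernetes]# ETCDCTL_API=3 etcdctl --write-out=table snapshot status /opt/etcd-back/snap.db
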
2. Create test Pods:
    [root@k8s-01 ~]# kubectl get pods
    NAME                                      READY   STATUS    RESTARTS        AGE
    nfs-client-provisioner-69b76b8dc6-6l8xs   1/1     Running   7 (3h55m ago)   4h43m
    nginx-6799fc88d8-5rqg8                    1/1     Running   0               48s
    nginx-6799fc88d8-phvkx                    1/1     Running   0               48s
    nginx-6799fc88d8-rwjc6                    1/1     Running   0               48s
    [root@k8s-01 ~]#
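
The original article does not show the command that created these Pods; judging by the Deployment name, something along these lines would produce them (hypothetical, for completeness):

    # Hypothetical: create a 3-replica nginx Deployment like the one shown above.
    [root@k8s-01 ~]# kubectl create deployment nginx --image=nginx --replicas=3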
3. Stop etcd and the API server.

The kubeadm control-plane phase generates static Pod manifests for the API Server, the Controller Manager, and the Scheduler, while the etcd phase generates the static Pod manifest for the local etcd store; all of them are saved in the /etc/kubernetes/manifests directory. The kubelet on the host watches that directory for manifest creation, modification, and deletion, and creates, updates, or deletes the corresponding Pods in response. The manifests produced by these two phases are therefore what launch the master component Pods, and moving them out of the directory shuts those Pods down:

    [root@k8s-01 kubernetes]# mv /etc/kubernetes/manifests/ /etc/kubernetes/manifests-backup/
    [root@k8s-01 kubernetes]# kubectl get pods -A
    The connection to the server 192.168.1.128:6443 was refused - did you specify the right host or port?
    [root@k8s-01 kubernetes]#
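
The refused connection confirms the API server is down. Depending on the container runtime, you can also confirm the static Pod containers are gone before touching the data directory; the use of crictl below is an assumption (the original does not show this check):

    # Assumes a CRI runtime with crictl installed; empty output means the
    # kubelet has torn the static Pods down.
    [root@k8s-01 kubernetes]# crictl ps | grep -E 'etcd|kube-apiserver'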


4. Move /var/lib/etcd aside:
    [root@k8s-01 kubernetes]# mv /var/lib/etcd /var/lib/etcd.bak
    [root@k8s-01 kubernetes]#
5. Restore the etcd data:
    [root@k8s-01 lib]# ETCDCTL_API=3 etcdctl --endpoints="https://127.0.0.1:2379" \
      --cert="/etc/kubernetes/pki/etcd/server.crt" \
      --key="/etc/kubernetes/pki/etcd/server.key" \
      --cacert="/etc/kubernetes/pki/etcd/ca.crt" \
      snapshot restore /opt/etcd-back/snap.db --data-dir=/var/lib/etcd/
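
The restore writes a fresh data directory at the path given by --data-dir. A quick look at it (illustrative, not from the original run) should show the member/snap and member/wal subdirectories before the control plane is brought back:

    [root@k8s-01 lib]# ls /var/lib/etcd/member/
    snap  wal
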
6. Start etcd and the API server again and check the Pods:
    [root@k8s-01 lib]# cd /etc/kubernetes/
    [root@k8s-01 kubernetes]# mv manifests-backup manifests
    [root@k8s-01 kubernetes]# kubectl get pods
    NAME                                      READY   STATUS    RESTARTS         AGE
    nfs-client-provisioner-69b76b8dc6-6l8xs   1/1     Running   12 (2m25s ago)   4h48m
    [root@k8s-01 ~]# kubectl get pods -n kube-system
    NAME                                       READY   STATUS    RESTARTS       AGE
    calico-kube-controllers-65898446b5-t2mqq   1/1     Running   11 (16h ago)   21h
    calico-node-8md6b                          1/1     Running   0              21h
    calico-node-9457b                          1/1     Running   0              21h
    calico-node-nxs2w                          1/1     Running   0              21h
    calico-node-p7d52                          1/1     Running   0              21h
    coredns-7f6cbbb7b8-g84gl                   1/1     Running   0              22h
    coredns-7f6cbbb7b8-j9q4q                   1/1     Running   0              22h
    etcd-k8s-01                                1/1     Running   0              22h
    kube-apiserver-k8s-01                      1/1     Running   0              22h
    kube-controller-manager-k8s-01             1/1     Running   0              22h
    kube-proxy-49b8g                           1/1     Running   0              22h
    kube-proxy-8wh5l                           1/1     Running   0              22h
    kube-proxy-b6lqq                           1/1     Running   0              22h
    kube-proxy-tldpv                           1/1     Running   0              22h
    kube-scheduler-k8s-01                      1/1     Running   0              22h
    [root@k8s-01 ~]#

Because the three nginx Pods were created after the backup was taken, they are gone after the restore; the cluster is back at the state captured in the snapshot.

Multi-master cluster

Environment: a kubeadm-installed cluster with two masters and two workers. (Note that a two-member etcd cluster has no fault tolerance, since losing either member loses quorum; this layout is purely for demonstration.)

    [root@k8s-01 ~]# kubectl get nodes
    NAME     STATUS   ROLES                  AGE   VERSION
    k8s-01   Ready    control-plane,master   16h   v1.22.3
    k8s-02   Ready    control-plane,master   16h   v1.22.3
    k8s-03   Ready    <none>                 16h   v1.22.3
    k8s-04   Ready    <none>                 16h   v1.22.3
    [root@k8s-01 etcd-v3.5.4-linux-amd64]# ETCDCTL_API=3 etcdctl \
      --endpoints=https://192.168.1.123:2379,https://192.168.1.124:2379 \
      --cacert=/etc/kubernetes/pki/etcd/ca.crt --cert=/etc/kubernetes/pki/etcd/peer.crt \
      --key=/etc/kubernetes/pki/etcd/peer.key member list
    58915ab47aed1957, started, k8s-02, https://192.168.1.124:2380, https://192.168.1.124:2379, false
    c48307bcc0ac155e, started, k8s-01, https://192.168.1.123:2380, https://192.168.1.123:2379, false
    [root@k8s-01 etcd-v3.5.4-linux-amd64]#
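
The same flags work with etcdctl's endpoint status subcommand, which additionally reports each member's revision and which one is the leader; this is an illustrative extra check, not part of the original run:

    [root@k8s-01 ~]# ETCDCTL_API=3 etcdctl \
      --endpoints=https://192.168.1.123:2379,https://192.168.1.124:2379 \
      --cacert=/etc/kubernetes/pki/etcd/ca.crt --cert=/etc/kubernetes/pki/etcd/peer.crt \
      --key=/etc/kubernetes/pki/etcd/peer.key endpoint status --write-out=table
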
1. Both masters need to be backed up:
    [root@k8s-01 ~]# ETCDCTL_API=3 etcdctl --endpoints="https://127.0.0.1:2379" \
      --cacert=/etc/kubernetes/pki/etcd/ca.crt --cert=/etc/kubernetes/pki/etcd/peer.crt \
      --key=/etc/kubernetes/pki/etcd/peer.key snapshot save /snap-$(date +%Y%m%d%H%M).db
    [root@k8s-02 ~]# ETCDCTL_API=3 etcdctl --endpoints="https://127.0.0.1:2379" \
      --cacert=/etc/kubernetes/pki/etcd/ca.crt --cert=/etc/kubernetes/pki/etcd/peer.crt \
      --key=/etc/kubernetes/pki/etcd/peer.key snapshot save /snap-$(date +%Y%m%d%H%M).db
2. Create a few test Pods:
    [root@k8s-01 ~]# kubectl get pods
    NAME                      READY   STATUS    RESTARTS   AGE
    nginx-6799fc88d8-2x6gw    1/1     Running   0          4m22s
    nginx-6799fc88d8-82mjz    1/1     Running   0          4m22s
    nginx-6799fc88d8-sbb6n    1/1     Running   0          4m22s
    tomcat-7d987c7694-552v2   1/1     Running   0          2m8s
    [root@k8s-01 ~]#
3. Stop kube-apiserver and etcd on both master machines:
    [root@k8s-01 kubernetes]# mv /etc/kubernetes/manifests/ /etc/kubernetes/manifests-backup/
    [root@k8s-02 kubernetes]# mv /etc/kubernetes/manifests/ /etc/kubernetes/manifests-backup/
4. Move /var/lib/etcd aside on both masters:
    [root@k8s-01 kubernetes]# mv /var/lib/etcd /var/lib/etcd.bak
    [root@k8s-02 kubernetes]# mv /var/lib/etcd /var/lib/etcd.bak
5. Restore the etcd data; the whole etcd cluster is restored from the same snapshot:
    [root@k8s-01 /]# ETCDCTL_API=3 etcdctl snapshot restore /snap-202207182330.db \
      --endpoints=192.168.1.123:2379 --name=k8s-01 \
      --cacert=/etc/kubernetes/pki/etcd/ca.crt --cert=/etc/kubernetes/pki/etcd/peer.crt \
      --key=/etc/kubernetes/pki/etcd/peer.key \
      --initial-advertise-peer-urls=https://192.168.1.123:2380 \
      --initial-cluster-token=etcd-cluster-0 \
      --initial-cluster=k8s-01=https://192.168.1.123:2380,k8s-02=https://192.168.1.124:2380 \
      --data-dir=/var/lib/etcd
    [root@k8s-01 /]# scp snap-202207182330.db root@192.168.1.124:/
    root@192.168.1.124's password:
    snap-202207182330.db                          100% 4780KB  45.8MB/s   00:00
    [root@k8s-02 /]# ETCDCTL_API=3 etcdctl snapshot restore /snap-202207182330.db \
      --endpoints=192.168.1.124:2379 --name=k8s-02 \
      --cacert=/etc/kubernetes/pki/etcd/ca.crt --cert=/etc/kubernetes/pki/etcd/peer.crt \
      --key=/etc/kubernetes/pki/etcd/peer.key \
      --initial-advertise-peer-urls=https://192.168.1.124:2380 \
      --initial-cluster-token=etcd-cluster-0 \
      --initial-cluster=k8s-01=https://192.168.1.123:2380,k8s-02=https://192.168.1.124:2380 \
      --data-dir=/var/lib/etcd
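
Unlike the single-master case, each restore here stamps the new data directory with the full cluster topology: --name identifies the member being rebuilt, --initial-cluster lists both peers, --initial-advertise-peer-urls must match that member's peer URL, and --initial-cluster-token must be identical on both nodes so the restored members recognize each other as one cluster; that is also why the same snapshot file is copied from k8s-01 to k8s-02 with scp. Once the control plane is brought back (next step), a health check along these lines can confirm both members rejoined (illustrative, not from the original run):

    [root@k8s-01 /]# ETCDCTL_API=3 etcdctl \
      --endpoints=https://192.168.1.123:2379,https://192.168.1.124:2379 \
      --cacert=/etc/kubernetes/pki/etcd/ca.crt --cert=/etc/kubernetes/pki/etcd/peer.crt \
      --key=/etc/kubernetes/pki/etcd/peer.key endpoint health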

6. Start etcd and the API server on the master nodes again and check the Pods:

    [root@k8s-01 lib]# cd /etc/kubernetes/
    [root@k8s-01 kubernetes]# mv manifests-backup manifests
    [root@k8s-02 lib]# cd /etc/kubernetes/
    [root@k8s-02 kubernetes]# mv manifests-backup manifests
    [root@k8s-01 lib]# kubectl get pods
    NAME                     READY   STATUS    RESTARTS   AGE
    nginx-6799fc88d8-2x6gw   1/1     Running   0          16m
    nginx-6799fc88d8-82mjz   1/1     Running   0          16m
    nginx-6799fc88d8-sbb6n   1/1     Running   0          16m
    [root@k8s-01 ~]# kubectl get pods -n kube-system
    NAME                                       READY   STATUS    RESTARTS       AGE
    calico-kube-controllers-65898446b5-drjjj   1/1     Running   10 (16h ago)   16h
    calico-node-9s7p2                          1/1     Running   0              16h
    calico-node-fnbj4                          1/1     Running   0              16h
    calico-node-nx6q6                          1/1     Running   0              16h
    calico-node-qcffj                          1/1     Running   0              16h
    coredns-7f6cbbb7b8-mn9hj                   1/1     Running   0              16h
    coredns-7f6cbbb7b8-nrwbf                   1/1     Running   0              16h
    etcd-k8s-01                                1/1     Running   1              16h
    etcd-k8s-02                                1/1     Running   0              16h
    kube-apiserver-k8s-01                      1/1     Running   2 (16h ago)    16h
    kube-apiserver-k8s-02                      1/1     Running   0              16h
    kube-controller-manager-k8s-01             1/1     Running   2              16h
    kube-controller-manager-k8s-02             1/1     Running   0              16h
    kube-proxy-d824j                           1/1     Running   0              16h
    kube-proxy-k5gw4                           1/1     Running   0              16h
    kube-proxy-mxmhp                           1/1     Running   0              16h
    kube-proxy-nvpf4                           1/1     Running   0              16h
    kube-scheduler-k8s-01                      1/1     Running   1              16h
    kube-scheduler-k8s-02                      1/1     Running   0              16h
    [root@k8s-01 ~]#

Note that after the restore the tomcat Pod is gone while the nginx Pods remain, which indicates the snapshot used for the restore was taken after the nginx Deployment was created but before the tomcat one.
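
A snapshot restore rewrites the member and cluster IDs, so the restored members lose their former identities; running member list once more (an illustrative final check, not part of the original run) is a good way to record the rebuilt cluster's new IDs:

    [root@k8s-01 ~]# ETCDCTL_API=3 etcdctl \
      --endpoints=https://192.168.1.123:2379,https://192.168.1.124:2379 \
      --cacert=/etc/kubernetes/pki/etcd/ca.crt --cert=/etc/kubernetes/pki/etcd/peer.crt \
      --key=/etc/kubernetes/pki/etcd/peer.key member list
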
Original article: https://blog.csdn.net/qq_29860591/article/details/127127424