• Backing Up and Restoring the Etcd Database of a K8s Cluster


    1. Installing the etcdctl command-line tool

    The etcdctl commands differ slightly between etcd versions, but they are broadly similar; here we use `snapshot save` to take a snapshot backup.
    A few points to note:
    1. The backup only needs to be run on one node of the etcd cluster.
    2. We use the etcd v3 API, because starting with k8s 1.13, Kubernetes no longer supports etcd v2, i.e. all k8s cluster data lives in a v3 etcd. Consequently the backup only covers data written through the v3 API; data written through the v2 API is not backed up. (In the commands below, "ETCDCTL_API=3 etcdctl" is equivalent to "etcdctl".)

    yum install -y etcd
    

    2. Etcd Data Backup and Recovery

    1. Data storage

    By default, etcd stores its data under /var/lib/etcd/member/, and the data directory is split into two subdirectories:

    snap: snapshot data; etcd takes these snapshots to keep the WAL from growing without bound, and they record the state of the etcd data.
    wal: the write-ahead log, whose main purpose is to record the full history of every data change. In etcd, every modification must be written to the WAL before it is committed.
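The layout described above can be checked with a short helper (a sketch; the default path assumes a kubeadm installation, and the function takes an override so it can be tried on any directory):

```shell
# List the snap/ and wal/ subdirectories of an etcd data dir and
# report how many files each currently holds. The argument defaults
# to the kubeadm data directory but can point anywhere.
inspect_etcd_dir() {
  local data_dir="${1:-/var/lib/etcd/member}"
  local sub
  for sub in snap wal; do
    if [ -d "$data_dir/$sub" ]; then
      echo "$sub: $(ls -1 "$data_dir/$sub" | wc -l | tr -d ' ') file(s)"
    else
      echo "$sub: missing"
    fi
  done
}
```

On a healthy node you would expect both subdirectories to exist and the wal/ count to grow over time between snapshots.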

    2. Preparation:

    # Back up the /etc/kubernetes directory
    cp -r /etc/kubernetes/ /etc/kubernetes_bak/
    # Back up the /var/lib/etcd directory
    cp -r /var/lib/etcd/ /var/lib/etcd_bak/
    # Back up the /var/lib/kubelet directory
    cp -r /var/lib/kubelet/ /var/lib/kubelet_bak/
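The three copies above overwrite the same `_bak` targets on every run; they can instead be wrapped in one timestamped helper (a sketch; the destination root and source list are assumptions, not part of the original procedure):

```shell
# Copy each given directory into a date-stamped subdirectory of a
# backup root, so repeated runs never overwrite earlier backups.
# Prints the backup directory it created.
backup_dirs() {
  local dest_root="$1"; shift
  local stamp d
  stamp=$(date +%Y%m%d%H%M)
  mkdir -p "$dest_root/$stamp"
  for d in "$@"; do
    [ -d "$d" ] && cp -r "$d" "$dest_root/$stamp/"
  done
  echo "$dest_root/$stamp"
}

# Intended usage on a master node (hypothetical backup root):
# backup_dirs /root/k8s_backup /etc/kubernetes /var/lib/etcd /var/lib/kubelet
```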
    

    3. Single-Node etcd Data Backup and Recovery

    For this scenario a file-based backup is enough. A default kubeadm installation persists etcd's data to /var/lib/etcd/ on the host. Back up the files under this directory regularly; if the etcd data is ever damaged and needs to be recovered, simply restore the files into this directory to recover the single-node etcd.

    Note: if the etcd container is running, the files cannot be overwritten. In that case rename the /etc/kubernetes/manifests folder, replace the data files, then rename it back to /etc/kubernetes/manifests; the etcd container will be restarted automatically shortly afterwards. See 3.1, "Single-Master Cluster Installed with Kubeadm".

    3.1. Single-Master Cluster Installed with Kubeadm

    1. Backup

    V3 API: back up the etcd data (ETCDCTL_API=3) into the current directory.

    ETCDCTL_API=3 etcdctl --endpoints="https://127.0.0.1:2379"  --cert="/etc/kubernetes/pki/etcd/server.crt"  --key="/etc/kubernetes/pki/etcd/server.key"  --cacert="/etc/kubernetes/pki/etcd/ca.crt"   snapshot save ./snap-$(date +%Y%m%d%H%M).db
    

    Notes: 1) ETCDCTL_API=3 selects the v3 version of the etcd API. 2) The endpoints can be looked up with the command below; there are usually two IPs, 127.0.0.1 and the host's LAN IP, e.g.:
    [root@app01 ~]# kubectl describe pod etcd-app01 -n kube-system | grep listen-client-urls
    --listen-client-urls=https://127.0.0.1:2379,https://192.168.180.45:2379
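Snapshots taken with a date-stamped name like `snap-YYYYMMDDHHMM.db` accumulate; a pruning sketch that relies on that naming (the retention count of 7 is an assumption, not from the original text):

```shell
# Keep only the newest N snapshot files matching snap-*.db in a
# directory and delete the rest. Sorting the filenames works because
# the embedded timestamp (snap-YYYYMMDDHHMM.db) is lexically sortable.
prune_snapshots() {
  local dir="$1" keep="${2:-7}"
  ls -1 "$dir"/snap-*.db 2>/dev/null | sort -r | tail -n +"$((keep + 1))" | while read -r f; do
    rm -f -- "$f"
  done
}
```

Run it from cron after the nightly `snapshot save` so the backup directory does not grow without bound.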

    2. Restore

    1. First stop the kube-apiserver and etcd containers

    mv /etc/kubernetes/manifests /etc/kubernetes/manifests_bak
    rm -rf /var/lib/etcd
    

    2. Restore the snapshot

    ETCDCTL_API=3 etcdctl --endpoints="https://127.0.0.1:2379"  --cert="/etc/kubernetes/pki/etcd/server.crt"  --key="/etc/kubernetes/pki/etcd/server.key"  --cacert="/etc/kubernetes/pki/etcd/ca.crt"   snapshot restore snap-202208251559.db --data-dir=/var/lib/etcd
    

    3. Start the kube-apiserver and etcd containers again

    mv /etc/kubernetes/manifests_bak /etc/kubernetes/manifests
    

    4. Check that the pods are back to normal

    kubectl get pod -n kube-system
    

    3.2. Backup and Restore of a Binary-Deployed etcd (unverified)

    1. Backup

    V3 API:

    ETCDCTL_API=3  etcdctl snapshot save snap.20220107.db --cacert=/etc/etcd/ssl/ca.pem --cert=/etc/etcd/ssl/etcd.pem --key=/etc/etcd/ssl/etcd-key.pem --endpoints="https://192.168.119.72:2379"
    
    
    {"level":"info","ts":1630499882.9289303,"caller":"snapshot/v3_snapshot.go:119","msg":"created temporary db file","path":"snap.db.part"}
    {"level":"info","ts":"2022-01-07T20:38:02.933+0800","caller":"clientv3/maintenance.go:200","msg":"opened snapshot stream; downloading"}
    {"level":"info","ts":1630499882.933808,"caller":"snapshot/v3_snapshot.go:127","msg":"fetching snapshot","endpoint":"https://192.168.119.72:2379"}
    {"level":"info","ts":"2022-01-07T20:38:03.040+0800","caller":"clientv3/maintenance.go:208","msg":"completed snapshot read; closing"}
    {"level":"info","ts":1630499883.0697453,"caller":"snapshot/v3_snapshot.go:142","msg":"fetched snapshot","endpoint":"https://192.168.119.72:2379","size":"13 MB","took":0.140736973}
    {"level":"info","ts":1630499883.0698237,"caller":"snapshot/v3_snapshot.go:152","msg":"saved","path":"snap.db"}
    Snapshot saved at snap.20220107.db
    

    2. Restore

    The restore steps below for a binary-deployed etcd cluster have not been verified on a real cluster; they are theoretical only. Do not run them directly in production!

    systemctl stop kube-apiserver
    systemctl stop etcd
    mv /var/lib/etcd/default.etcd /var/lib/etcd/default.etcd.bak
    
    ---> If you don't know where the binary deployment keeps its etcd data directory, check with:
    
    systemctl cat etcd.service
     
    ETCDCTL_API=3 etcdctl snapshot restore /data/backup/snap.20220107.db --data-dir=/var/lib/etcd/default.etcd
    
    chown -R etcd:etcd /var/lib/etcd
    systemctl start kube-apiserver
    systemctl start etcd.service
    

    4. Backup and Recovery of etcd Cluster Data (unverified)

    4.1. Multi-Master Cluster Installed with Kubeadm

    1. Backup

    V3 API:
    Back up the etcd data (ETCDCTL_API=3) into the backup directory. The backup can be run on a single master node.

    ETCDCTL_API=3 etcdctl --endpoints="https://127.0.0.1:2379"  --cert="/etc/kubernetes/pki/etcd/server.crt"  --key="/etc/kubernetes/pki/etcd/server.key"  --cacert="/etc/kubernetes/pki/etcd/ca.crt"   snapshot save /backup_$(date +%Y%m%d)/snap-$(date +%Y%m%d%H%M).db
    

    2. Restore

    1. Stop kube-apiserver and etcd on all master nodes; perform the same steps on master1, master2, and master3

    mv /etc/kubernetes/manifests  /etc/kubernetes/manifests_bak
    rm -rf  /var/lib/etcd
    

    2. On master1, run

    ETCDCTL_API=3 etcdctl snapshot restore /backup_20220108/snap-202201081337.db \
        --endpoints=192.168.100.171:2379 \
        --name=master1 \
        --cert=/etc/kubernetes/pki/etcd/server.crt \
        --key=/etc/kubernetes/pki/etcd/server.key \
        --cacert=/etc/kubernetes/pki/etcd/ca.crt \
        --initial-advertise-peer-urls=https://192.168.100.171:2380 \
        --initial-cluster-token=etcd-cluster-0 \
        --initial-cluster=master1=https://192.168.100.171:2380,master2=https://192.168.100.172:2380,master3=https://192.168.100.173:2380 \
        --data-dir=/var/lib/etcd
    

    On master2, run

    ETCDCTL_API=3 etcdctl snapshot restore /backup_20220108/snap-202201081337.db \
        --endpoints=192.168.100.172:2379 \
        --name=master2 \
        --cert=/etc/kubernetes/pki/etcd/server.crt \
        --key=/etc/kubernetes/pki/etcd/server.key \
        --cacert=/etc/kubernetes/pki/etcd/ca.crt \
        --initial-advertise-peer-urls=https://192.168.100.172:2380 \
        --initial-cluster-token=etcd-cluster-0 \
        --initial-cluster=master1=https://192.168.100.171:2380,master2=https://192.168.100.172:2380,master3=https://192.168.100.173:2380 \
        --data-dir=/var/lib/etcd
    

    On master3, run

    ETCDCTL_API=3 etcdctl snapshot restore /backup_20220108/snap-202201081337.db \
        --endpoints=192.168.100.173:2379 \
        --name=master3 \
        --cert=/etc/kubernetes/pki/etcd/server.crt \
        --key=/etc/kubernetes/pki/etcd/server.key \
        --cacert=/etc/kubernetes/pki/etcd/ca.crt \
        --initial-advertise-peer-urls=https://192.168.100.173:2380 \
        --initial-cluster-token=etcd-cluster-0 \
        --initial-cluster=master1=https://192.168.100.171:2380,master2=https://192.168.100.172:2380,master3=https://192.168.100.173:2380 \
        --data-dir=/var/lib/etcd
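The three restore invocations above differ only in the node name and IP. A small generator can print each per-node command for review instead of running it (a sketch reusing the same hypothetical names, IPs, and snapshot path as above; echoing rather than executing is deliberate, since each command must be run on its own node):

```shell
# Print the etcdctl restore command for each master node so the
# commands can be reviewed, then copied to the matching node.
SNAPSHOT=/backup_20220108/snap-202201081337.db
CLUSTER=master1=https://192.168.100.171:2380,master2=https://192.168.100.172:2380,master3=https://192.168.100.173:2380

print_restore_cmds() {
  local pair name ip
  for pair in master1:192.168.100.171 master2:192.168.100.172 master3:192.168.100.173; do
    name=${pair%%:*}   # text before the colon: the node name
    ip=${pair##*:}     # text after the colon: the node IP
    echo "ETCDCTL_API=3 etcdctl snapshot restore $SNAPSHOT" \
         "--name=$name" \
         "--initial-advertise-peer-urls=https://$ip:2380" \
         "--initial-cluster-token=etcd-cluster-0" \
         "--initial-cluster=$CLUSTER" \
         "--data-dir=/var/lib/etcd"
  done
}
```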
    

    Notes:
    1) ETCDCTL_API=3 selects the v3 version of the etcd API;
    2) If you don't know the value for --name=, look it up with the command below

    List the cluster members

    ETCDCTL_API=3 etcdctl --endpoints 192.168.100.171:2379,192.168.100.172:2379,192.168.100.173:2379 --cert="/etc/kubernetes/pki/etcd/server.crt"  --key="/etc/kubernetes/pki/etcd/server.key"  --cacert="/etc/kubernetes/pki/etcd/ca.crt" member list --write-out=table
    
    Sample output:
    
    +------------------+---------+---------+------------------------------+------------------------------+------------+
    |        ID        | STATUS  |  NAME   |          PEER ADDRS          |         CLIENT ADDRS         | IS LEARNER |
    +------------------+---------+---------+------------------------------+------------------------------+------------+
    | 442ee8f1d97e7dcd | started | master3 | https://192.168.100.173:2380 | https://192.168.100.173:2379 |      false |
    | 4972579f39eb9468 | started | master1 | https://192.168.100.171:2380 | https://192.168.100.171:2379 |      false |
    | 4bff6a42b677cc19 | started | master2 | https://192.168.100.172:2380 | https://192.168.100.172:2379 |      false |
    +------------------+---------+---------+------------------------------+------------------------------+------------+
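The --name values can also be extracted from this table output automatically; an awk sketch over the `--write-out=table` format shown above (it assumes the NAME column is the third data column, as in the sample):

```shell
# Read `etcdctl member list --write-out=table` output on stdin and
# print just the NAME column: border rows (starting with '+') are
# skipped by matching only lines that begin with '|', and the header
# row is skipped by comparing the stripped cell against "NAME".
extract_member_names() {
  awk -F'|' '/^\|/ { name = $4; gsub(/ /, "", name); if (name != "NAME") print name }'
}

# Intended usage:
# ETCDCTL_API=3 etcdctl --endpoints ... member list --write-out=table | extract_member_names
```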
    

    3. Restore the manifests on all three master nodes

    mv /etc/kubernetes/manifests_bak  /etc/kubernetes/manifests
    

    4.2. Multi-Node etcd Cluster Deployed from Binaries (unverified)

    1. Backup

    ETCDCTL_API=3 etcdctl \
    snapshot save snap.db \
    --endpoints=https://192.168.10.160:2379 \
    --cacert=/opt/etcd/ssl/ca.pem \
    --cert=/opt/etcd/ssl/server.pem \
    --key=/opt/etcd/ssl/server-key.pem
    

    2. Restore

    1. First stop kube-apiserver and etcd

    systemctl stop kube-apiserver
    systemctl stop etcd
    mv /var/lib/etcd/default.etcd /var/lib/etcd/default.etcd.bak
    

    2. Restore on each node

    ETCDCTL_API=3 etcdctl snapshot restore snap.db \
    --name etcd-1 \
    --initial-cluster="etcd-1=https://192.168.10.160:2380,etcd-2=https://192.168.10.161:2380,etcd-3=https://192.168.10.162:2380" \
    --initial-advertise-peer-urls=https://192.168.10.160:2380 \
    --data-dir=/var/lib/etcd/default.etcd
    
    ETCDCTL_API=3 etcdctl snapshot restore snap.db \
    --name etcd-2 \
    --initial-cluster="etcd-1=https://192.168.10.160:2380,etcd-2=https://192.168.10.161:2380,etcd-3=https://192.168.10.162:2380" \
    --initial-advertise-peer-urls=https://192.168.10.161:2380 \
    --data-dir=/var/lib/etcd/default.etcd
    
    ETCDCTL_API=3 etcdctl snapshot restore snap.db \
    --name etcd-3 \
    --initial-cluster="etcd-1=https://192.168.10.160:2380,etcd-2=https://192.168.10.161:2380,etcd-3=https://192.168.10.162:2380" \
    --initial-advertise-peer-urls=https://192.168.10.162:2380 \
    --data-dir=/var/lib/etcd/default.etcd
    

    3. Start kube-apiserver and etcd

    # Do not move default.etcd.bak back: the snapshot restore above has already recreated /var/lib/etcd/default.etcd
    chown -R etcd:etcd /var/lib/etcd
    systemctl start etcd.service
    systemctl start kube-apiserver
    
  • Original article: https://blog.csdn.net/lihongbao80/article/details/126508726