• ETCD Backup and Restore


    Install etcdctl

    I was about to try some etcd operations from the command line when I realized that kubeadm had only brought etcd up on this host; the etcdctl command itself was nowhere to be found.

    # sudo docker ps -a | awk '/etcd-master/{print $1}'
    c4e3a57f05d7
    26a11608b270
    836dabc8e254

    Find the running etcd container and copy the etcdctl binary from the pod onto the host, so that etcdctl can be used directly on the host.

    # sudo docker cp c4e3a57f05d7:/usr/local/bin/etcdctl /usr/local/bin/etcdctl
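
    If the node runs containerd instead of Docker (so there is no docker cp to borrow the binary with), etcdctl can also be taken from the official etcd release tarball. A minimal sketch, assuming an amd64 host and a version that matches the cluster's etcd (3.5.1 here):

    # Download etcdctl from the official etcd release (assumption: amd64 host,
    # version pinned to the cluster's etcd version):
    ETCD_VER=v3.5.1
    curl -L https://github.com/etcd-io/etcd/releases/download/${ETCD_VER}/etcd-${ETCD_VER}-linux-amd64.tar.gz -o /tmp/etcd.tar.gz
    tar xzvf /tmp/etcd.tar.gz -C /tmp
    sudo cp /tmp/etcd-${ETCD_VER}-linux-amd64/etcdctl /usr/local/bin/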
    

    Run the etcdctl command again; this time it executes successfully.

    # etcdctl
    NAME:
            etcdctl - A simple command line client for etcd3.
    USAGE:
            etcdctl [flags]
    VERSION:
            3.5.1
    API VERSION:
            3.5

    List the etcd cluster members

    # etcdctl --endpoints=https://[127.0.0.1]:2379 --cacert=/etc/kubernetes/pki/etcd/ca.crt --cert=/etc/kubernetes/pki/etcd/healthcheck-client.crt --key=/etc/kubernetes/pki/etcd/healthcheck-client.key member list -w table
    +-----------------+---------+------+----------------------------+----------------------------+------------+
    |       ID        | STATUS  | NAME |         PEER ADDRS         |        CLIENT ADDRS        | IS LEARNER |
    +-----------------+---------+------+----------------------------+----------------------------+------------+
    | dd7a929be676b37 | started |      | https://192.168.1.120:2380 | https://192.168.1.120:2379 |      false |
    +-----------------+---------+------+----------------------------+----------------------------+------------+

    The etcd certificate files

    # ll /etc/kubernetes/pki/etcd/
    total 32
    -rw-r----- 1 root root 1086 Mar 26 16:52 ca.crt
    -rw------- 1 root root 1675 Mar 26 16:52 ca.key
    -rw-r----- 1 root root 1159 Mar 26 16:52 healthcheck-client.crt
    -rw------- 1 root root 1675 Mar 26 16:52 healthcheck-client.key
    -rw-r----- 1 root root 1220 Mar 26 16:52 peer.crt
    -rw------- 1 root root 1675 Mar 26 16:52 peer.key
    -rw-r----- 1 root root 1220 Mar 26 16:52 server.crt
    -rw------- 1 root root 1675 Mar 26 16:52 server.key

    Set up an alias for etcdctl

    # alias etcdctl='etcdctl --endpoints=https://[127.0.0.1]:2379 --cacert=/etc/kubernetes/pki/etcd/ca.crt --cert=/etc/kubernetes/pki/etcd/healthcheck-client.crt --key=/etc/kubernetes/pki/etcd/healthcheck-client.key'
    [root@192.168.1.120 ~]#
    [root@192.168.1.120 ~]# etcdctl member list -w table
    +-----------------+---------+---------------+----------------------------+----------------------------+------------+
    |       ID        | STATUS  |     NAME      |         PEER ADDRS         |        CLIENT ADDRS        | IS LEARNER |
    +-----------------+---------+---------------+----------------------------+----------------------------+------------+
    | dd7a929be676b37 | started | 192.168.1.120 | https://192.168.1.120:2380 | https://192.168.1.120:2379 |      false |
    +-----------------+---------+---------------+----------------------------+----------------------------+------------+
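
    An alias only exists in the current shell. etcdctl also reads the ETCDCTL_* environment variables, so the same defaults can be exported instead, which also works inside scripts; a sketch using the same certificate paths as above:

    export ETCDCTL_ENDPOINTS=https://127.0.0.1:2379
    export ETCDCTL_CACERT=/etc/kubernetes/pki/etcd/ca.crt
    export ETCDCTL_CERT=/etc/kubernetes/pki/etcd/healthcheck-client.crt
    export ETCDCTL_KEY=/etc/kubernetes/pki/etcd/healthcheck-client.key
    etcdctl member list -w table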

    Inspect the etcd endpoint details

    IS LEADER: whether this member is currently the leader

    RAFT TERM: how many rounds of leader election have taken place

    # etcdctl endpoint status -w table
    +--------------------------+-----------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
    |         ENDPOINT         |       ID        | VERSION | DB SIZE | IS LEADER | IS LEARNER | RAFT TERM | RAFT INDEX | RAFT APPLIED INDEX | ERRORS |
    +--------------------------+-----------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
    | https://[127.0.0.1]:2379 | dd7a929be676b37 |   3.5.1 |   18 MB |      true |      false |        22 |    7579742 |            7579742 |        |
    +--------------------------+-----------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+

    Check whether etcd is healthy

    # etcdctl endpoint health
    https://[127.0.0.1]:2379 is healthy: successfully committed proposal: took = 27.76824ms

    A quick hands-on test with keys and values

    Data in etcd is stored under a key hierarchy that starts at /.

    # etcdctl put /skywell/byd bus
    OK
    # etcdctl get /skywell/ --prefix=true
    /skywell/byd
    bus
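
    To clean up after the test, the key (or the whole prefix) can be removed again; a sketch, assuming nothing else is stored under /skywell/:

    # delete every key under the test prefix (assumption: /skywell/ only holds test data)
    etcdctl del /skywell/ --prefix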

    Browse the data in the store

    --keys-only: print only the keys, not the values

    --limit: the store holds many keys, so cap how many are returned

    # etcdctl get / --prefix=true --keys-only --limit 10
    /registry/apiregistration.k8s.io/apiservices/v1.
    /registry/apiregistration.k8s.io/apiservices/v1.admissionregistration.k8s.io
    /registry/apiregistration.k8s.io/apiservices/v1.apiextensions.k8s.io
    /registry/apiregistration.k8s.io/apiservices/v1.apps
    /registry/apiregistration.k8s.io/apiservices/v1.authentication.k8s.io
    /registry/apiregistration.k8s.io/apiservices/v1.authorization.k8s.io
    /registry/apiregistration.k8s.io/apiservices/v1.autoscaling
    /registry/apiregistration.k8s.io/apiservices/v1.batch
    /registry/apiregistration.k8s.io/apiservices/v1.certificates.k8s.io
    /registry/apiregistration.k8s.io/apiservices/v1.coordination.k8s.io
    # etcdctl get /skywell --prefix=true --keys-only --limit 10
    /skywell/byd

    Back up the etcd data

    # etcdctl snapshot save etcdbackup.db
    {"level":"info","ts":1716971751.0047052,"caller":"snapshot/v3_snapshot.go:68","msg":"created temporary db file","path":"etcdbackup.db.part"}
    {"level":"info","ts":1716971751.028518,"logger":"client","caller":"v3/maintenance.go:211","msg":"opened snapshot stream; downloading"}
    {"level":"info","ts":1716971751.0286477,"caller":"snapshot/v3_snapshot.go:76","msg":"fetching snapshot","endpoint":"https://[127.0.0.1]:2379"}
    {"level":"info","ts":1716971751.3699682,"logger":"client","caller":"v3/maintenance.go:219","msg":"completed snapshot read; closing"}
    {"level":"info","ts":1716971751.8124714,"caller":"snapshot/v3_snapshot.go:91","msg":"fetched snapshot","endpoint":"https://[127.0.0.1]:2379","size":"18 MB","took":"now"}
    {"level":"info","ts":1716971751.8127532,"caller":"snapshot/v3_snapshot.go:100","msg":"saved","path":"etcdbackup.db"}
    Snapshot saved at etcdbackup.db
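
    To take such snapshots on a schedule rather than by hand, a crontab entry along these lines could be used (the 02:00 schedule and the /data/backup target directory are assumptions; the flags are spelled out because shell aliases are not visible to cron, and % must be escaped in crontab):

    # run a nightly etcd snapshot at 02:00, named by date
    0 2 * * * /usr/local/bin/etcdctl --endpoints=https://127.0.0.1:2379 --cacert=/etc/kubernetes/pki/etcd/ca.crt --cert=/etc/kubernetes/pki/etcd/healthcheck-client.crt --key=/etc/kubernetes/pki/etcd/healthcheck-client.key snapshot save /data/backup/etcd-$(date +\%Y\%m\%d).db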

    Verify the backup data

    # etcdctl --write-out=table snapshot status etcdbackup.db
    Deprecated: Use `etcdutl snapshot status` instead.
    +----------+----------+------------+------------+
    |   HASH   | REVISION | TOTAL KEYS | TOTAL SIZE |
    +----------+----------+------------+------------+
    | b91b2b0e |  6454813 |        947 |      18 MB |
    +----------+----------+------------+------------+
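
    The deprecation notice points at etcdutl, which ships in the same etcd release tarball as etcdctl; if that binary has been copied onto the host as well, the equivalent check would presumably be:

    # same check via etcdutl (assumption: etcdutl was installed alongside etcdctl)
    etcdutl snapshot status etcdbackup.db --write-out=table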

    At this point, delete the test nginx deployment.

    # kubectl get deployment -A
    NAMESPACE              NAME                        READY   UP-TO-DATE   AVAILABLE   AGE
    default                nginx                       3/3     3            3           151m
    ingress-nginx          nginx-deployment            1/1     1            1           15d
    ingress-nginx          nginx-ingress-controller    1/1     1            1           15d
    kube-system            coredns                     2/2     2            2           63d
    kube-system            metrics-server              1/1     1            1           56d
    kubernetes-dashboard   dashboard-metrics-scraper   1/1     1            1           56d
    kubernetes-dashboard   kubernetes-dashboard        1/1     1            1           56d
    # kubectl delete deployment -n default nginx
    deployment.apps "nginx" deleted
    # kubectl get deployment -A
    NAMESPACE              NAME                        READY   UP-TO-DATE   AVAILABLE   AGE
    ingress-nginx          nginx-deployment            1/1     1            1           15d
    ingress-nginx          nginx-ingress-controller    1/1     1            1           15d
    kube-system            coredns                     2/2     2            2           63d
    kube-system            metrics-server              1/1     1            1           56d
    kubernetes-dashboard   dashboard-metrics-scraper   1/1     1            1           56d
    kubernetes-dashboard   kubernetes-dashboard        1/1     1            1           56d

    Restore the backup to the cluster

    Restore the backed-up data into the directory specified by --data-dir.

    # etcdctl snapshot restore etcdbackup.db --data-dir=/data/foot/etcdtest/restore
    Deprecated: Use `etcdutl snapshot restore` instead.
    2024-05-29T16:44:33+08:00 info snapshot/v3_snapshot.go:251 restoring snapshot {"path": "etcdbackup.db", "wal-dir": "/data/foot/etcdtest/restore/member/wal", "data-dir": "/data/foot/etcdtest/restore", "snap-dir": "/data/foot/etcdtest/restore/member/snap", "stack": "go.etcd.io/etcd/etcdutl/v3/snapshot.(*v3Manager).Restore\n\t/tmp/etcd-release-3.5.1/etcd/release/etcd/etcdutl/snapshot/v3_snapshot.go:257\ngo.etcd.io/etcd/etcdutl/v3/etcdutl.SnapshotRestoreCommandFunc\n\t/tmp/etcd-release-3.5.1/etcd/release/etcd/etcdutl/etcdutl/snapshot_command.go:147\ngo.etcd.io/etcd/etcdctl/v3/ctlv3/command.snapshotRestoreCommandFunc\n\t/tmp/etcd-release-3.5.1/etcd/release/etcd/etcdctl/ctlv3/command/snapshot_command.go:128\ngithub.com/spf13/cobra.(*Command).execute\n\t/home/remote/sbatsche/.gvm/pkgsets/go1.16.3/global/pkg/mod/github.com/spf13/cobra@v1.1.3/command.go:856\ngithub.com/spf13/cobra.(*Command).ExecuteC\n\t/home/remote/sbatsche/.gvm/pkgsets/go1.16.3/global/pkg/mod/github.com/spf13/cobra@v1.1.3/command.go:960\ngithub.com/spf13/cobra.(*Command).Execute\n\t/home/remote/sbatsche/.gvm/pkgsets/go1.16.3/global/pkg/mod/github.com/spf13/cobra@v1.1.3/command.go:897\ngo.etcd.io/etcd/etcdctl/v3/ctlv3.Start\n\t/tmp/etcd-release-3.5.1/etcd/release/etcd/etcdctl/ctlv3/ctl.go:107\ngo.etcd.io/etcd/etcdctl/v3/ctlv3.MustStart\n\t/tmp/etcd-release-3.5.1/etcd/release/etcd/etcdctl/ctlv3/ctl.go:111\nmain.main\n\t/tmp/etcd-release-3.5.1/etcd/release/etcd/etcdctl/main.go:59\nruntime.main\n\t/home/remote/sbatsche/.gvm/gos/go1.16.3/src/runtime/proc.go:225"}
    2024-05-29T16:44:33+08:00 info membership/store.go:141 Trimming membership information from the backend...
    2024-05-29T16:44:34+08:00 info membership/cluster.go:421 added member {"cluster-id": "cdf818194e3a8c32", "local-member-id": "0", "added-peer-id": "8e9e05c52164694d", "added-peer-peer-urls": ["http://localhost:2380"]}
    2024-05-29T16:44:34+08:00 info snapshot/v3_snapshot.go:272 restored snapshot {"path": "etcdbackup.db", "wal-dir": "/data/foot/etcdtest/restore/member/wal", "data-dir": "/data/foot/etcdtest/restore", "snap-dir": "/data/foot/etcdtest/restore/member/snap"}
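
    On a cluster with more than one etcd member, the restore is normally run on every member with its own name, peer URL, and initial cluster list. A sketch for this single-member layout, reusing the member name and peer address from the member list above (for a single default member these flags can be left out, as in the command just shown):

    # restore with explicit member identity (values taken from the member list above)
    etcdctl snapshot restore etcdbackup.db \
        --name 192.168.1.120 \
        --initial-cluster 192.168.1.120=https://192.168.1.120:2380 \
        --initial-advertise-peer-urls https://192.168.1.120:2380 \
        --data-dir=/data/foot/etcdtest/restore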

    A fresh set of etcd data has been generated at the specified location.

    # ll restore/member/
    total 8
    drwx------ 2 root root 4096 May 29 16:44 snap
    drwx------ 2 root root 4096 May 29 16:44 wal
    # ll /var/lib/etcd/member/
    total 0
    drwx------ 2 root root 246 May 29 15:08 snap
    drwx------ 2 root root 244 May 29 09:14 wal

    Now all of the Kubernetes control-plane components need to be stopped so that the etcd data can be replaced.

    Move the component manifest files out of /etc/kubernetes/manifests/.

    # ll /etc/kubernetes/manifests/
    total 16
    -rw------- 1 root root 2260 Mar 26 16:52 etcd.yaml
    -rw------- 1 root root 3367 Mar 26 16:52 kube-apiserver.yaml
    -rw------- 1 root root 2878 Mar 26 16:52 kube-controller-manager.yaml
    -rw------- 1 root root 1464 Mar 26 16:52 kube-scheduler.yaml
    # mv /etc/kubernetes/manifests/* /tmp

    The kubelet is supposed to delete the static pods automatically. [That is the claim; in my test, after changing the etcd-data directory the node no longer showed up and kubectl get po -A returned nothing either. In the end I moved the yaml files back into place and the cluster returned to normal.]

    # kubectl get po -A
    NAMESPACE              NAME                                         READY   STATUS    RESTARTS          AGE
    ingress-nginx          nginx-deployment-64d5f7665c-56cpz            1/1     Running   0                 15d
    ingress-nginx          nginx-ingress-controller-7cfc988f46-cszsd    1/1     Running   0                 15d
    kube-flannel           kube-flannel-ds-lpm9c                        1/1     Running   0                 64d
    kube-system            coredns-6d8c4cb4d-sml87                      1/1     Running   0                 64d
    kube-system            coredns-6d8c4cb4d-w4hgz                      1/1     Running   0                 64d
    kube-system            etcd-master                                  1/1     Running   181 (18d ago)     64d
    kube-system            kube-apiserver-master                        1/1     Running   159               64d
    kube-system            kube-controller-manager-master               1/1     Running   241 (3d7h ago)    64d
    kube-system            kube-proxy-6ct9f                             1/1     Running   0                 64d
    kube-system            kube-scheduler-master                        1/1     Running   3256 (3d7h ago)   64d
    kube-system            metrics-server-5d6946c85b-5585p              1/1     Running   0                 56d
    kubernetes-dashboard   dashboard-metrics-scraper-6f669b9c9b-hmw4b   1/1     Running   0                 19d
    kubernetes-dashboard   kubernetes-dashboard-57dd8bd998-ghrhd        1/1     Running   26 (18d ago)      19d
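
    Before touching the data directory it is worth confirming that the kubelet really has torn the static pods down. Since Docker is the runtime on this host, a quick check might look like this (the grep pattern is only an illustration; an empty result means the control-plane containers are gone):

    # list any control-plane containers still running on this node
    sudo docker ps | grep -E 'etcd|kube-apiserver|kube-scheduler|kube-controller'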

    Once the component pods are gone, edit the etcd.yaml manifest (moved to /tmp in the previous step) and change the hostPath path of the etcd-data volume:

        volumeMounts:
        - mountPath: /var/lib/etcd
          name: etcd-data
        - mountPath: /etc/kubernetes/pki/etcd
          name: etcd-certs
      hostNetwork: true
      priorityClassName: system-node-critical
      securityContext:
        seccompProfile:
          type: RuntimeDefault
      volumes:
      - hostPath:
          path: /etc/kubernetes/pki/etcd
          type: DirectoryOrCreate
        name: etcd-certs
      - hostPath:
          path: /var/lib/etcd    # change this path to the restored data directory, /data/foot/etcdtest/restore from the step above
          type: DirectoryOrCreate
        name: etcd-data
    status: {}
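
    With etcd.yaml pointing at the restored data directory, move the manifests back so the kubelet recreates the static pods; a sketch, listing the four files explicitly in case /tmp holds other yaml files:

    # put the static pod manifests back where the kubelet watches for them
    mv /tmp/etcd.yaml /tmp/kube-apiserver.yaml /tmp/kube-controller-manager.yaml /tmp/kube-scheduler.yaml /etc/kubernetes/manifests/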

    Check the Kubernetes cluster status

    # kubectl get cs
    Warning: v1 ComponentStatus is deprecated in v1.19+
    NAME                 STATUS    MESSAGE                         ERROR
    controller-manager   Healthy   ok
    scheduler            Healthy   ok
    etcd-0               Healthy   {"health":"true","reason":""}

  • Original article: https://blog.csdn.net/red_sky_blue/article/details/139295748