kubernetes | Cloud Native | How to gracefully restart and update pods --- pod lifecycle management in practice


    Preface:

    The complexity of operating and maintaining Kubernetes shows up everywhere: pod management, service management, user management (RBAC), network management, and so on. Installing and deploying Kubernetes is therefore only the first step of a long march; the operation and maintenance work that follows is what really matters.

    So what exactly is the pod lifecycle? How does it relate to operations such as restarting and updating? Going further, what does "graceful" mean here, what are the benefits of a graceful restart or update, and how do we achieve one?

    These are the questions this article sets out to answer, as thoroughly as possible. If anything below is wrong, please go easy on me.

    1.

    The pod lifecycle

    A Pod is the most basic unit of work in Kubernetes and represents a running instance of an application. A pod's lifecycle consists of a series of states:

    1. Pending: the pod has been created but has not yet been scheduled onto any node.
    2. Running: the pod has been scheduled successfully and is running.
    3. Succeeded: all containers in the pod have terminated successfully and will not be restarted.
    4. Failed: at least one container in the pod failed to terminate successfully, or the pod itself failed.
    5. Unknown: the pod's state cannot be determined, usually because the API server cannot communicate with the pod's node.
    6. Terminating: shown while a pod is being deleted; the pod is no longer usable. This is typically the state of an old pod during deletion.

    Pods can of course show a dozen or so other statuses (OutOfcpu, for example), but the ones above are the most common.
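    The status of a pod can be inspected directly from the command line. A quick sketch (the pod name `nginx` here is just a placeholder for whatever pod you are examining):

```shell
# Show the phase of every pod in the current namespace
kubectl get pods -o custom-columns=NAME:.metadata.name,PHASE:.status.phase

# Pull the phase of a single pod (name is a placeholder)
kubectl get pod nginx -o jsonpath='{.status.phase}'
```

    Note that `kubectl get pods` prints a derived STATUS column (which can show Terminating, CrashLoopBackOff, etc.), while `status.phase` only ever holds the five phases listed above.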

    Pod creation:
          1. After receiving a pod-creation request, the API Server builds a runtime pod object from the parameters the user submitted.
          2. It validates that the namespace from the request context matches the namespace of the object; if not, creation fails.
          3. Once the namespaces match, it injects some system data into the pod object; if no pod name was provided, the API Server uses the pod's uid as the pod name.
          4. The API Server then checks that the pod object's required fields are non-empty; if any are empty, creation fails.
          5. After this preparation, it persists the object in etcd, wraps the result of the asynchronous call in a restful.response, and returns it.
          6. The API Server's part of the creation is done; the rest is handled by the scheduler and the kubelet. At this point the pod is Pending.
          7. The scheduler picks the best node.
          8. The kubelet on that node starts the pod.
       Pod deletion:
          1. The user issues a pod-deletion command.
          2. The pod is marked "Terminating", and several things happen at the same time:
             - on seeing the pod in the "Terminating" state, the pod shutdown process starts;
             - the endpoints controller notices the pod is shutting down and removes it from the endpoints lists of any matching Services;
             - the pod runs whatever its PreStop hook defines.
          3. Once the grace period (30 seconds by default) expires, any processes still running in the pod receive SIGKILL.
          4. The kubelet asks the API Server to set this pod's grace period to 0, completing the deletion.
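    The PreStop hook and the grace period mentioned in the steps above are both set in the pod template. A minimal sketch, applied via a heredoc; the `sleep 5` command and the explicit grace period are illustrative assumptions, not values taken from this article:

```shell
kubectl apply -f - <<'EOF'
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx
spec:
  replicas: 3
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      # SIGKILL arrives only after this many seconds (default is 30)
      terminationGracePeriodSeconds: 30
      containers:
      - name: nginx
        image: nginx:1.18
        lifecycle:
          preStop:
            exec:
              # Give the endpoints controller time to drop this pod
              # before nginx stops serving (duration is an assumption)
              command: ["/bin/sh", "-c", "sleep 5"]
EOF
```

    The short PreStop sleep is a common trick: it keeps the container serving for a moment after the pod has been removed from the Service endpoints, so in-flight requests are not cut off.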

    The pod lifecycle machinery is fairly complex, and the above is only a rough outline; the low-level details are beyond the scope of this article. For our purposes, the span from a pod's creation to its final deletion or reclamation can simply be regarded as one lifecycle, during which the pod may pass through all kinds of states. It is not the case that a pod is created and then simply waits to be deleted; that view is incorrect.

    2.

    Managing the pod lifecycle

    Since Kubernetes is an automated container-management platform, we always want a pod, once deployed, to sit in a stable state. In other words, apart from certain Job-type or init-type pods, essentially any state other than Running is unacceptable. The goal is therefore simple: keep pods Running.
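    Keeping a pod "Running" in a meaningful sense usually relies on probes, so that a broken container is restarted and a pod only receives traffic once it can actually serve. A hedged sketch; the probe path `/`, port, and timings are assumptions for an nginx-style workload, not settings from this article:

```shell
kubectl apply -f - <<'EOF'
apiVersion: v1
kind: Pod
metadata:
  name: nginx-probed
spec:
  containers:
  - name: nginx
    image: nginx:1.18
    # Restart the container if it stops answering HTTP
    livenessProbe:
      httpGet:
        path: /
        port: 80
      initialDelaySeconds: 5
      periodSeconds: 10
    # Only add the pod to Service endpoints once it answers
    readinessProbe:
      httpGet:
        path: /
        port: 80
      initialDelaySeconds: 2
      periodSeconds: 5
EOF
```

    Readiness probes in particular matter for the graceful updates discussed below: during a rolling update, a new pod is only counted as available (and only receives traffic) after its readiness probe passes.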

    Let's illustrate with a concrete example.

    The Kubernetes version:

    [root@node1 ~]# kubectl get no -owide
    NAME STATUS ROLES AGE VERSION INTERNAL-IP EXTERNAL-IP OS-IMAGE KERNEL-VERSION CONTAINER-RUNTIME
    node1 Ready control-plane,master 117d v1.23.16 192.168.123.11 CentOS Linux 7 (Core) 3.10.0-1062.el7.x86_64 docker://20.10.8
    node2 Ready control-plane,master 117d v1.23.16 192.168.123.12 CentOS Linux 7 (Core) 3.10.0-1062.el7.x86_64 docker://20.10.8
    node3 Ready control-plane,master 117d v1.23.16 192.168.123.13 CentOS Linux 7 (Core) 3.10.0-1062.el7.x86_64 docker://20.10.8
    node4 Ready worker 117d v1.23.16 192.168.123.14 CentOS Linux 7 (Core) 3.10.0-1062.el7.x86_64 docker://20.10.8

      Create a deployment named nginx, scale it to two replicas, then to three:

    kubectl create deployment nginx --image=nginx:1.18
    kubectl scale deployment nginx --replicas=2
    kubectl scale deployment nginx --replicas=3

    Create a NodePort-type Service to expose the backend; querying it shows that 30353 is the external port:

    kubectl expose deployment nginx --type=NodePort --port=80 --target-port=80
    [root@node1 ~]# kubectl get svc
    NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
    kubernetes ClusterIP 10.96.0.1 443/TCP 117d
    nginx NodePort 10.96.24.248 80:30353/TCP 36s

    Now we can monitor this service with the watch command:

    watch curl -I http://192.168.123.11:30353/




    OK, everything is normal so far. Now, what happens if we update this deployment's image to 1.20.1?
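    The update is done with `kubectl set image` (the exact invocation is not shown in the original text; this is the standard form for the deployment and container name used in this example):

```shell
# Trigger a rolling update to the new image; "nginx=" is the container name
kubectl set image deployment/nginx nginx=nginx:1.20.1

# Optionally block until the rollout has completed
kubectl rollout status deployment/nginx
```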

    Looking at `kubectl get events`, we can see that changing the image with `kubectl set` does not affect the service at all; there is no interruption:

    Normal SuccessfulCreate replicaset/nginx-648458674d Created pod: nginx-648458674d-gldmc
    Normal Scheduled pod/nginx-648458674d-gldmc Successfully assigned default/nginx-648458674d-gldmc to node4
    Normal Pulling pod/nginx-648458674d-gldmc Pulling image "nginx:1.20.1"
    Normal Starting node/node4 Starting kubelet.
    Normal Pulled pod/nginx-648458674d-gldmc Successfully pulled image "nginx:1.20.1" in 55.174012251s (55.174015279s including waiting)
    Normal Created pod/nginx-648458674d-gldmc Created container nginx
    Normal Started pod/nginx-648458674d-gldmc Started container nginx
    Normal ScalingReplicaSet deployment/nginx Scaled down replica set nginx-6888c79454 to 2
    Normal SuccessfulDelete replicaset/nginx-6888c79454 Deleted pod: nginx-6888c79454-g24tx
    Normal Killing pod/nginx-6888c79454-g24tx Stopping container nginx
    Normal ScalingReplicaSet deployment/nginx (combined from similar events): Scaled up replica set nginx-648458674d to 2
    Normal SuccessfulCreate replicaset/nginx-648458674d Created pod: nginx-648458674d-kfgb4
    Normal Scheduled pod/nginx-648458674d-kfgb4 Successfully assigned default/nginx-648458674d-kfgb4 to node4
    Normal Pulled pod/nginx-648458674d-kfgb4 Container image "nginx:1.20.1" already present on machine
    Normal Created pod/nginx-648458674d-kfgb4 Created container nginx
    Normal Started pod/nginx-648458674d-kfgb4 Started container nginx
    Normal ScalingReplicaSet deployment/nginx (combined from similar events): Scaled down replica set nginx-6888c79454 to 1
    Normal SuccessfulDelete replicaset/nginx-6888c79454 Deleted pod: nginx-6888c79454-dhhts
    Normal ScalingReplicaSet deployment/nginx (combined from similar events): Scaled up replica set nginx-648458674d to 3
    Normal Killing pod/nginx-6888c79454-dhhts Stopping container nginx
    Normal SuccessfulCreate replicaset/nginx-648458674d Created pod: nginx-648458674d-v4lwp
    Normal Scheduled pod/nginx-648458674d-v4lwp Successfully assigned default/nginx-648458674d-v4lwp to node4
    Normal Pulled pod/nginx-648458674d-v4lwp Container image "nginx:1.20.1" already present on machine
    Normal Created pod/nginx-648458674d-v4lwp Created container nginx
    Normal Started pod/nginx-648458674d-v4lwp Started container nginx
    Normal Killing pod/nginx-6888c79454-dhhts Stopping container nginx
    Warning FailedKillPod pod/nginx-6888c79454-dhhts error killing pod: failed to "KillContainer" for "nginx" with KillContainerError: "rpc error: code = Unknown desc = Error response from daemon: No such container: 0c27aa115f96cbc5d713a2d508310d20035021046b59878ffc50bb2bd6ee9271"
    Normal SuccessfulDelete replicaset/nginx-6888c79454 Deleted pod: nginx-6888c79454-gcw24
    Normal ScalingReplicaSet deployment/nginx (combined from similar events): Scaled down replica set nginx-6888c79454 to 0
    Normal Killing pod/nginx-6888c79454-gcw24 Stopping container nginx
    Normal Starting node/node4 Starting kubelet.
    Normal Scheduled pod/nginx-6888c79454-tlczp Successfully assigned default/nginx-6888c79454-tlczp to node4
    Normal SuccessfulCreate replicaset/nginx-6888c79454 Created pod: nginx-6888c79454-tlczp
    Normal ScalingReplicaSet deployment/nginx Scaled up replica set nginx-6888c79454 to 2
    Normal Starting node/node4 Starting kubelet.
    Normal Starting node/node4 Starting kubelet.
    Normal Starting node/node4 Starting kubelet.
    Normal Pulled pod/nginx-6888c79454-tlczp Container image "nginx:1.18" already present on machine
    Normal Created pod/nginx-6888c79454-tlczp Created container nginx
    Normal Started pod/nginx-6888c79454-tlczp Started container nginx
    Normal Starting node/node4 Starting kubelet.
    Normal Starting node/node4 Starting kubelet.
    Normal Starting node/node4 Starting kubelet.
    Normal Starting node/node4 Starting kubelet.
    Normal Scheduled pod/nginx-6888c79454-6tfk2 Successfully assigned default/nginx-6888c79454-6tfk2 to node4
    Normal SuccessfulCreate replicaset/nginx-6888c79454 Created pod: nginx-6888c79454-6tfk2
    Normal ScalingReplicaSet deployment/nginx Scaled up replica set nginx-6888c79454 to 3
    Normal Starting node/node4 Starting kubelet.
    Normal Pulled pod/nginx-6888c79454-6tfk2 Container image "nginx:1.18" already present on machine
    Normal Created pod/nginx-6888c79454-6tfk2 Created container nginx
    Normal Started pod/nginx-6888c79454-6tfk2 Started container nginx
    Normal Starting node/node4 Starting kubelet.

    Explanation: the above is Kubernetes's scheduling process. The key lines are the following, which show the ReplicaSet for the new image being scaled up step by step while the ReplicaSet for the old image is scaled down:

    (combined from similar events): Scaled up replica set nginx-648458674d to 2
    (combined from similar events): Scaled down replica set nginx-6888c79454 to 1
    (combined from similar events): Scaled up replica set nginx-648458674d to 3
    (combined from similar events): Scaled down replica set nginx-6888c79454 to 0
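    This gradual scale-up of the new ReplicaSet and scale-down of the old one is governed by the Deployment's rolling-update strategy. A minimal sketch of the relevant fields; the 25% values shown are the Kubernetes defaults made explicit, not settings taken from this cluster:

```shell
kubectl apply -f - <<'EOF'
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx
spec:
  replicas: 3
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 25%        # how many extra pods may exist above the desired count
      maxUnavailable: 25%  # how many pods may be unavailable during the update
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: nginx:1.20.1
EOF
```

    Setting `maxUnavailable: 0` forces every replacement pod to become ready before an old pod is killed, which is the strictest form of a zero-downtime rollout.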

    The final pod state:

    [root@node1 ~]# kubectl get po -owide
    NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
    nginx-648458674d-gldmc 1/1 Running 0 10.244.41.18 node4
    nginx-648458674d-kfgb4 1/1 Running 0 10.244.41.19 node4
    nginx-648458674d-v4lwp 1/1 Running 0 10.244.41.20 node4

    OK, we can consider this a smooth, graceful update. If instead we had deleted the deployment via its manifest yaml file, edited the file, and re-created the deployment, that would certainly be simple, but it is crude and interrupts the service; such an update is anything but smooth.

    A restart done that way merely skips the step of editing the manifest yaml file, and is just as crude.

    I won't demonstrate the exact steps; roughly, it is `kubectl delete -f <file>` followed by `kubectl apply -f <file>`.

    3.

    Finer-grained pod version control: kubectl rollout

    Neither of the two management styles shown so far is particularly precise: both are purely command-line operations, so the changes are not concretely recorded anywhere. In everyday work you will still want deployment manifest yaml files.

    The `kubectl rollout` command, however, does support graceful restarts and updates. Continuing the example above:

    [root@node1 ~]# kubectl get po
    NAME READY STATUS RESTARTS AGE
    nginx-648458674d-gldmc 1/1 Running 0
    nginx-648458674d-kfgb4 1/1 Running 0
    nginx-648458674d-v4lwp 1/1 Running 0

    Restart the deployment directly:

    [root@node1 ~]# kubectl rollout restart deployment nginx
    deployment.apps/nginx restarted
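    After `kubectl rollout restart`, the progress of the rollout can be followed, and an in-flight rollout can also be paused and resumed:

```shell
kubectl rollout status deployment nginx   # blocks until the rollout completes
kubectl rollout pause deployment nginx    # freeze the rollout mid-way
kubectl rollout resume deployment nginx   # continue a paused rollout
```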

     

    Check the events:

    Command:

    kubectl get events -w

    Partial output:

    Normal ScalingReplicaSet deployment/nginx (combined from similar events): Scaled up replica set nginx-5fc8f974d9 to 1
    Normal SuccessfulCreate replicaset/nginx-5fc8f974d9 Created pod: nginx-5fc8f974d9-9gn8z
    Normal Scheduled pod/nginx-5fc8f974d9-9gn8z Successfully assigned default/nginx-5fc8f974d9-9gn8z to node4
    Normal Pulled pod/nginx-5fc8f974d9-9gn8z Container image "nginx:1.18" already present on machine
    Normal Created pod/nginx-5fc8f974d9-9gn8z Created container nginx
    Normal Started pod/nginx-5fc8f974d9-9gn8z Started container nginx
    Normal ScalingReplicaSet deployment/nginx (combined from similar events): Scaled down replica set nginx-bf95bf86b to 2
    Normal ScalingReplicaSet deployment/nginx (combined from similar events): Scaled up replica set nginx-5fc8f974d9 to 2
    Normal SuccessfulDelete replicaset/nginx-bf95bf86b Deleted pod: nginx-bf95bf86b-jsssl
    Normal Killing pod/nginx-bf95bf86b-jsssl Stopping container nginx
    Normal SuccessfulCreate replicaset/nginx-5fc8f974d9 Created pod: nginx-5fc8f974d9-nkcbd
    Normal Scheduled pod/nginx-5fc8f974d9-nkcbd Successfully assigned default/nginx-5fc8f974d9-nkcbd to node4
    Normal Pulled pod/nginx-5fc8f974d9-nkcbd Container image "nginx:1.18" already present on machine
    Normal Created pod/nginx-5fc8f974d9-nkcbd Created container nginx
    Normal Started pod/nginx-5fc8f974d9-nkcbd Started container nginx
    Normal ScalingReplicaSet deployment/nginx (combined from similar events): Scaled down replica set nginx-bf95bf86b to 1
    Normal SuccessfulDelete replicaset/nginx-bf95bf86b Deleted pod: nginx-bf95bf86b-98lpj
    Normal Killing pod/nginx-bf95bf86b-98lpj Stopping container nginx
    Normal ScalingReplicaSet deployment/nginx (combined from similar events): Scaled up replica set nginx-5fc8f974d9 to 3
    Normal SuccessfulCreate replicaset/nginx-5fc8f974d9 Created pod: nginx-5fc8f974d9-xw64m
    Normal Scheduled pod/nginx-5fc8f974d9-xw64m Successfully assigned default/nginx-5fc8f974d9-xw64m to node4
    Normal Pulled pod/nginx-5fc8f974d9-xw64m Container image "nginx:1.18" already present on machine
    Normal Created pod/nginx-5fc8f974d9-xw64m Created container nginx
    Normal Started pod/nginx-5fc8f974d9-xw64m Started container nginx
    Normal ScalingReplicaSet deployment/nginx (combined from similar events): Scaled down replica set nginx-bf95bf86b to 0
    Normal SuccessfulDelete replicaset/nginx-bf95bf86b Deleted pod: nginx-bf95bf86b-nfh5r

    After the restart completes, the pod and ReplicaSet state:

    [root@node1 ~]# kubectl get po
    NAME READY STATUS RESTARTS AGE
    nginx-bf95bf86b-98lpj 1/1 Running 0
    nginx-bf95bf86b-jsssl 1/1 Running 0
    nginx-bf95bf86b-nfh5r 1/1 Running 0
    [root@node1 ~]# kubectl get replicasets.apps
    NAME DESIRED CURRENT READY AGE
    nginx-5fc8f974d9 3 3 3
    nginx-648458674d 0 0 0
    nginx-6888c79454 0 0 0
    nginx-bf95bf86b 0 0 0

    The rollout history should now show 4 revisions, and the output corresponds one-to-one with the ReplicaSets above:

    [root@node1 ~]# kubectl rollout history deployment
    deployment.apps/nginx
    REVISION CHANGE-CAUSE
    1
    2
    3
    4
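    The CHANGE-CAUSE column above is empty because no change cause was ever recorded. It can be filled in with the `kubernetes.io/change-cause` annotation, which makes the history far more readable; the message text here is purely illustrative:

```shell
# Record a human-readable reason on the current revision
kubectl annotate deployment nginx kubernetes.io/change-cause="upgrade image to nginx:1.20.1"

# Subsequent history output will show the annotation as CHANGE-CAUSE
kubectl rollout history deployment nginx
```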

    View the details of each deployment revision:

    kubectl rollout history deployment nginx --revision=1
    deployment.apps/nginx with revision #1
    Pod Template:
    Labels: app=nginx
    pod-template-hash=6888c79454
    Containers:
    nginx:
    Image: nginx:1.18
    Port:
    Host Port:
    Environment:
    Mounts:
    Volumes:
    [root@node1 ~]# kubectl rollout history deployment nginx --revision=2
    deployment.apps/nginx with revision #2
    Pod Template:
    Labels: app=nginx
    pod-template-hash=6888c79454
    Containers:
    nginx:
    Image: nginx:1.18
    Port:
    Host Port:
    Environment:
    Mounts:
    Volumes:
    [root@node1 ~]# kubectl rollout history deployment nginx --revision=3
    deployment.apps/nginx with revision #3
    Pod Template:
    Labels: app=nginx
    pod-template-hash=bf95bf86b
    Annotations: kubectl.kubernetes.io/restartedAt: 2023-11-18T17:06:24+08:00
    Containers:
    nginx:
    Image: nginx:1.18
    Port:
    Host Port:
    Environment:
    Mounts:
    Volumes:
    [root@node1 ~]# kubectl rollout history deployment nginx --revision=4
    deployment.apps/nginx with revision #4
    Pod Template:
    Labels: app=nginx
    pod-template-hash=5fc8f974d9
    Annotations: kubectl.kubernetes.io/restartedAt: 2023-11-18T17:10:02+08:00
    Containers:
    nginx:
    Image: nginx:1.18
    Port:
    Host Port:
    Environment:
    Mounts:
    Volumes:

    Upgrade the image to 1.20.1, producing a new revision 5:

    [root@node1 ~]# kubectl apply -f nginx.yaml
    deployment.apps/nginx configured
    [root@node1 ~]# kubectl rollout history deployment nginx --revision=5
    deployment.apps/nginx with revision #5
    Pod Template:
    Labels: app=nginx
    pod-template-hash=6469d4d479
    Annotations: kubectl.kubernetes.io/restartedAt: 2023-11-18T17:10:02+08:00
    Containers:
    nginx:
    Image: nginx:1.20.1
    Port:
    Host Port:
    Environment:
    Mounts:
    Volumes:

    Roll back to revision 2:

    First query the revision history:

    [root@node1 ~]# kubectl get rs -o wide
    NAME DESIRED CURRENT READY AGE CONTAINERS IMAGES SELECTOR
    nginx-5fc8f974d9 0 0 0 nginx nginx:1.18 app=nginx,pod-template-hash=5fc8f974d9
    nginx-6469d4d479 3 3 3 nginx nginx:1.20.1 app=nginx,pod-template-hash=6469d4d479
    nginx-648458674d 0 0 0 nginx nginx:1.20.1 app=nginx,pod-template-hash=648458674d
    nginx-6888c79454 0 0 0 nginx nginx:1.18 app=nginx,pod-template-hash=6888c79454
    nginx-bf95bf86b 0 0 0 nginx nginx:1.18 app=nginx,pod-template-hash=bf95bf86b
    [root@node1 ~]# kubectl rollout history deployment nginx
    deployment.apps/nginx
    REVISION CHANGE-CAUSE
    1
    2
    3
    4
    5
    [root@node1 ~]# kubectl rollout history deployment nginx --revision=2
    deployment.apps/nginx with revision #2
    Pod Template:
    Labels: app=nginx
    pod-template-hash=6888c79454
    Containers:
    nginx:
    Image: nginx:1.18
    Port:
    Host Port:
    Environment:
    Mounts:
    Volumes:

    Roll back:

    [root@node1 ~]# kubectl rollout undo deployment nginx --to-revision=2
    deployment.apps/nginx rolled back

    Verify that the rollback succeeded:

    [root@node1 ~]# kubectl get deployments.apps -owide
    NAME READY UP-TO-DATE AVAILABLE AGE CONTAINERS IMAGES SELECTOR
    nginx 3/3 3 3 177m nginx nginx:1.18 app=nginx

    To sum up: the common way to restart a pod is to delete it and re-create it, but for multi-replica workloads this risks a service interruption. Updates are likewise often done brutally: delete the pods, make the change, and start them again, or scale the replicas down to 0 and then back to the original count.

    If you want updates or restarts that are graceful, with no service interruption, the first choice is still the `kubectl rollout` command.

    Original post: https://blog.csdn.net/alwaysbefine/article/details/134383027