• K8s networking issue that stopped services in a namespace from reaching each other


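    Background

    I recently redeployed a K8s environment on virtual machines on my local computer, using Kuboard-Spray.

    Guide used: Installing kubernetes_v1.23.1 with KuboardSpray | Kuboard

    The installation succeeded and nothing looked wrong. Every pod showed Running, so I assumed the job was done and started deploying applications.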

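    Finding the problem

    I first hit the problem while deploying a RuoYi system. The backend services were all Running, but the frontend service went from Running to Error after about 20 seconds.

    The logs gave a clue: nginx could not find the upstream. The name it complained about is really a host, the gateway Service's DNS name, which nginx was failing to resolve.

    The relevant part of the nginx configuration: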

    location ^~ /prod-api/ {
        proxy_set_header Host $http_host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header REMOTE-HOST $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_pass http://ruoyi-gateway.ruoyi-k8s:8080/;
    }
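
    nginx resolves a literal hostname in proxy_pass when it loads its configuration and exits if the lookup fails, which fits the Running-then-Error pattern described above. The minimal Python sketch below reproduces only that lookup so it can be tried from inside a pod. The helper name can_resolve is hypothetical. Inside a pod, the short name ruoyi-gateway.ruoyi-k8s is expanded through the search domains in /etc/resolv.conf, and the fully qualified form assumes the default cluster domain cluster.local.

```python
import socket


def can_resolve(host: str, port: int) -> bool:
    """Return True if host resolves to at least one address.

    This is the same kind of lookup nginx performs for a literal
    hostname in proxy_pass when it loads its configuration.
    """
    try:
        return len(socket.getaddrinfo(host, port, proto=socket.IPPROTO_TCP)) > 0
    except socket.gaierror:
        return False


if __name__ == "__main__":
    # Names taken from the nginx config; the FQDN assumes the default
    # cluster domain "cluster.local".
    for name in ("ruoyi-gateway.ruoyi-k8s",
                 "ruoyi-gateway.ruoyi-k8s.svc.cluster.local"):
        status = "resolves" if can_resolve(name, 8080) else "does NOT resolve"
        print(f"{name}: {status}")
```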

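    Pinging ruoyi-gateway.ruoyi-k8s failed too.

    At that point I suspected the cluster installation. The same set of services had worked when deployed with K8s on Huawei Cloud servers, and only the local deployment had this problem. I tweaked the configuration over several rounds, with no result.

    Next I tried reaching the gateway service from the system service, again with ping, and it still failed.

    So I concluded it was a cluster network problem, but I had no idea what the actual cause was.

    Just as I was about to reinstall the cluster yet again, I asked my colleague Jie.

    Following Jie's approach, I checked the network plugin. It was Calico, and its pods were all Running, so I took it to be healthy. Then Jie circled the Ready count for me and it clicked at once: Running does not mean the service is working, because the Ready count was 0.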
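
    The lesson generalizes. STATUS only says the containers are running, while READY says whether they pass their readiness probes. The minimal sketch below scans the default `kubectl get pods` table for pods that are Running but not fully ready. The function name is hypothetical, and the sample table is illustrative: calico-node-f5qzf comes from this post, but the other pod name is made up. The sketch looks up columns by header name, so the NAMESPACE column printed by `-A` does not break it.

```python
def running_but_not_ready(kubectl_output: str) -> list[str]:
    """Return names of pods whose STATUS is Running but READY is x/y with x < y."""
    lines = kubectl_output.strip().splitlines()
    header = lines[0].split()
    name_i, ready_i, status_i = (header.index(c) for c in ("NAME", "READY", "STATUS"))
    flagged = []
    for line in lines[1:]:
        cols = line.split()
        ready, total = (int(n) for n in cols[ready_i].split("/"))
        if cols[status_i] == "Running" and ready < total:
            flagged.append(cols[name_i])
    return flagged


# Illustrative sample only, not output captured from this cluster.
SAMPLE = """\
NAME                                       READY   STATUS    RESTARTS   AGE
calico-kube-controllers-6dd874f784-abcde   1/1     Running   0          10m
calico-node-f5qzf                          0/1     Running   0          2m
"""

if __name__ == "__main__":
    print(running_but_not_ready(SAMPLE))  # ['calico-node-f5qzf']
```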

    The conversation was a real eye-opener. Following the thread, I finally found that the plugin's network was not working:

    kubectl describe pods calico-node-f5qzf -n kube-system

    ...
    Events:
      Type     Reason     Age                From               Message
      ----     ------     ----               ----               -------
      Normal   Scheduled  88s                default-scheduler  Successfully assigned kube-system/calico-node-f5qzf to node1
      Normal   Pulled     89s                kubelet            Container image "quay.io/calico/cni:v3.21.5" already present on machine
      Normal   Created    89s                kubelet            Created container upgrade-ipam
      Normal   Started    88s                kubelet            Started container upgrade-ipam
      Normal   Pulled     88s                kubelet            Container image "quay.io/calico/cni:v3.21.5" already present on machine
      Normal   Created    88s                kubelet            Created container install-cni
      Normal   Started    87s                kubelet            Started container install-cni
      Normal   Pulled     86s                kubelet            Container image "quay.io/calico/pod2daemon-flexvol:v3.21.5" already present on machine
      Normal   Created    86s                kubelet            Created container flexvol-driver
      Normal   Started    85s                kubelet            Started container flexvol-driver
      Normal   Pulled     85s                kubelet            Container image "quay.io/calico/node:v3.21.5" already present on machine
      Normal   Created    85s                kubelet            Created container calico-node
      Normal   Started    84s                kubelet            Started container calico-node
      Warning  Unhealthy  78s (x4 over 83s)  kubelet            Readiness probe failed: calico/node is not ready: BIRD is not ready: Error querying BIRD: unable to connect to BIRDv4 socket: dial unix /var/run/calico/bird.ctl: connect: connection refused
      Warning  Unhealthy  69s                kubelet            Readiness probe failed: 2022-07-01 03:42:37.964 [INFO][220] confd/health.go 180: Number of node(s) with BGP peering established = 0
      calico/node is not ready: BIRD is not ready: BGP not established with 192.168.0.211
      Warning  Unhealthy  59s                kubelet            Readiness probe failed: 2022-07-01 03:42:47.960 [INFO][255] confd/health.go 180: Number of node(s) with BGP peering established = 0
      calico/node is not ready: BIRD is not ready: BGP not established with 192.168.0.211
      Warning  Unhealthy  49s                kubelet            Readiness probe failed: 2022-07-01 03:42:57.893 [INFO][282] confd/health.go 180: Number of node(s) with BGP peering established = 0
      calico/node is not ready: BIRD is not ready: BGP not established with 192.168.0.211
      Warning  Unhealthy  39s                kubelet            Readiness probe failed: 2022-07-01 03:43:07.909 [INFO][311] confd/health.go 180: Number of node(s) with BGP peering established = 0
      calico/node is not ready: BIRD is not ready: BGP not established with 192.168.0.211
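
    The probe messages name the failing peer directly. The minimal sketch below pulls every address that follows "BGP not established with" out of the describe output, so the affected peers stand out when several nodes are involved. It only pattern-matches the message text shown above and is not a Calico API. Treating several peers as comma-separated is an assumption.

```python
import re

# One IPv4 address, optionally followed by more comma-separated ones
# (assumption: that is how several peers would be listed).
IPV4 = r"\d{1,3}(?:\.\d{1,3}){3}"
PEER_RE = re.compile(rf"BGP not established with ({IPV4}(?:,{IPV4})*)")


def unestablished_peers(describe_output: str) -> list[str]:
    """Return the distinct peer IPs named in BGP readiness failures, first-seen order."""
    peers: list[str] = []
    for match in PEER_RE.findall(describe_output):
        for ip in match.split(","):
            if ip not in peers:
                peers.append(ip)
    return peers


if __name__ == "__main__":
    sample = (
        "calico/node is not ready: BIRD is not ready: BGP not established with 192.168.0.211\n"
        "calico/node is not ready: BIRD is not ready: BGP not established with 192.168.0.211\n"
    )
    print(unestablished_peers(sample))  # ['192.168.0.211']
```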

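    Fixing the network error

    calico/node is not ready: BIRD is not ready: BGP not established with 192.168.0.211

    With a concrete error message things got much easier. After a round of searching on Baidu, I finally found the cause.

    The cause: when a K8s cluster is installed with Kuboard-Spray, Calico reads the eth0 network interface by default, but on systems installed in VM virtual machines the interface is usually named ens33, so the interface setting was wrong.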
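
    To check which interface names a node actually has, `ip addr` on the node is enough. The minimal Python sketch below does the same with the standard library's socket.if_nameindex(), which is available on Unix, and flags the situation described here, where no eth0 exists. The helper names are hypothetical.

```python
import socket


def interface_names() -> list[str]:
    """Names of the network interfaces on this host, e.g. ['lo', 'ens33']."""
    return [name for _, name in socket.if_nameindex()]


def eth0_missing(names: list[str]) -> bool:
    """True when the host has no interface named eth0."""
    return "eth0" not in names


if __name__ == "__main__":
    names = interface_names()
    print("interfaces:", names)
    if eth0_missing(names):
        print("no eth0 on this host; IP_AUTODETECTION_METHOD should name the real NIC")
```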

    Solution

    Open the YAML file that configures Calico.

    The original excerpt:

    - name: CALICO_NETWORKING_BACKEND
      valueFrom:
        configMapKeyRef:
          key: calico_backend
          name: calico-config
    - name: IP_AUTODETECTION_METHOD
      value: skip-interface=eth0

    Change the value of IP_AUTODETECTION_METHOD to interface=ens33. Note that it has to be changed in several places in the YAML (about 3).
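
    Because the setting appears in more than one place, a manual edit can easily miss an occurrence. The minimal sketch below rewrites every IP_AUTODETECTION_METHOD value in the manifest text and reports how many occurrences it changed. It assumes the `value:` line directly follows the `- name:` line, as in the excerpt above. set_autodetection is a hypothetical helper name.

```python
import re

# "- name: IP_AUTODETECTION_METHOD" followed on the next line by "value: ...".
# Assumes the layout of the excerpt above; other layouts are left untouched.
PATTERN = re.compile(r"(- name: IP_AUTODETECTION_METHOD\s*\n\s*value:\s*)(\S+)")


def set_autodetection(manifest: str, method: str) -> tuple[str, int]:
    """Replace every IP_AUTODETECTION_METHOD value; return (new_text, count)."""
    # A function replacement avoids backslash escapes in `method` being interpreted.
    return PATTERN.subn(lambda m: m.group(1) + method, manifest)


if __name__ == "__main__":
    excerpt = """\
- name: CALICO_NETWORKING_BACKEND
  valueFrom:
    configMapKeyRef:
      key: calico_backend
      name: calico-config
- name: IP_AUTODETECTION_METHOD
  value: skip-interface=eth0
"""
    new_text, count = set_autodetection(excerpt, "interface=ens33")
    print(count)  # 1
    print(new_text)
```

    Printing the count makes it easy to confirm that all of the occurrences were changed before the file is re-applied.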

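    After the change, the service restarts automatically.

    Check Calico again:

    READY has become 1/1, and the status is Running.

    After restarting the RuoYi web service again, it also stays Running, and its logs show no errors.

    Next, check the network between the system service and the gateway service:

    kubectl exec -it ruoyi-system-7b6488bdd4-4kz5m -n ruoyi-k8s -- /bin/bash

    The road ahead is clear (* ̄︶ ̄)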
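
    ping is only a coarse test here. Depending on the kube-proxy mode, a Service ClusterIP may not answer ICMP at all even when the Service works. A TCP connection to the Service port is closer to what nginx actually needs. The minimal sketch below assumes Python is available in the pod. tcp_reachable is a hypothetical helper, and ruoyi-gateway.ruoyi-k8s:8080 is taken from the nginx config above.

```python
import socket


def tcp_reachable(host: str, port: int, timeout: float = 3.0) -> bool:
    """True if a TCP connection to host:port succeeds within timeout seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # covers DNS failure (gaierror), refusal and timeout
        return False


if __name__ == "__main__":
    host, port = "ruoyi-gateway.ruoyi-k8s", 8080
    print(f"{host}:{port} reachable:", tcp_reachable(host, port))
```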

    Reference

    https://www.codenong.com/cs109711759/

  • Original article: https://blog.csdn.net/liurui_wuhan/article/details/125556258