• The disaster caused by changing the IP of a LoadBalancer svc


    Background

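    The k8s cluster had no external load balancer attached, so when I deployed istio the ingressgateway svc stayed in pending.
    I therefore changed the externalIP of this lb svc by hand, and k8s went down. Here is how it happened.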

    Modifying externalIPs

    I edited the externalIPs field of the svc, and right after that the api-server went down.

    [root@k8s-worker-node1 cloud-native-istio-archive]# k -n istio-system get svc
    NAME                   TYPE           CLUSTER-IP     EXTERNAL-IP   PORT(S)                                                                      AGE
    istio-egressgateway    ClusterIP      10.68.66.210   <none>        80/TCP,443/TCP                                                               8d
    istio-ingressgateway   LoadBalancer   10.68.215.92   <pending>     15021:30422/TCP,80:32418/TCP,443:31569/TCP,31400:32664/TCP,15443:31617/TCP   8d
    istiod                 ClusterIP      10.68.49.71    <none>        15010/TCP,15012/TCP,443/TCP,15014/TCP                                        8d
    [root@k8s-worker-node1 cloud-native-istio-archive]# k -n istio-system edit svc istio-ingressgateway
    service/istio-ingressgateway edited
    [root@k8s-worker-node1 cloud-native-istio-archive]#
    [root@k8s-worker-node1 cloud-native-istio-archive]#
    [root@k8s-worker-node1 cloud-native-istio-archive]# k -n istio-system get svc
    The connection to the server 10.50.10.10:6443 was refused - did you specify the right host or port?
    [root@k8s-worker-node1 cloud-native-istio-archive]#
    [root@k8s-worker-node1 cloud-native-istio-archive]#
    [root@k8s-worker-node1 cloud-native-istio-archive]#
    [root@k8s-worker-node1 cloud-native-istio-archive]# k -n istio-system get svc
    The connection to the server 10.50.10.10:6443 was refused - did you specify the right host or port?
    
    

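    If EXTERNAL-IP has a value (an IP address or a hostname), your environment has an external load balancer that can be used for the Ingress gateway. If the EXTERNAL-IP value is <none> (or stays <pending>), your environment probably does not provide an external load balancer for the Ingress gateway.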
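
The rule above can be turned into a small check. This is a minimal sketch and not part of the original troubleshooting: `external_ip_of` and `classify_external_ip` are hypothetical helper names, and the parsing assumes the default table layout of `kubectl get svc`, where EXTERNAL-IP is the fourth column.

```shell
# external_ip_of NAME
# Reads `kubectl get svc` table output on stdin and prints the EXTERNAL-IP
# column (4th field) of the row whose NAME column matches.
external_ip_of() {
  awk -v name="$1" '$1 == name { print $4 }'
}

# classify_external_ip VALUE
# Interprets an EXTERNAL-IP value according to the rule quoted above.
classify_external_ip() {
  case "$1" in
    "" | "<none>" | "<pending>")
      echo "no external load balancer assigned" ;;
    *)
      echo "external load balancer at $1" ;;
  esac
}

# Usage against a live cluster (not executed here):
#   ip=$(kubectl -n istio-system get svc | external_ip_of istio-ingressgateway)
#   classify_external_ip "$ip"
```

For the output shown earlier, the `istio-ingressgateway` row yields `<pending>`, so the check reports that no external load balancer was assigned. That result describes the environment; it is not a prompt to fill the field in by hand.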

    api-server error log

    [root@k8s-worker-node1 cloud-native-istio-archive]# systemctl status kube-apiserver -l
    ● kube-apiserver.service - Kubernetes API Server
       Loaded: loaded (/etc/systemd/system/kube-apiserver.service; enabled; vendor preset: disabled)
       Active: active (running) since Thu 2023-10-19 17:19:09 CST; 1 weeks 1 days ago
         Docs: https://github.com/GoogleCloudPlatform/kubernetes
     Main PID: 45101 (kube-apiserver)
        Tasks: 10
       Memory: 470.1M
       CGroup: /system.slice/kube-apiserver.service
               └─45101 /opt/kube/bin/kube-apiserver --allow-privileged=true --anonymous-auth=false --api-audiences=api,istio-ca --authorization-mode=Node,RBAC --bind-address=10.50.10.10 --client-ca-file=/etc/kubernetes/ssl/ca.pem --endpoint-reconciler-type=lease --etcd-cafile=/etc/kubernetes/ssl/ca.pem --etcd-certfile=/etc/kubernetes/ssl/kubernetes.pem --etcd-keyfile=/etc/kubernetes/ssl/kubernetes-key.pem --etcd-servers=https://10.50.10.10:2379 --kubelet-certificate-authority=/etc/kubernetes/ssl/ca.pem --kubelet-client-certificate=/etc/kubernetes/ssl/kubernetes.pem --kubelet-client-key=/etc/kubernetes/ssl/kubernetes-key.pem --secure-port=6443 --service-account-issuer=https://kubernetes.default.svc --service-account-signing-key-file=/etc/kubernetes/ssl/ca-key.pem --service-account-key-file=/etc/kubernetes/ssl/ca.pem --service-cluster-ip-range=10.68.0.0/16 --service-node-port-range=30000-32767 --tls-cert-file=/etc/kubernetes/ssl/kubernetes.pem --tls-private-key-file=/etc/kubernetes/ssl/kubernetes-key.pem --requestheader-client-ca-file=/etc/kubernetes/ssl/ca.pem --requestheader-allowed-names= --requestheader-extra-headers-prefix=X-Remote-Extra- --requestheader-group-headers=X-Remote-Group --requestheader-username-headers=X-Remote-User --proxy-client-cert-file=/etc/kubernetes/ssl/aggregator-proxy.pem --proxy-client-key-file=/etc/kubernetes/ssl/aggregator-proxy-key.pem --enable-aggregator-routing=true --v=2
    
    Oct 27 23:41:20 k8s-worker-node1 kube-apiserver[45101]: "Metadata": null
    Oct 27 23:41:20 k8s-worker-node1 kube-apiserver[45101]: }. Err: connection error: desc = "transport: Error while dialing dial tcp 10.50.10.10:2379: connect: connection refused"
    Oct 27 23:41:25 k8s-worker-node1 kube-apiserver[45101]: W1027 23:41:25.168319   45101 logging.go:59] [core] [Channel #57333 SubChannel #57334] grpc: addrConn.createTransport failed to connect to {
    Oct 27 23:41:25 k8s-worker-node1 kube-apiserver[45101]: "Addr": "10.50.10.10:2379",
    Oct 27 23:41:25 k8s-worker-node1 kube-apiserver[45101]: "ServerName": "10.50.10.10",
    Oct 27 23:41:25 k8s-worker-node1 kube-apiserver[45101]: "Attributes": null,
    Oct 27 23:41:25 k8s-worker-node1 kube-apiserver[45101]: "BalancerAttributes": null,
    Oct 27 23:41:25 k8s-worker-node1 kube-apiserver[45101]: "Type": 0,
    Oct 27 23:41:25 k8s-worker-node1 kube-apiserver[45101]: "Metadata": null
    Oct 27 23:41:25 k8s-worker-node1 kube-apiserver[45101]: }. Err: connection error: desc = "transport: Error while dialing dial tcp 10.50.10.10:2379: connect: connection refused"
    
    

    Rescue attempt

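    I restarted the api-server, but it would not come up: etcd refused the connection.
    The cluster could not be recovered, and even GPT-4 could not help.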
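
To see at a glance which backend the api-server cannot reach, the refused endpoints can be pulled out of the log. This is a minimal sketch: `refused_endpoints` is a hypothetical helper name, and it only matches the `dial tcp <ip>:<port>: connect: connection refused` pattern seen in the log above.

```shell
# refused_endpoints
# Reads log lines on stdin and prints each distinct IPv4 "ip:port" that appears
# in a "dial tcp ...: connect: connection refused" error.
refused_endpoints() {
  grep -oE 'dial tcp [0-9.]+:[0-9]+: connect: connection refused' \
    | awk '{ sub(/:$/, "", $3); print $3 }' \
    | sort -u
}

# Usage on the node (not executed here):
#   journalctl -u kube-apiserver --no-pager | refused_endpoints
# Assumption: etcd also runs as a systemd unit named "etcd" on this node.
# If so, its state and its own log are the next things to look at:
#   systemctl status etcd -l
#   journalctl -u etcd --no-pager | tail -n 50
```

On the log above this prints `10.50.10.10:2379`, the etcd client port. It only narrows down where to look; it does not explain why etcd stopped accepting connections.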
    Aside: a moment of remembrance for His Excellency the Grand Secretary (中堂大人).

    Lessons learned

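    Do not casually change the externalIP of an LB svc. I made the change by following this blogger's article: https://www.cnblogs.com/boshen-hzb/p/10679863.html. Take note, everyone, and do not bring your cluster down. Any change to a production environment should be made with care, and you must know beforehand what its consequences will be.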
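
One cheap habit that fits this lesson is saving the current manifest before editing a svc. This is a minimal sketch and not something done in this incident: `backup_svc` is a hypothetical helper name and the file naming scheme is an assumption.

```shell
# backup_svc NAMESPACE NAME [DIR]
# Saves the current Service manifest to a timestamped YAML file and prints
# the path of that file. DIR defaults to the current directory.
backup_svc() {
  local ns="$1" name="$2" dir="${3:-.}"
  local out="${dir}/${name}.${ns}.$(date +%Y%m%d%H%M%S).yaml"
  kubectl -n "$ns" get svc "$name" -o yaml > "$out" || return 1
  echo "$out"
}

# Usage (not executed here):
#   backup_svc istio-system istio-ingressgateway /root
#   kubectl -n istio-system edit svc istio-ingressgateway
```

A backup like this would not have prevented the outage described here, but it keeps a record of exactly what the svc looked like before the edit, which helps when reconstructing what was changed.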

  • Original article: https://blog.csdn.net/MyySophia/article/details/134085838