A VIP bind is configured in /etc/haproxy/haproxy.cfg, and starting the haproxy service fails with a bind error. By default the kernel refuses to bind an address that is not assigned to a local interface. Edit /etc/sysctl.conf and add the following:
net.ipv4.ip_nonlocal_bind=1
Then apply the setting:
sysctl -p
Then start haproxy again:
systemctl restart haproxy
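A quick check that the setting took effect, assuming 192.168.10.100 is a placeholder VIP:
sysctl net.ipv4.ip_nonlocal_bind          # should print: net.ipv4.ip_nonlocal_bind = 1
grep bind /etc/haproxy/haproxy.cfg        # confirm the VIP bind line, e.g. bind 192.168.10.100:6443
systemctl status haproxy --no-pager       # confirm the service stayed up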
The value of the ETCD_DATA_DIR variable is the data storage directory. After changing it, restart etcd:
systemctl restart etcd
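A minimal sketch for locating and inspecting that directory, assuming etcd reads its environment from /etc/etcd/etcd.conf (the path may differ on your installation):
grep ETCD_DATA_DIR /etc/etcd/etcd.conf    # e.g. ETCD_DATA_DIR="/var/lib/etcd"
du -sh /var/lib/etcd                      # size of the data directory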
cd /opt
kubectl get namespace NAMESPACE -o json > NAMESPACE.json
Edit NAMESPACE.json, remove the values under spec and status, then run the following commands:
kubectl proxy --port=9988 &
curl -k -H "Content-Type: application/json" -X PUT --data-binary @NAMESPACE.json 127.0.0.1:9988/api/v1/namespaces/${NAMESPACE}/finalize
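The manual edit can also be scripted; a sketch assuming jq is installed and NAMESPACE is the stuck namespace:
NAMESPACE=my-stuck-namespace   # hypothetical example value
kubectl get namespace "$NAMESPACE" -o json \
  | jq '.spec.finalizers = [] | del(.status)' > "$NAMESPACE.json"
kubectl proxy --port=9988 &
curl -k -H "Content-Type: application/json" -X PUT \
  --data-binary @"$NAMESPACE.json" \
  "127.0.0.1:9988/api/v1/namespaces/${NAMESPACE}/finalize"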
istio-system jaeger-5994d55ffc-nmhq6 0/1 Terminating 0 13h
istio-system jaeger-5994d55ffc-pjj5m 0/1 Terminating 0 11h
istio-system kiali-64df7bf7cc-29kxl 0/1 Terminating 0 12h
istio-system kiali-64df7bf7cc-2bk77 0/1 Terminating 0 11h
istio-system kiali-64df7bf7cc-4wwhg 0/1 Terminating 0 14h
istio-system kiali-64df7bf7cc-8cfsh 0/1 Terminating 0 13h
istio-system kiali-64df7bf7cc-dks5w 0/1 Terminating 0 15h
istio-system kiali-64df7bf7cc-dkzgc 0/1 Terminating 0 15h
kubectl get pods -n NAMESPACE | grep Terminating | awk '{print $1}' | xargs kubectl delete pod -n NAMESPACE --force --grace-period=0
If a large number of Pods end up in this state, put the command in a script and run it periodically, as sketched below.
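A sketch of such a script, assuming it is saved as /usr/local/bin/clean-terminating.sh (a hypothetical path) and scheduled from cron:
#!/bin/bash
# Force-delete every pod stuck in Terminating, across all namespaces.
kubectl get pods -A | grep Terminating | awk '{print $2 " -n " $1}' \
  | xargs -r -L1 kubectl delete pod --force --grace-period=0
# Example crontab entry, running every 10 minutes:
# */10 * * * * /usr/local/bin/clean-terminating.sh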
kubectl logs -f PodName
Viewing logs fails with the following error:
Error from server (Forbidden): Forbidden (user=kubernetes, verb=get, resource=nodes, subresource=proxy)
The apiserver user lacks permission to call the kubelet API; grant it with:
kubectl create clusterrolebinding kube-apiserver:kubelet-apis --clusterrole=system:kubelet-api-admin --user kubernetes
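To confirm the binding works, kubectl auth can-i can check the exact verb/resource/subresource from the error message:
kubectl auth can-i get nodes --subresource=proxy --as kubernetes   # should now print "yes"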
Pod creation fails because the pause sandbox image cannot be pulled:
Failed to create pod sandbox: rpc error: code = Unknown desc = failed pulling image "k8s.gcr.io/pause:3.6":
Error response from daemon: Get https://k8s.gcr.io/v2/:
net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
Pull the image from a reachable mirror and retag it with the name the error message expects:
docker pull registry.cn-hangzhou.aliyuncs.com/google_containers/pause:3.6
docker tag registry.cn-hangzhou.aliyuncs.com/google_containers/pause:3.6 k8s.gcr.io/pause:3.6
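On nodes running containerd without the docker engine, an equivalent sketch pulls the image into containerd's k8s.io namespace instead:
ctr -n k8s.io images pull registry.cn-hangzhou.aliyuncs.com/google_containers/pause:3.6
ctr -n k8s.io images tag registry.cn-hangzhou.aliyuncs.com/google_containers/pause:3.6 k8s.gcr.io/pause:3.6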
Problem description
A large number of Pods are in the Evicted state.
Solution
Delete all Pods whose state is Evicted. With -A the namespace is column 1 and the pod name column 2, so both must be passed to kubectl delete:
kubectl get pods -A | grep Evicted | awk '{print $2 " -n " $1}' | xargs -L1 kubectl delete pod
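An alternative sketch using a field selector; Evicted pods report status.phase=Failed, so this also removes any other failed pods:
kubectl delete pods -A --field-selector=status.phase=Failed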
"Error syncing pod, skipping" err="network is not ready: container runtime network not ready
journalctl -u kubelet --since today |less
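This error usually means the CNI plugin is not yet installed or configured on the node; a quick sketch to check the conventional locations the kubelet reads:
ls /etc/cni/net.d/   # should contain a CNI config, e.g. 10-calico.conflist
ls /opt/cni/bin/     # should contain the CNI plugin binaries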
0/1 nodes are available: 1 node(s) had untolerated taint {node.kubernetes.io/not-ready: }
The k8s node is unschedulable. Check the node status with kubectl:
kubectl get nodes -o wide
The output looks like this:
NAME STATUS ROLES AGE VERSION
k8s-master1 NotReady,SchedulingDisabled 43h v1.24.2
k8s-master2 Ready 4d6h v1.24.2
k8s-node1 NotReady,SchedulingDisabled 44h v1.24.2
# Disable scheduling
kubectl cordon NODE_NAME
# Re-enable scheduling
kubectl uncordon NODE_NAME
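When the node must also be emptied of running pods before maintenance, drain is the usual companion to cordon; a sketch:
kubectl drain NODE_NAME --ignore-daemonsets --delete-emptydir-data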
cni-installer/ : Unable to create token for CNI kubeconfig error=Post
"https://10.255.0.1:443/api/v1/namespaces/kube-system/serviceaccounts/calico-node/token":
dial tcp 10.255.0.1:443: i/o timeout
Check whether the --service-cluster-ip-range and --cluster-cidr values overlap; an overlap effectively reduces the cluster to a single-machine environment. calico-node fails to start, and the following appears in the event messages:
Back-off restarting failed container
invalid capacity 0 on image filesystem
Node k8s-node2 status is now: NodeHasNoDiskPressure
Updated Node Allocatable limit across pods
Node k8s-node2 status is now: NodeHasSufficientPID
#: kubectl logs -n kube-system calico-node-wzq2p -c install-cni
#: kubectl describe pod calico-node-wzq2p -n kube-system
#: journalctl -u kubelet -f
A "disk space" message does not necessarily mean calico-node failed to start because of insufficient disk space; you need to look at the actual logs. Use kubectl logs -n kube-system calico-node-wzq2p -c install-cni to see the real error before analyzing the problem. The author long assumed disk space was the cause of the calico-node failure, and only after reading the detailed logs discovered that kube-proxy's --cluster-cidr parameter conflicted with the --service-cluster-ip-range set on kube-controller-manager and kube-apiserver.
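A sketch for comparing the three settings side by side, assuming a kubeadm-style setup with static-pod manifests (paths may differ):
grep -- --service-cluster-ip-range /etc/kubernetes/manifests/kube-apiserver.yaml
grep -- --cluster-cidr /etc/kubernetes/manifests/kube-controller-manager.yaml
kubectl -n kube-system get cm kube-proxy -o yaml | grep -i clustercidr
The service CIDR and pod CIDR must not overlap, and kube-proxy's clusterCIDR must match the controller manager's --cluster-cidr.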
The kubelet's clusterDNS parameter depends on the CoreDNS service IP address: if the clusterDNS value set when the kubelet starts differs from the DNS service IP used by the CoreDNS deployment, service access will time out.
cd /opt
git clone https://github.com/coredns/deployment
cd /opt/deployment/kubernetes
./deploy.sh -r 10.255.0.0/16 -i 10.255.0.2 > coredns.yaml
kubectl apply -f coredns.yaml
The -i option in the command above is what sets the DNS service IP address.
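To verify the two sides agree, a sketch (the kubelet config path assumes kubeadm defaults; the deploy.sh manifest keeps the service name kube-dns):
grep -A1 clusterDNS /var/lib/kubelet/config.yaml                          # kubelet-side setting
kubectl get svc -n kube-system kube-dns -o jsonpath='{.spec.clusterIP}'   # should print 10.255.0.2 for the deployment above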
unable to proxy Istiod pods.
Make sure your Kubernetes API server has access to the Istio control plane through 8080 port
Here the nodes were missing socat, which kubelet port-forwarding depends on. Install it on each node:
yum install socat -y
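Afterwards the proxy-based istioctl commands should succeed again, for example:
istioctl version   # should now report both the client and the control-plane version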
To expose the ingress gateway through a NodePort service:
kubectl patch svc -n istio-ingress istio-ingress -p '{"spec": {"type": "NodePort"}}'
To restrict outbound mesh traffic to services known to the registry, set the outbound traffic policy to REGISTRY_ONLY:
helm upgrade --set meshConfig.outboundTrafficPolicy.mode=REGISTRY_ONLY istiod istio/istiod -n istio-system
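To confirm the setting was applied, a sketch; istiod stores meshConfig in the istio configmap in istio-system:
kubectl -n istio-system get cm istio -o yaml | grep -A1 outboundTrafficPolicy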