• Kubernetes, Part 3 (installing Calico, kubeadm join, and the problems I ran into)


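    I worked overtime until 10 last night... fighting with this wretched thing.
    Back to the point.
    In the previous post kubeadm init succeeded on the master, but installing the network add-on kept failing. Today I'm giving it another go.
    To start with, I followed this blog post for the installation:
    https://blog.csdn.net/weixin_43645454/article/details/124952184
    because installing from inside China really is full of pitfalls, and the official site just isn't usable.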

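    Following that post, I could not get Calico installed. The reason:
    kubeadm init is configured with serviceSubnet, which is obviously the subnet for Services,
    while calico.yaml sets CALICO_IPV4POOL_CIDR, which is the pod IP pool.
    The post says the two should be the same.
    In practice, though, the Service subnet and the pod subnet are different ranges. The configuration that finally started up for me is: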

    networking:
      dnsDomain: cluster.local
      serviceSubnet: 172.21.0.0/16
      podSubnet: 172.22.0.0/16
    

    Then in calico.yaml, matching podSubnet:

        - name: CALICO_IPV4POOL_CIDR
          value: "172.22.0.0/16"
    

    With that, everything started up successfully.
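To make the "keep the ranges separate" rule concrete, here is a small shell sketch that checks whether two CIDR ranges overlap. The helper names (`ip_to_int`, `cidr_bounds`, `cidrs_overlap`) are invented for this post and are not part of kubeadm or Calico. The sketch handles IPv4 only and assumes a POSIX shell with 64-bit arithmetic.

```shell
#!/bin/sh
# Hypothetical helpers (not part of kubeadm or Calico): IPv4-only CIDR overlap check.

# Convert a dotted-quad address such as 172.21.0.0 to an integer.
ip_to_int() {
  old_ifs=$IFS
  IFS=.
  set -- $1            # split a.b.c.d into $1..$4
  IFS=$old_ifs
  echo $(( ($1 << 24) | ($2 << 16) | ($3 << 8) | $4 ))
}

# Print the first and last address of a CIDR block as two integers.
cidr_bounds() {
  addr=${1%/*}
  prefix=${1#*/}
  base=$(ip_to_int "$addr")
  # 4294967295 is 0xFFFFFFFF, the all-ones 32-bit mask.
  mask=$(( (4294967295 << (32 - $prefix)) & 4294967295 ))
  echo "$(( $base & $mask )) $(( ($base & $mask) | ($mask ^ 4294967295) ))"
}

# Succeeds (exit status 0) when the two blocks share at least one address.
cidrs_overlap() {
  set -- $(cidr_bounds "$1") $(cidr_bounds "$2")   # start1 end1 start2 end2
  [ "$1" -le "$4" ] && [ "$3" -le "$2" ]
}

# Values from the working configuration in this post.
service_subnet=172.21.0.0/16
pod_subnet=172.22.0.0/16
if cidrs_overlap "$service_subnet" "$pod_subnet"; then
  echo "overlap: pick different ranges"
else
  echo "ok: serviceSubnet and podSubnet are disjoint"
fi
```

The same function can be pointed at the node network (192.168.56.0/24 on my VMs) to make sure neither cluster range collides with it.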

    Now for the problems I ran into.
    Error 1:
    After kubectl apply -f calico.yaml, calico-node reports errors,
    or calico-node fails to start after a worker node joins (for example, CrashLoopBackOff).

    929 16:12:48 master kubelet[12272]: E0929 16:12:48.116920   12272 remote_runtime.go:222] "RunPodSandbox from runtime service failed" err="rpc error: code = Unknown desc = failed to setup network for sandbox \"71281bf7c6d991756cac784f7c9943e200a3e69fa49afe3299f98c6a5fd6b366\": plugin type=\"calico\" failed (add): stat /var/lib/calico/nodename: no such file or directory: check that the calico/node container is running and has mounted /var/lib/calico/"
    929 16:12:48 master kubelet[12272]: E0929 16:12:48.117002   12272 kuberuntime_sandbox.go:71] "Failed to create sandbox for pod" err="rpc error: code = Unknown desc = failed to setup network for sandbox \"71281bf7c6d991756cac784f7c9943e200a3e69fa49afe3299f98c6a5fd6b366\": plugin type=\"calico\" failed (add): stat /var/lib/calico/nodename: no such file or directory: check that the calico/node container is running and has mounted /var/lib/calico/" pod="kube-system/calico-kube-controllers-58dbc876ff-bc5dg"
    929 16:45:31 master kubelet[32709]: E0929 16:45:31.311990   32709 pod_workers.go:965] "Error syncing pod, skipping" err="failed to \"CreatePodSandbox\" for \"calico-kube-controllers-58dbc876ff-7lxsj_kube-system(1eec9a3f-6310-492d-b2c5-c6278356c48e)\" with CreatePodSandboxError: \"Failed to create sandbox for pod \\\"calico-kube-controllers-58dbc876ff-7lxsj_kube-system(1eec9a3f-6310-492d-b2c5-c6278356c48e)\\\": rpc error: code = Unknown desc = failed to setup network for sandbox \\\"84dfe491af29e30551e124ac6c73bfcd2ffd089ab900192d745441868083f6dd\\\": plugin type=\\\"calico\\\" failed (add): error adding host side routes for interface: cali11848191ccc, error: route (Ifindex: 9, Dst: 10.0.0.1/32, Scope: link) already exists for an interface other than 'cali11848191ccc': route (Ifindex: 5, Dst: 10.0.0.1/32, Scope: link, Iface: cali13a7d337791)\"" pod="kube-system/calico-kube-controllers-58dbc876ff-7lxsj" podUID=1eec9a3f-6310-492d-b2c5-c6278356c48e
    
    

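    This happens because I had installed and uninstalled Calico many times, and kubectl delete -f calico.yaml sometimes leaves virtual interfaces or routes behind (I'll call them that for now; it exposes how weak my fundamentals are. I bought VBird's Linux book years ago and still haven't read it, so that is going on the schedule).
    Fix: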

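    # After every kubeadm reset, remove the CNI configuration first;
    # the reset output itself tells you to do this.
    rm -rf /etc/cni/net.d/*
    # Then delete the leftover routes and interfaces
    # (the link/ipip and link/ether entries).

    # List interfaces and routes (ip a is short for ip addr)
    ip addr
    ip link
    ip route
    # or
    ifconfig
    # If extra interfaces show up, they look like this on my machine: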
    [root@master ~]# ip addr
    1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
        link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
        inet 127.0.0.1/8 scope host lo
           valid_lft forever preferred_lft forever
        inet6 ::1/128 scope host
           valid_lft forever preferred_lft forever
    2: enp0s3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
        link/ether 08:00:27:2f:98:e0 brd ff:ff:ff:ff:ff:ff
        inet 10.0.2.15/24 brd 10.0.2.255 scope global noprefixroute dynamic enp0s3
           valid_lft 65590sec preferred_lft 65590sec
        inet6 fe80::be6e:ee2a:bcd9:e981/64 scope link noprefixroute
           valid_lft forever preferred_lft forever
    3: enp0s8: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
        link/ether 08:00:27:bc:f7:2b brd ff:ff:ff:ff:ff:ff
        inet 192.168.56.106/24 brd 192.168.56.255 scope global noprefixroute dynamic enp0s8
           valid_lft 552sec preferred_lft 552sec
        inet6 fe80::5753:6a6a:3f3:6f5b/64 scope link noprefixroute
           valid_lft forever preferred_lft forever
    4: docker0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN group default
        link/ether 02:42:cc:58:a5:6e brd ff:ff:ff:ff:ff:ff
        inet 172.17.0.1/16 brd 172.17.255.255 scope global docker0
           valid_lft forever preferred_lft forever
    15: tunl0@NONE: <NOARP,UP,LOWER_UP> mtu 1480 qdisc noqueue state UNKNOWN group default qlen 1000
        link/ipip 0.0.0.0 brd 0.0.0.0
        inet 172.22.219.64/32 scope global tunl0
           valid_lft forever preferred_lft forever
    16: cali9035434f5df@if4: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1480 qdisc noqueue state UP group default
        link/ether ee:ee:ee:ee:ee:ee brd ff:ff:ff:ff:ff:ff link-netnsid 0
        inet6 fe80::ecee:eeff:feee:eeee/64 scope link
           valid_lft forever preferred_lft forever
    17: cali4af7a3781d7@if4: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1480 qdisc noqueue state UP group default
        link/ether ee:ee:ee:ee:ee:ee brd ff:ff:ff:ff:ff:ff link-netnsid 1
        inet6 fe80::ecee:eeff:feee:eeee/64 scope link
           valid_lft forever preferred_lft forever
    18: calib71dfeb1411@if4: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1480 qdisc noqueue state UP group default
        link/ether ee:ee:ee:ee:ee:ee brd ff:ff:ff:ff:ff:ff link-netnsid 2
        inet6 fe80::ecee:eeff:feee:eeee/64 scope link
           valid_lft forever preferred_lft forever
    
    # The three cali* interfaces at the end and tunl0 all have to be removed
    modprobe -r ipip                  # removes tunl0
    ip link delete cali23bcdbdbc8c    # repeat for each cali* link, using the names from your own output
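The manual cleanup above can be scripted. The sketch below is only an illustration: `list_calico_links` is a name invented here, and it assumes the one-line-per-interface format printed by `ip -o link show`. The function only prints interface names; the sample input is taken from the first lines of the `ip addr` entries shown above.

```shell
#!/bin/sh
# Hypothetical helper: read `ip -o link show` output on stdin and print the
# names of leftover Calico interfaces (cali*), without the "@ifN" suffix.
list_calico_links() {
  awk -F': ' '$2 ~ /^cali/ { sub(/@.*/, "", $2); print $2 }'
}

# Dry run on sample lines: only the cali* name is printed, tunl0 is ignored.
list_calico_links <<'EOF'
15: tunl0@NONE: <NOARP,UP,LOWER_UP> mtu 1480 qdisc noqueue state UNKNOWN group default qlen 1000
16: cali9035434f5df@if4: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1480 qdisc noqueue state UP group default
EOF

# Real cleanup, as root after `kubeadm reset`:
#   ip -o link show | list_calico_links | while read -r link; do ip link delete "$link"; done
#   modprobe -r ipip    # removes tunl0, as above
```

Keeping the listing separate from the deletion makes it easy to check what would be removed before anything is actually deleted.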
    
    
    

    Error 2:
    Errors reported by calico-node after the worker joined:

    929 16:55:44 master kubelet[9251]: E0929 16:55:44.151178    9251 pod_workers.go:965] "Error syncing pod, skipping" err="failed to \"CreatePodSandbox\" for \"coredns-c676cc86f-dh5bn_kube-system(70d1a056-dd07-4162-b350-85d6be15276b)\" with CreatePodSandboxError: \"Failed to create sandbox for pod \\\"coredns-c676cc86f-dh5bn_kube-system(70d1a056-dd07-4162-b350-85d6be15276b)\\\": rpc error: code = Unknown desc = failed to setup network for sandbox \\\"b1bced40c96601e0c114392e6388991a6609fcfd81ac2f1c2a359840f272e997\\\": plugin type=\\\"calico\\\" failed (add): error getting ClusterInformation: Get \\\"https://10.0.0.1:443/apis/crd.projectcalico.org/v1/clusterinformations/default\\\": dial tcp 10.0.0.1:443: connect: connection refused\"" pod="kube-system/coredns-c676cc86f-dh5bn" podUID=70d1a056-dd07-4162-b350-85d6be15276b
    2022-09-29 13:22:02.617 [FATAL][1] cni-installer/<nil> <nil>: Unable to create token for CNI kubeconfig error=Post "https://10.244.0.1:443/api/v1/namespaces/kube-system/serviceaccounts/calico-node/token": dial tcp 10.244.0.1:443: i/o timeout
    
    

    Look at the URL:
    https://10.0.6.1:443/api/
    Clearly the requested IP falls inside the CALICO_IPV4POOL_CIDR I had set at the time, but why is the port 443? This is what I defined in init.yaml:

    apiVersion: kubeadm.k8s.io/v1beta3
    bootstrapTokens:
    - groups:
      - system:bootstrappers:kubeadm:default-node-token
      token: abcdef.0123456789abcdef
      ttl: 24h0m0s
      usages:
      - signing
      - authentication
    kind: InitConfiguration
    localAPIEndpoint:
      advertiseAddress: 192.168.56.106
      bindPort: 6443
    nodeRegistration:
      criSocket: unix:///var/run/containerd/containerd.sock
      imagePullPolicy: IfNotPresent
      name: master
      taints: null
    ---
    apiServer:
      timeoutForControlPlane: 4m0s
    apiVersion: kubeadm.k8s.io/v1beta3
    certificatesDir: /etc/kubernetes/pki
    clusterName: kubernetes
    controllerManager: {}
    dns: {}
    etcd:
      local:
        dataDir: /var/lib/etcd
    imageRepository: registry.aliyuncs.com/google_containers
    kind: ClusterConfiguration
    kubernetesVersion: 1.25.0
    networking:
      dnsDomain: cluster.local
      serviceSubnet: 172.21.0.0/16
      podSubnet: 172.22.0.0/16
      
    scheduler: {}
    
    

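    The bind port there is 6443. So, as I said at the start, separating serviceSubnet from podSubnet got everything to start. Also mind the address ranges themselves: my VMs are all on 192.x networks.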
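One way to read the `10.0.0.1:443` and `10.244.0.1:443` addresses in the logs: as far as I understand it, the built-in `kubernetes` Service takes the first usable address of the Service range as its ClusterIP and listens on port 443, forwarding to the API server's bind port (6443 here). The sketch below computes that address for a given serviceSubnet so it can be compared with what the cluster reports. `first_service_ip` is a name invented for this example, and it handles IPv4 only.

```shell
#!/bin/sh
# Hypothetical helper: print the first usable address of an IPv4 CIDR block,
# which is the ClusterIP normally given to the default/kubernetes Service.
first_service_ip() {
  addr=${1%/*}
  prefix=${1#*/}
  old_ifs=$IFS
  IFS=.
  set -- $addr         # split a.b.c.d into $1..$4
  IFS=$old_ifs
  base=$(( ($1 << 24) | ($2 << 16) | ($3 << 8) | $4 ))
  # 4294967295 is 0xFFFFFFFF, the all-ones 32-bit mask.
  mask=$(( (4294967295 << (32 - $prefix)) & 4294967295 ))
  first=$(( ($base & $mask) + 1 ))
  echo "$(( ($first >> 24) & 255 )).$(( ($first >> 16) & 255 )).$(( ($first >> 8) & 255 )).$(( $first & 255 ))"
}

# serviceSubnet from the init.yaml in this post.
first_service_ip 172.21.0.0/16

# Compare with what the cluster actually assigned:
#   kubectl get svc kubernetes -n default -o jsonpath='{.spec.clusterIP}'
```

If calico-node is dialing an address that is not the one this prints for the current serviceSubnet, the node is probably still using configuration from an earlier init.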
    I went through a lot of fixes found on Baidu:

    # Some suggested adding KUBERNETES_SERVICE_HOST and KUBERNETES_SERVICE_PORT to calico.yaml:

    - name: KUBERNETES_SERVICE_HOST
      value: "kube-apiserver"  # address of the master's API server
    - name: KUBERNETES_SERVICE_PORT
      value: "6443"
    - name: KUBERNETES_SERVICE_PORT_HTTPS
      value: "6443"

    # Others suggested adding IP_AUTODETECTION_METHOD:
                - name: IP_AUTODETECTION_METHOD
                  value: "interface=enp.*"

    # The official docs say a ConfigMap can be used to set this, so I tried that too:
    # https://projectcalico.docs.tigera.io/maintenance/ebpf/enabling-ebpf#configure-calico-to-talk-directly-to-the-api-server

    
    kind: ConfigMap
    apiVersion: v1
    metadata:
      name: kubernetes-services-endpoint
      namespace: kube-system
    data:
      KUBERNETES_SERVICE_HOST: "192.168.56.106"
      KUBERNETES_SERVICE_PORT: "6443"
      KUBERNETES_SERVICE_PORT_HTTPS: "6443"
      
    

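    None of them worked in the end. What finally worked was changing the subnet configuration, because the ranges simply were not the same.
    After that, the worker node joined the cluster successfully: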

    [root@master ~]# kubectl get pods --all-namespaces -owide
    NAMESPACE     NAME                                       READY   STATUS    RESTARTS   AGE     IP               NODE     NOMINATED NODE   READINESS GATES
    kube-system   calico-kube-controllers-58dbc876ff-pvpft   1/1     Running   0          4h15m   172.22.219.66    master   <none>           <none>
    kube-system   calico-node-bd4vg                          1/1     Running   0          4h15m   192.168.56.106   master   <none>           <none>
    kube-system   calico-node-p98gc                          1/1     Running   0          4h12m   192.168.56.107   node01   <none>           <none>
    kube-system   coredns-c676cc86f-lq4kx                    1/1     Running   0          4h15m   172.22.219.65    master   <none>           <none>
    kube-system   coredns-c676cc86f-rjkp8                    1/1     Running   0          4h15m   172.22.219.67    master   <none>           <none>
    kube-system   etcd-master                                1/1     Running   9          4h15m   192.168.56.106   master   <none>           <none>
    kube-system   kube-apiserver-master                      1/1     Running   0          4h15m   192.168.56.106   master   <none>           <none>
    kube-system   kube-controller-manager-master             1/1     Running   0          4h15m   192.168.56.106   master   <none>           <none>
    kube-system   kube-proxy-4k9rr                           1/1     Running   0          4h12m   192.168.56.107   node01   <none>           <none>
    kube-system   kube-proxy-mzp7q                           1/1     Running   0          4h15m   192.168.56.106   master   <none>           <none>
    kube-system   kube-scheduler-master                      1/1     Running   9          4h15m   192.168.56.106   master   <none>           <none>
    
    [root@master ~]# k get nodes
    NAME     STATUS   ROLES           AGE     VERSION
    master   Ready    control-plane   4h15m   v1.25.0
    node01   Ready    <none>          4h11m   v1.25.0
    
    

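    One last thing: the "Troubleshooting kubeadm" page in the official k8s docs can solve 99.99% of your problems. The remaining 0.01% comes down to the network environment.
    But that 0.01% is hard, because I don't know much about Linux networking.
    For example, installing ipset and ipvsadm: what even are those?
    Also, the order for searching for a problem should be:
    1. Look in the official docs
    2. Look for related issues on GitHub
    3. If all else fails, Baidu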

    Finally, a summary of the commands for finding error logs. These matter a lot; without them you have nowhere to start.

    # Show the kubeconfig
    kubectl config view
    # Show the current context
    kubectl config get-contexts
    # Switch the namespace of the current context
    kubectl config set-context --current --namespace=<namespace>
    # List all pods
    kubectl get pods --all-namespaces
    # With more detail
    kubectl get pods --all-namespaces -owide
    # Delete a pod or a node
    kubectl delete pod -n kube-system coredns-6f4fd4bdf-8q7zp
    kubectl delete nodes node01

    # Follow the kubelet logs
    journalctl -xefu kubelet
    # Describe a pod and read the events carefully
    kubectl describe pod -n kube-system pod_name
    # Logs of one container in a pod
    kubectl logs -n kube-system calico-node-jx4k5 -c install-cni
    # watch is handy for this
    watch kubectl get pods --all-namespaces
    # Check the service status
    systemctl status kubelet
    # Label a node
    kubectl label no node2 kubernetes.io/role=test-node
    

    Those are basically the three go-to moves: describe, logs, journalctl.
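The first two of these can be chained. As a rough sketch, the helper below picks out pods that are neither Running nor Completed from `kubectl get pods --all-namespaces --no-headers` output and describes each one. `unhealthy_pods` is a name invented here, and it assumes STATUS is the fourth column, as in the pod listing earlier in this post.

```shell
#!/bin/sh
# Hypothetical helper: read `kubectl get pods --all-namespaces --no-headers`
# output on stdin and print "namespace name" for every pod whose STATUS
# column (4th field) is neither Running nor Completed.
unhealthy_pods() {
  awk 'NF >= 4 && $4 != "Running" && $4 != "Completed" { print $1, $2 }'
}

# Describe each problem pod, but only if kubectl is installed.
if command -v kubectl >/dev/null 2>&1; then
  kubectl get pods --all-namespaces --no-headers | unhealthy_pods |
    while read -r ns pod; do
      kubectl describe pod -n "$ns" "$pod"
    done
fi
```

For anything the describe output does not explain, journalctl -xefu kubelet on the affected node is still the next stop.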

    OK, in the next post I'll start deploying something to try it out.

    Hmm, not quite right, there is still a problem:

    Warning  Unhealthy  69s (x2 over 70s)  kubelet            Readiness probe failed: calico/node is not ready: BIRD is not ready: Error querying BIRD: unable to connect to BIRDv4 socket: dial unix /var/run/calico/bird.ctl: connect: connection refused
    

    It's only a warning, and I don't know whether it will cause trouble, but for now everything works and all pods are in the Running state.

  • Original article: https://blog.csdn.net/NOOBBB/article/details/127120224