
Skipping the earlier steps, we go straight to the kubeadm init command used to initialize the master node:
kubeadm init \
--apiserver-advertise-address=10.0.16.15 \
--image-repository registry.aliyuncs.com/google_containers \
--kubernetes-version=v1.24.3 \
--service-cidr=10.96.0.0/12 \
--pod-network-cidr=10.244.0.0/16 \
--ignore-preflight-errors=all \
--cri-socket unix:///var/run/cri-dockerd.sock
An excerpt of the error output:
[init] Using Kubernetes version: v1.24.3
[preflight] Running pre-flight checks
[WARNING FileAvailable--etc-kubernetes-manifests-kube-apiserver.yaml]: /etc/kubernetes/manifests/kube-apiserver.yaml already exists
[WARNING FileAvailable--etc-kubernetes-manifests-kube-controller-manager.yaml]: /etc/kubernetes/manifests/kube-controller-manager.yaml already exists
[WARNING FileAvailable--etc-kubernetes-manifests-kube-scheduler.yaml]: /etc/kubernetes/manifests/kube-scheduler.yaml already exists
[WARNING FileAvailable--etc-kubernetes-manifests-etcd.yaml]: /etc/kubernetes/manifests/etcd.yaml already exists
[WARNING CRI]: container runtime is not running: output: time="2022-10-21T17:23:28+08:00" level=fatal msg="unable to determine runtime API version: rpc error: code = Unavailable desc = connection error: desc = \"transport: Error while dialing dial unix /var/run/cri-dockerd.sock: connect: no such file or directory\""
, error: exit status 1
[WARNING FileContent--proc-sys-net-ipv4-ip_forward]: /proc/sys/net/ipv4/ip_forward contents are not set to 1
[WARNING Hostname]: hostname "k8s-master" could not be reached
[WARNING Hostname]: hostname "k8s-master": lookup k8s-master on 183.60.83.19:53: no such host
[preflight] Pulling images required for setting up a Kubernetes cluster
[preflight] This might take a minute or two, depending on the speed of your internet connection
[preflight] You can also perform this action in beforehand using 'kubeadm config images pull'
[WARNING ImagePull]: failed to pull image registry.aliyuncs.com/google_containers/kube-apiserver:v1.24.3: output: time="2022-10-21T17:23:28+08:00" level=fatal msg="unable to determine image API version: rpc error: code = Unavailable desc = connection error: desc = \"transport: Error while dialing dial unix /var/run/cri-dockerd.sock: connect: no such file or directory\""
, error: exit status 1
[WARNING ImagePull]: failed to pull image registry.aliyuncs.com/google_containers/kube-controller-manager:v1.24.3: output: time="2022-10-21T17:23:29+08:00" level=fatal msg="unable to determine image API version: rpc error: code = Unavailable desc = connection error: desc = \"transport: Error while dialing dial unix /var/run/cri-dockerd.sock: connect: no such file or directory\""
, error: exit status 1
[WARNING ImagePull]: failed to pull image registry.aliyuncs.com/google_containers/kube-scheduler:v1.24.3: output: time="2022-10-21T17:23:29+08:00" level=fatal msg="unable to determine image API version: rpc error: code = Unavailable desc = connection error: desc = \"transport: Error while dialing dial unix /var/run/cri-dockerd.sock: connect: no such file or directory\""
, error: exit status 1
[WARNING ImagePull]: failed to pull image registry.aliyuncs.com/google_containers/kube-proxy:v1.24.3: output: time="2022-10-21T17:23:29+08:00" level=fatal msg="unable to determine image API version: rpc error: code = Unavailable desc = connection error: desc = \"transport: Error while dialing dial unix /var/run/cri-dockerd.sock: connect: no such file or directory\""
, error: exit status 1
[WARNING ImagePull]: failed to pull image registry.aliyuncs.com/google_containers/pause:3.7: output: time="2022-10-21T17:23:29+08:00" level=fatal msg="unable to determine image API version: rpc error: code = Unavailable desc = connection error: desc = \"transport: Error while dialing dial unix /var/run/cri-dockerd.sock: connect: no such file or directory\""
, error: exit status 1
[WARNING ImagePull]: failed to pull image registry.aliyuncs.com/google_containers/etcd:3.5.3-0: output: time="2022-10-21T17:23:29+08:00" level=fatal msg="unable to determine image API version: rpc error: code = Unavailable desc = connection error: desc = \"transport: Error while dialing dial unix /var/run/cri-dockerd.sock: connect: no such file or directory\""
, error: exit status 1
[WARNING ImagePull]: failed to pull image registry.aliyuncs.com/google_containers/coredns:v1.8.6: output: time="2022-10-21T17:23:29+08:00" level=fatal msg="unable to determine image API version: rpc error: code = Unavailable desc = connection error: desc = \"transport: Error while dialing dial unix /var/run/cri-dockerd.sock: connect: no such file or directory\""
, error: exit status 1
During installation, Kubernetes needs to pull the required images from the k8s.gcr.io registry, which is blocked by the firewall inside mainland China. The Aliyun mirror used above (taken from commands found online) didn't seem to work either, possibly because it has no matching versions.
The plan here is to find substitute images on Docker Hub and then use docker tag to rewrite the image name and tag.
First, list the images that need to be downloaded:
kubeadm config images list

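On this machine the list looked roughly like the following (reconstructed here from the tags referenced later in this post, so treat it as illustrative rather than exact):
k8s.gcr.io/kube-apiserver:v1.24.7
k8s.gcr.io/kube-controller-manager:v1.24.7
k8s.gcr.io/kube-scheduler:v1.24.7
k8s.gcr.io/kube-proxy:v1.24.7
k8s.gcr.io/pause:3.7
k8s.gcr.io/etcd:3.5.3-0
k8s.gcr.io/coredns/coredns:v1.8.6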
Take the first one, k8s.gcr.io/kube-apiserver:v1.24.7, as an example: k8s.gcr.io is the registry/repository prefix and is part of the image name, and kubeadm requires an image whose name matches exactly.
Use docker search to look at the kube-apiserver images that are available:
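docker search kube-apiserver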

Many well-known mirrors should work; here we'll take kubesphere's as an example.
On Docker Hub, check all the tags for that image; the latest is v1.24.3, which corresponds to the image version we need.

Using this approach, find a suitable version for every image, then pull and re-tag each one:
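As a sketch of that workflow, using the kubesphere mirror found above (the source tag v1.24.3 comes from Docker Hub; the target name and tag must match the kubeadm config images list output exactly):
docker pull kubesphere/kube-apiserver:v1.24.3
# re-tag under the exact name kubeadm expects
docker tag kubesphere/kube-apiserver:v1.24.3 k8s.gcr.io/kube-apiserver:v1.24.7
# repeat for kube-controller-manager, kube-scheduler and kube-proxy,
# and find matching mirrors for pause, etcd and coredns as well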

Note that some tags have a leading v and some don't. For example, kubesphere's etcd tag is v3.4.13, while the one we need is 3.5.3-0; getting this wrong can produce a manifest unknown error. With the images prepared, run the kubeadm init command again. At this point I got the following error:

The fix is to simply delete the existing files, which are presumably leftovers from a previous installation.
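Judging from the preflight warnings earlier, the leftovers are the static pod manifests; a hedged cleanup (paths assumed from those warnings) would be:
rm -f /etc/kubernetes/manifests/kube-apiserver.yaml \
      /etc/kubernetes/manifests/kube-controller-manager.yaml \
      /etc/kubernetes/manifests/kube-scheduler.yaml \
      /etc/kubernetes/manifests/etcd.yaml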
Run kubeadm init again; it still fails:

Run the following commands, then try kubeadm init once more:
cat > /etc/containerd/config.toml <<EOF
[plugins."io.containerd.grpc.v1.cri"]
systemd_cgroup = true
EOF
systemctl restart containerd
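Here systemd_cgroup = true switches containerd's CRI plugin to the systemd cgroup driver, which matches what kubeadm configures for kubelet by default in recent versions. To confirm containerd came back up cleanly:
systemctl status containerd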
Next it fails to pull images again:

I tested this a bit: the required images already exist locally, yet kubeadm still goes off and pulls them. If the --image-repository parameter is set, however, the pull is skipped and the existing images are used directly.
At this point my kubeadm command is:
kubeadm init \
--apiserver-advertise-address=10.0.16.15 \
--image-repository registry.cn-hangzhou.aliyuncs.com/google_containers \
--control-plane-endpoint=k8s-master \
--kubernetes-version v1.24.3 \
--service-cidr=10.96.0.0/16 \
--pod-network-cidr=192.168.0.0/16
The meaning of each parameter can be checked with kubeadm init --help.
The following error may also show up here:
error execution phase preflight: [preflight] Some fatal errors occurred:
[ERROR Port-10250]: Port 10250 is in use
[preflight] If you know what you are doing, you can make a check non-fatal with `--ignore-preflight-errors=...`
Run kubeadm reset, then run the long kubeadm init command above again.
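Before resetting, it can be worth checking what is actually holding the port (usually a kubelet left over from the previous attempt); a quick check along these lines:
# show the process listening on 10250
ss -lntp | grep 10250
# then wipe the previous init attempt
sudo kubeadm reset -f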
kubelet kept having problems, and kubectl get node failed. The error messages can be inspected with journalctl -xeu kubelet:

I adjusted the /etc/hosts configuration (this machine had several internal IPs configured; I commented out the redundant ones). After that, connections to port 6443 were refused:
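For reference, the minimal mapping needed here, assuming the internal IP and hostname from the kubeadm command above, is a single /etc/hosts line:
10.0.16.15 k8s-master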

At this point I stopped chasing the connection refused error; the machine needed a restart, so run reboot.
Running journalctl -xeu kubelet again showed no more errors.

But the ERROR getting node problem was still there, and kubelet still wasn't running successfully.
I went through plenty of solutions online and none of them helped, until I came across an article on deploying a Kubernetes 1.24 cluster with kubeadm: version 1.24 removed support for Dockershim, so the third-party cri-dockerd project has to be brought in. Installing cri-dockerd would also mean installing git, so I settled for second best and decided to try version 1.23 instead.
Sigh, here we go again!
First, uninstall the Kubernetes components that are already installed:
sudo kubeadm reset -f
sudo rm -rvf $HOME/.kube
sudo rm -rvf /etc/kubernetes/
sudo rm -rvf /etc/systemd/system/kubelet.service.d
sudo rm -rvf /etc/systemd/system/kubelet.service
sudo rm -rvf /usr/bin/kube*
sudo rm -rvf /etc/cni
sudo rm -rvf /opt/cni
sudo rm -rvf /var/lib/etcd
sudo rm -rvf /var/etcd
docker kill $(docker ps -a -q)
docker rm $(docker ps -a -q)
yum list installed | grep kube showed that several 1.24.3 packages were still there; remove them in one batch with yum remove:
yum remove cri-tools.x86_64 kubeadm.x86_64 kubectl.x86_64 kubelet.x86_64 kubernetes-cni.x86_64

Then reinstall version 1.23.4:
sudo yum install -y kubelet-1.23.4 kubeadm-1.23.4 kubectl-1.23.4 --disableexcludes=kubernetes

Check the image versions this release needs:
kubeadm config images list
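For 1.23 it showed roughly the following (again reconstructed from the re-tag commands below, so illustrative only):
k8s.gcr.io/kube-apiserver:v1.23.13
k8s.gcr.io/kube-controller-manager:v1.23.13
k8s.gcr.io/kube-scheduler:v1.23.13
k8s.gcr.io/kube-proxy:v1.23.13
k8s.gcr.io/pause:3.6
k8s.gcr.io/etcd:3.5.1-0
k8s.gcr.io/coredns/coredns:v1.8.6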
Then simply re-tag the images from before with the lower version numbers; they should be backward compatible:
docker tag k8s.gcr.io/kube-apiserver:v1.24.7 k8s.gcr.io/kube-apiserver:v1.23.13
docker tag k8s.gcr.io/kube-controller-manager:v1.24.7 k8s.gcr.io/kube-controller-manager:v1.23.13
docker tag k8s.gcr.io/kube-scheduler:v1.24.7 k8s.gcr.io/kube-scheduler:v1.23.13
docker tag k8s.gcr.io/kube-proxy:v1.24.7 k8s.gcr.io/kube-proxy:v1.23.13
docker tag k8s.gcr.io/pause:3.7 k8s.gcr.io/pause:3.6
docker tag k8s.gcr.io/etcd:3.5.3-0 k8s.gcr.io/etcd:3.5.1-0
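To double-check that the names and tags kubeadm will look for now exist locally:
docker images | grep k8s.gcr.io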

Start kubelet:
sudo systemctl enable --now kubelet
Run kubeadm init:
kubeadm init \
--apiserver-advertise-address=10.0.16.15 \
--image-repository registry.cn-hangzhou.aliyuncs.com/google_containers \
--control-plane-endpoint=k8s-master \
--kubernetes-version v1.23.4 \
--service-cidr=10.96.0.0/16 \
--pod-network-cidr=192.168.0.0/16


Finally… Save the block below for later:
Your Kubernetes control-plane has initialized successfully!
To start using your cluster, you need to run the following as a regular user:
mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config
Alternatively, if you are the root user, you can run:
export KUBECONFIG=/etc/kubernetes/admin.conf
You should now deploy a pod network to the cluster.
Run "kubectl apply -f [podnetwork].yaml" with one of the options listed at:
https://kubernetes.io/docs/concepts/cluster-administration/addons/
You can now join any number of control-plane nodes by copying certificate authorities
and service account keys on each node and then running the following as root:
kubeadm join k8s-master:6443 --token mf8mlj.78j75mdt9z5yi8zk \
--discovery-token-ca-cert-hash sha256:8765989b39cfcb83404d2b2cb1dc6b0517eff3f4a1990ec1c4016ededa1deb93 \
--control-plane
Then you can join any number of worker nodes by running the following on each as root:
kubeadm join k8s-master:6443 --token mf8mlj.78j75mdt9z5yi8zk \
--discovery-token-ca-cert-hash sha256:8765989b39cfcb83404d2b2cb1dc6b0517eff3f4a1990ec1c4016ededa1deb93
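As the output says, a pod network still needs to be deployed before the cluster is usable. Since --pod-network-cidr=192.168.0.0/16 happens to match Calico's default CIDR, one plausible next step (not part of the original walkthrough) would be:
kubectl apply -f https://docs.projectcalico.org/manifests/calico.yaml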