• Migrating the kubelet, Docker, and containerd working directories


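    Background

    By default, the working directories of kubelet, Docker, and containerd all live under /var/lib.
    On the cloud machine our university lab rents, however, the disk mounted at / is small, while the data disk mounted at /mnt/data_mnt/ is large.
    Probably because these working directories sit on /, once usage of / passes 80% (the figure I observed; the exact threshold depends on the kubelet's eviction settings) the kubelet considers the node short on disk space, reports DiskPressure, and the node goes NotReady.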
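
    To keep an eye on how full / is, a small helper along these lines may be handy. It is only a sketch: the function names parse_use_pct and usage_pct are made up for illustration, and the 80% mark in the usage comment is the figure observed above, not a kubelet default.

```shell
# parse_use_pct: read `df -P` output on stdin and print the Use% of the
# first listed filesystem as a bare integer (hypothetical helper name).
parse_use_pct() {
  # -P output has one line per filesystem; column 5 is the capacity, e.g. "51%".
  awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

# usage_pct MOUNTPOINT: Use% of the filesystem that holds MOUNTPOINT.
usage_pct() {
  df -P "$1" | parse_use_pct
}

# Example (illustrative threshold):
#   [ "$(usage_pct /)" -ge 80 ] && echo "root filesystem is filling up"
```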

    (The output below was captured after the migration.)

    root@iZhp3hqett0mw795req5b2Z:~# df -h | head
    Filesystem      Size  Used Avail Use% Mounted on
    udev             16G     0   16G   0% /dev
    tmpfs            16G   19M   16G   1% /run
    /dev/vda1        99G   48G   46G  51% /
    tmpfs            16G     0   16G   0% /dev/shm
    tmpfs           5.0M     0  5.0M   0% /run/lock
    tmpfs            16G     0   16G   0% /sys/fs/cgroup
    /dev/vdb1       493G  120G  348G  26% /mnt/data_mnt
    overlay          99G   48G   46G  51% /var/lib/containers/storage/overlay/54a47bbff1442f521326770cab94eb3221d82b0ff9e997c1b2efe6cad811b21b/merged
    overlay          99G   48G   46G  51% /var/lib/containers/storage/overlay/a74d553e701c85c5ad25fd14a8fd30383e0dc21f4b567bc81e6b7ac74bc73524/merged
    

    Migration

    Docker

    Stop the Docker service

    Delete all containers first, then stop the service.

    systemctl stop docker
    

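    Edit the configuration

    Docker's configuration file is /etc/docker/daemon.json. Add the data-root field to set the data directory.

    See the official documentation: https://docs.docker.com/config/daemon/#daemon-data-directory

    Example after the change: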

    {
      "registry-mirrors": [
        "https://dockerhub.azk8s.cn",
        "https://hub-mirror.c.163.com",
        "https://reg-mirror.qiniu.com"
      ],
      "builder": {
        "gc": {
          "defaultKeepStorage": "20GB",
          "enabled": true
        }
      },
      "experimental": true,
      "features": {
        "buildkit": false
      },
      "dns": ["8.8.8.8", "8.8.4.4"],
      "data-root": "/mnt/data_mnt/var/lib/docker"
    }
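
    Before restarting Docker it can be worth confirming which data-root the file actually declares. The helper below is a rough sketch, not a JSON parser: the name data_root_of is hypothetical, and it assumes the key and its value sit on one line, as in the example above.

```shell
# data_root_of FILE: print the "data-root" value from a daemon.json-style file.
# Assumption: the key and its quoted value are on a single line.
data_root_of() {
  sed -n 's/.*"data-root"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p' "$1"
}

# Example:
#   data_root_of /etc/docker/daemon.json
# After the daemon is back up, `docker info --format '{{.DockerRootDir}}'`
# should report the same path.
```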
    

    Move the files

    Copy /var/lib/docker to /mnt/data_mnt/var/lib/docker.
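
    The post does not say which command was used for the copy. One possible way, assuming GNU or BusyBox cp and that the service is already stopped, is an archive copy that keeps ownership, modes, and symlinks intact. The function name copy_tree is made up for this sketch.

```shell
# copy_tree SRC DST: copy the contents of SRC into DST, preserving
# ownership, permissions, timestamps, and symlinks (hypothetical helper).
copy_tree() {
  src=$1
  dst=$2
  mkdir -p "$dst" || return 1
  # "SRC/." copies the directory's contents, including dotfiles,
  # instead of nesting SRC inside DST.
  cp -a "$src/." "$dst/"
}

# Example (run as root, with Docker stopped):
#   copy_tree /var/lib/docker /mnt/data_mnt/var/lib/docker
```

    Keeping the original directory until the service has been verified on the new path leaves an easy way back.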

    Restart the Docker service

    systemctl start docker
    
    # Run an nginx container as a quick check
    docker run -p 80:80 nginx
    
    # Check the service status
    systemctl status docker
    
    
    ● docker.service - Docker Application Container Engine
       Loaded: loaded (/lib/systemd/system/docker.service; enabled; vendor preset: enabled)
       Active: active (running) since Tue 2023-10-17 22:28:47 CST; 12h ago
         Docs: https://docs.docker.com
     Main PID: 3917580 (dockerd)
        Tasks: 25
       Memory: 1.0G
          CPU: 1min 16.247s
       CGroup: /system.slice/docker.service
               ├─ 370428 /usr/bin/docker-proxy -proto tcp -host-ip 0.0.0.0 -host-port 5050 -container-ip 172.17.0.2 -container-port 5000
               └─3917580 /usr/bin/dockerd -H fd:// --containerd=/run/containerd/containerd.sock
    
    Oct 18 10:11:41 iZhp3hqett0mw795req5b2Z dockerd[3917580]: time="2023-10-18T10:11:41.715286425+08:00" level=error msg="Handler for POST /v1.41/containers/f66c7e907176ccd2abe010253448ab6dcab286c60f893b4cde72184215747d90/start returned error: driver 
    Oct 18 10:17:18 iZhp3hqett0mw795req5b2Z dockerd[3917580]: time="2023-10-18T10:17:18.451142888+08:00" level=info msg="Attempting next endpoint for push after error: Get \"https://localhost:5000/v2/\": http: server gave HTTP response to HTTPS client
    Oct 18 10:17:18 iZhp3hqett0mw795req5b2Z dockerd[3917580]: time="2023-10-18T10:17:18.455921606+08:00" level=error msg="Upload failed: no basic auth credentials"
    Oct 18 10:17:18 iZhp3hqett0mw795req5b2Z dockerd[3917580]: time="2023-10-18T10:17:18.455953643+08:00" level=error msg="Upload failed: no basic auth credentials"
    Oct 18 10:17:18 iZhp3hqett0mw795req5b2Z dockerd[3917580]: time="2023-10-18T10:17:18.455930600+08:00" level=error msg="Upload failed: no basic auth credentials"
    Oct 18 10:17:18 iZhp3hqett0mw795req5b2Z dockerd[3917580]: time="2023-10-18T10:17:18.456010183+08:00" level=error msg="Upload failed: no basic auth credentials"
    Oct 18 10:17:18 iZhp3hqett0mw795req5b2Z dockerd[3917580]: time="2023-10-18T10:17:18.456058582+08:00" level=info msg="Attempting next endpoint for push after error: no basic auth credentials"
    Oct 18 10:18:56 iZhp3hqett0mw795req5b2Z dockerd[3917580]: time="2023-10-18T10:18:56.354196507+08:00" level=info msg="Attempting next endpoint for push after error: Get \"https://localhost:5050/v2/\": http: server gave HTTP response to HTTPS client
    Oct 18 10:19:02 iZhp3hqett0mw795req5b2Z dockerd[3917580]: time="2023-10-18T10:19:02.439702060+08:00" level=info msg="Attempting next endpoint for push after error: Get \"https://localhost:5050/v2/\": http: server gave HTTP response to HTTPS client
    Oct 18 10:19:07 iZhp3hqett0mw795req5b2Z dockerd[3917580]: time="2023-10-18T10:19:07.267669420+08:00" level=info msg="Attempting next endpoint for push after error: Get \"https://localhost:5050/v2/\": http: server gave HTTP response to HTTPS client
    
    

    containerd

    Stop the service

    systemctl stop containerd
    
    

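    Edit the configuration

    The configuration file is /etc/containerd/config.toml.

    The default file contains root = "/var/lib/containerd", so the working directory defaults to /var/lib/containerd. Change root to the new location, /mnt/data_mnt/var/lib/containerd.

    If you accidentally mess the file up, you can regenerate the default configuration: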

    containerd config default > /etc/containerd/config.toml
    

    Example after the change:

    version = 2
    root = "/mnt/data_mnt/var/lib/containerd"
    state = "/run/containerd"
    oom_score = 0
    
    [grpc]
      address = "/run/containerd/containerd.sock"
      uid = 0
      gid = 0
      max_recv_message_size = 16777216
      max_send_message_size = 16777216
    
    [debug]
      address = "/run/containerd/containerd-debug.sock"
      uid = 0
      gid = 0
      level = "warn"
    
    [timeouts]
      "io.containerd.timeout.shim.cleanup" = "5s"
      "io.containerd.timeout.shim.load" = "5s"
      "io.containerd.timeout.shim.shutdown" = "3s"
      "io.containerd.timeout.task.state" = "2s"
    
    [plugins]
      [plugins."io.containerd.grpc.v1.cri"]
        sandbox_image = "sealos.hub:5000/pause:3.9"
        max_container_log_line_size = -1
        max_concurrent_downloads = 20
        disable_apparmor = false
        [plugins."io.containerd.grpc.v1.cri".containerd]
          snapshotter = "overlayfs"
          default_runtime_name = "runc"
          [plugins."io.containerd.grpc.v1.cri".containerd.runtimes]
            [plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc]
              runtime_type = "io.containerd.runc.v2"
              runtime_engine = ""
              runtime_root = ""
              [plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc.options]
                SystemdCgroup = true
        [plugins."io.containerd.grpc.v1.cri".registry]
          config_path = "/etc/containerd/certs.d"
          [plugins."io.containerd.grpc.v1.cri".registry.configs]
              [plugins."io.containerd.grpc.v1.cri".registry.configs."sealos.hub:5000".auth]
                username = "admin"
                password = "passw0rd"
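
    The only line that matters for the migration is the top-level root key. If you prefer to script that change instead of editing by hand, something like the following could work. It is a sketch that assumes GNU sed (for -i) and a root key starting at column 0; the name set_containerd_root is hypothetical.

```shell
# set_containerd_root FILE NEW_ROOT: rewrite the top-level `root = "..."`
# line of a containerd config.toml in place (hypothetical helper).
set_containerd_root() {
  file=$1
  new_root=$2
  # '|' is the sed delimiter because the path contains '/'.
  # The ^ anchor leaves indented keys such as runtime_root untouched.
  sed -i "s|^root = .*|root = \"$new_root\"|" "$file"
}

# Example:
#   set_containerd_root /etc/containerd/config.toml /mnt/data_mnt/var/lib/containerd
```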
    

    Move the files

    Copy /var/lib/containerd to /mnt/data_mnt/var/lib/containerd.

    Restart the service

    systemctl start containerd
    
    systemctl status containerd
    
    

    kubelet (ran into a problem, still unresolved)

    Stop the service

    systemctl stop kubelet
    
    

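    Edit the configuration

    This is the kubelet service configuration; on my machine it is /etc/systemd/system/kubelet.service.d/10-kubeadm.conf.

    Note that the same directory may also contain /etc/systemd/system/kubelet.service.d/override.conf. systemd applies drop-in files in lexicographic file-name order, so at runtime the settings in override.conf override those in 10-kubeadm.conf.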
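
    To see which drop-in wins, it helps to list the files in the order systemd reads them. The sketch below only sorts file names; the function name list_dropins is made up for illustration.

```shell
# list_dropins DIR: print the *.conf drop-ins in DIR in byte-wise
# lexicographic order; files printed later override earlier ones
# (hypothetical helper).
list_dropins() {
  for f in "$1"/*.conf; do
    [ -e "$f" ] && basename "$f"
  done | LC_ALL=C sort
}

# Example:
#   list_dropins /etc/systemd/system/kubelet.service.d
# `systemctl cat kubelet` prints the unit together with every drop-in
# that systemd actually loaded.
```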

    Example content after the change:

    # Note: This dropin only works with kubeadm and kubelet v1.11+
    [Service]
    Environment="KUBELET_KUBECONFIG_ARGS=--bootstrap-kubeconfig=/etc/kubernetes/bootstrap-kubelet.conf --kubeconfig=/etc/kubernetes/kubelet.conf"
    Environment="KUBELET_CONFIG_ARGS=--config=/mnt/data_mnt/var/lib/kubelet/config.yaml"
    # This is a file that "kubeadm init" and "kubeadm join" generates at runtime, populating the KUBELET_KUBEADM_ARGS variable dynamically
    EnvironmentFile=-/mnt/data_mnt/var/lib/kubelet/kubeadm-flags.env
    # This is a file that the user can use for overrides of the kubelet args as a last resort. Preferably, the user should use
    # the .NodeRegistration.KubeletExtraArgs object in the configuration files instead. KUBELET_EXTRA_ARGS should be sourced from this file.
    Environment="KUBELET_EXTRA_ARGS= \
                   \
                   \
                  --runtime-request-timeout=15m --container-runtime-endpoint=unix:///var/run/containerd/containerd.sock --image-service-endpoint=unix:///var/run/image-cri-shim.sock"
    ExecStart=
    ExecStart=/usr/bin/kubelet $KUBELET_KUBECONFIG_ARGS $KUBELET_CONFIG_ARGS $KUBELET_KUBEADM_ARGS $KUBELET_EXTRA_ARGS
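
    One thing the drop-in above does not set is the kubelet's --root-dir flag, which defaults to /var/lib/kubelet and decides where pod volumes and other runtime data are kept. Relocating only config.yaml and kubeadm-flags.env may therefore leave that data on /. This is an observation, not something the original post tested. The helper below, with the made-up name kubelet_root_dir, reports which root directory a drop-in would give the kubelet.

```shell
# kubelet_root_dir FILE: print the value of --root-dir found in a
# drop-in or env file, or the kubelet default if the flag is absent
# (hypothetical helper).
kubelet_root_dir() {
  dir=$(sed -n 's/.*--root-dir=\([^ "]*\).*/\1/p' "$1" | head -n 1)
  echo "${dir:-/var/lib/kubelet}"
}

# Example:
#   kubelet_root_dir /etc/systemd/system/kubelet.service.d/10-kubeadm.conf
```

    If it prints /var/lib/kubelet, adding --root-dir=/mnt/data_mnt/var/lib/kubelet to KUBELET_EXTRA_ARGS is one option to try, treated as an untested assumption here.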
    

    You also need to update the key paths configured in /etc/kubernetes/kubelet.conf. Example after the change (excerpt):

    # (earlier part omitted)
    users:
    - name: system:node:izhp3hqett0mw795req5b2z
      user:
        client-certificate: /mnt/data_mnt/var/lib/kubelet/pki/kubelet-client-current.pem
        client-key: /mnt/data_mnt/var/lib/kubelet/pki/kubelet-client-current.pem
    

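    You also need to recreate a symlink: the key is read through a symlink named "current" that points at the actual, date-stamped key file, and that link no longer resolves correctly after the move.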

    ln -s  kubelet-client-2023-10-07-11-14-02.pem kubelet-client-current.pem 
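
    The ln -s command above fails if a stale kubelet-client-current.pem is still in the directory. A slightly more defensive variant is sketched below; the function name relink_current is hypothetical, and the link is kept relative so it survives another move of the directory.

```shell
# relink_current PKI_DIR TARGET: point kubelet-client-current.pem at
# TARGET, a file name inside PKI_DIR (hypothetical helper).
relink_current() {
  pki_dir=$1
  target=$2
  # Subshell: the caller's working directory stays unchanged.
  # Refuse to create a dangling link when TARGET is missing.
  # -f replaces an existing link; -n avoids following it.
  ( cd "$pki_dir" && [ -e "$target" ] && ln -sfn "$target" kubelet-client-current.pem )
}

# Example:
#   relink_current /mnt/data_mnt/var/lib/kubelet/pki kubelet-client-2023-10-07-11-14-02.pem
```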
    

    Move the files (ran into a problem, still unresolved)

    Some files cannot be deleted...

    root@iZhp3hqett0mw795req5b2Z:~# rm -rf /var/lib/kubelet
    rm: cannot remove '/var/lib/kubelet/pods/30c0099f-dfcc-4e6f-893e-eacc6ed44021/volumes/kubernetes.io~projected/kube-api-access-6jt8n': Device or resource busy
    rm: cannot remove '/var/lib/kubelet/pods/30c0099f-dfcc-4e6f-893e-eacc6ed44021/volumes/kubernetes.io~empty-dir/tmp-volume': Device or resource busy
    rm: cannot remove '/var/lib/kubelet/pods/54e7cb22-fdab-4e33-afb3-c8ba88d153a2/volumes/kubernetes.io~projected/kube-api-access-j84xs': Device or resource busy
    rm: cannot remove '/var/lib/kubelet/pods/d1a3fba3-3ab8-4ef9-b61c-6479b26c79f7/volumes/kubernetes.io~projected/kube-api-access-lf5tx': Device or resource busy
    rm: cannot remove '/var/lib/kubelet/pods/5e38f3a0-7f59-4d2e-98f4-1ec915e6ba89/volumes/kubernetes.io~projected/kube-api-access-prz4v': Device or resource busy
    rm: cannot remove '/var/lib/kubelet/pods/0f02517c-01c3-4b58-9f85-be169a92a31d/volumes/kubernetes.io~projected/kube-api-access-r4kxp': Device or resource busy
    rm: cannot remove '/var/lib/kubelet/pods/7098d438-0a9d-40df-aee1-ec4884ba262f/volumes/kubernetes.io~projected/kube-api-access-rqtwq': Device or resource busy
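
    "Device or resource busy" from rm usually means the path is still an active mount point. A likely explanation, which the original post does not confirm, is that the pods' projected and emptyDir volumes are still mounted under /var/lib/kubelet. The sketch below lists such mounts so they can be inspected; the function name mounts_under is made up, and the optional second argument exists only to make the function testable.

```shell
# mounts_under PREFIX [MOUNTS_FILE]: print mount points at or below
# PREFIX, deepest paths first (hypothetical helper).
mounts_under() {
  prefix=$1
  mounts_file=${2:-/proc/mounts}
  # Field 2 of /proc/mounts is the mount point. Matching on PREFIX plus "/"
  # avoids false hits such as /var/lib/kubeletx.
  awk -v p="$prefix" '$2 == p || index($2, p "/") == 1 { print $2 }' "$mounts_file" |
    LC_ALL=C sort -r
}

# Example:
#   mounts_under /var/lib/kubelet
# Unmounting each listed path (for instance with umount) before removing the
# directory is one thing to try; it was not verified in the original post.
```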
    

    Restart the service

    systemctl start kubelet
    
    systemctl status kubelet
    
    

    Versions used

    Date: October 18, 2023

    Version output

    root@iZhp3hqett0mw795req5b2Z:~# kubectl version
    WARNING: This version information is deprecated and will be replaced with the output from kubectl version --short.  Use --output=yaml|json to get the full version.
    Client Version: version.Info{Major:"1", Minor:"27", GitVersion:"v1.27.3", GitCommit:"25b4e43193bcda6c7328a6d147b1fb73a33f1598", GitTreeState:"clean", BuildDate:"2023-06-14T09:53:42Z", GoVersion:"go1.20.5", Compiler:"gc", Platform:"linux/amd64"}
    Kustomize Version: v5.0.1
    Server Version: version.Info{Major:"1", Minor:"27", GitVersion:"v1.27.3", GitCommit:"25b4e43193bcda6c7328a6d147b1fb73a33f1598", GitTreeState:"clean", BuildDate:"2023-06-14T09:47:40Z", GoVersion:"go1.20.5", Compiler:"gc", Platform:"linux/amd64"}
    
    root@iZhp3hqett0mw795req5b2Z:~# docker version
    Client:
     Version:           20.10.21
     API version:       1.41
     Go version:        go1.18.1
     Git commit:        20.10.21-0ubuntu1~18.04.3
     Built:             Thu Apr 27 05:50:21 2023
     OS/Arch:           linux/amd64
     Context:           default
     Experimental:      true
    
    Server:
     Engine:
      Version:          20.10.21
      API version:      1.41 (minimum version 1.12)
      Go version:       go1.18.1
      Git commit:       20.10.21-0ubuntu1~18.04.3
      Built:            Thu Apr 27 05:36:22 2023
      OS/Arch:          linux/amd64
      Experimental:     true
     containerd:
      Version:          1.6.12-0ubuntu1~18.04.1
      GitCommit:        
     runc:
      Version:          1.1.4-0ubuntu1~18.04.2
      GitCommit:        
     docker-init:
      Version:          0.19.0
      GitCommit:        
    
    root@iZhp3hqett0mw795req5b2Z:~# containerd --version
    containerd github.com/containerd/containerd 1.6.12-0ubuntu1~18.04.1 
    
  • Original article: https://blog.csdn.net/u010834463/article/details/133901254