• Kuboard suddenly becomes inaccessible with the error: Failed to connect to the database


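    1. Background

    Without any special operation having been performed, kuboard suddenly began returning the following message when accessed: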

    {
      "message": "Failed to connect to the database.",
      "type": "Internal Server Error"
    }
    

    2. Troubleshooting

    Here kuboard is deployed with docker. Checking the container's status shows Up 6 months, so it is still running:

    docker ps | grep kuboard
    

    Check the kuboard container's logs:

    docker logs -f --tail=10 <container ID>
    
    [root@nb003 ~]# docker logs -f  --tail=10  a2caf8010e75
    time="2023-09-23T05:15:08Z" level=error msg="failed to rotate keys: etcdserver: mvcc: database space exceeded"
    {"level":"warn","ts":"2023-09-23T13:15:12.504+0800","caller":"clientv3/retry_interceptor.go:61","msg":"retrying of unary invoker failed","target":"endpoint://client-8f19b170-257f-4a30-942f-1be1122e3be0/127.0.0.1:2379","attempt":0,"error":"rpc error: code = ResourceExhausted desc = etcdserver: mvcc: database space exceeded"}
    time="2023-09-23T05:15:12Z" level=error msg="Storage health check failed: create auth request: etcdserver: mvcc: database space exceeded"
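When the log is long, it can help to count how often the quota error appears rather than reading it line by line. The following is a minimal sketch; the function name `count_quota_errors` is a hypothetical helper introduced here, not part of docker or kuboard.

```shell
#!/bin/sh
# count_quota_errors (hypothetical helper): reads log text on stdin and
# prints the number of lines that mention the etcd quota error.
count_quota_errors() {
    # grep -c prints 0 but exits non-zero when nothing matches,
    # so "|| true" keeps the helper's exit status at 0.
    grep -c 'mvcc: database space exceeded' || true
}
```

One possible usage is `docker logs --tail=100 <container ID> 2>&1 | count_quota_errors`; the `2>&1` is there because `docker logs` forwards the container's stderr to stderr.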
    
    

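    The log above reports ResourceExhausted desc = etcdserver: mvcc: database space exceeded. This means the etcd backend database has exceeded its storage quota, not that the host disk is full. The default quota is 2 GB; once it is exceeded, etcd raises an alarm and rejects writes, so old revisions need to be cleaned up periodically.
    Next, check how much space the mapped data directory uses: locate your own kuboard-data directory and look at the size of the etcd db file. Here the db file, last modified at 11:57 on September 23, is 2.0 GB, which is the default quota limit.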

    [root@nb003 snap]# cd /data/kuboard-data/etcd-data/member/snap
    [root@nb003 snap]# pwd
    /data/kuboard-data/etcd-data/member/snap
    [root@nb003 snap]# ls -lrth
    total 2.1G
    -rw-r--r-- 1 root root 8.0K Sep 19 11:49 0000000000000005-00000000005c4542.snap
    -rw-r--r-- 1 root root 8.0K Sep 20 08:41 0000000000000005-00000000005c6c53.snap
    -rw-r--r-- 1 root root 8.0K Sep 21 05:33 0000000000000005-00000000005c9364.snap
    -rw-r--r-- 1 root root 8.0K Sep 22 02:26 0000000000000005-00000000005cba75.snap
    -rw-r--r-- 1 root root 8.0K Sep 23 07:13 0000000000000005-00000000005ce186.snap
    -rw------- 1 root root 2.0G Sep 23 11:57 db
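To see how close the db file is to the quota before it causes an outage, the size can be turned into a percentage. This is a rough sketch that assumes the default 2 GiB quota (2147483648 bytes) is in effect; `quota_usage_percent` and `DEFAULT_QUOTA_BYTES` are hypothetical names introduced for illustration.

```shell
#!/bin/sh
# Assumed default etcd backend quota: 2 GiB, expressed in bytes.
DEFAULT_QUOTA_BYTES=2147483648

# quota_usage_percent DB_BYTES [QUOTA_BYTES]  (hypothetical helper)
# Prints how much of the quota the db file occupies, as a whole percentage
# (integer division, so the result is rounded down).
quota_usage_percent() {
    db_bytes=$1
    quota_bytes=${2:-$DEFAULT_QUOTA_BYTES}
    echo $(( db_bytes * 100 / quota_bytes ))
}
```

With GNU coreutils, the byte size of the file could be supplied as `quota_usage_percent "$(stat -c %s /data/kuboard-data/etcd-data/member/snap/db)"`. If the quota has been changed from the default, pass the actual value as the second argument.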
    
    

    Enter the kuboard container and check the etcd status. The ERRORS column shows the same problem, an alarm:NOSPACE alarm, meaning the space quota is exhausted:

    [root@nb003 snap]# docker exec -it a2caf8010e75 bash
    root@a2caf8010e75:/# ETCDCTL_API=3 etcdctl --endpoints="http://127.0.0.1:2379" --write-out=table endpoint status
    +-----------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------------------------------+
    |       ENDPOINT        |        ID        | VERSION | DB SIZE | IS LEADER | IS LEARNER | RAFT TERM | RAFT INDEX | RAFT APPLIED INDEX |             ERRORS             |
    +-----------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------------------------------+
    | http://127.0.0.1:2379 | 59a9c584ea2c3f35 |  3.4.14 |  2.1 GB |      true |      false |         6 |    6089300 |            6089300 |   memberID:6460912315094810421 |
    |                       |                  |         |         |           |            |           |            |                    |                 alarm:NOSPACE  |
    +-----------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------------------------------+
    root@a2caf8010e75:/# 
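The alarm can also be checked without reading the table, because `etcdctl alarm list` prints active alarms in the same `memberID:... alarm:NOSPACE` form that appears in the ERRORS column. A small sketch follows; `has_nospace_alarm` is a hypothetical helper name.

```shell
#!/bin/sh
# has_nospace_alarm (hypothetical helper): reads `etcdctl alarm list`
# output on stdin; exits 0 if a NOSPACE alarm is present, non-zero otherwise.
has_nospace_alarm() {
    grep -q 'alarm:NOSPACE'
}
```

Inside the container this might be used as `ETCDCTL_API=3 etcdctl --endpoints=http://127.0.0.1:2379 alarm list | has_nospace_alarm && echo "etcd quota exceeded"`.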
    
    

    3. Solution

    Run the following steps in order inside the kuboard container:

    # Back up the db
    etcdctl snapshot save backup.db
    # Get the current revision
    rev=$(ETCDCTL_API=3 etcdctl --endpoints=http://127.0.0.1:2379 endpoint status --write-out="json" | egrep -o '"revision":[0-9]*' | egrep -o '[0-9].*')
    # Compact away old revisions
    ETCDCTL_API=3 etcdctl --endpoints=http://127.0.0.1:2379 compact $rev
    # Defragment to release the freed space
    ETCDCTL_API=3 etcdctl --endpoints=http://127.0.0.1:2379 defrag
    # Clear the alarm (the NOSPACE alarm raised earlier)
    ETCDCTL_API=3 etcdctl --endpoints=http://127.0.0.1:2379 alarm disarm
    # Check the etcd status again (the ERRORS column is now empty)
    ETCDCTL_API=3 etcdctl --endpoints="http://127.0.0.1:2379" --write-out=table endpoint status
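The revision lookup above leaves `rev` empty if the status call fails, and `compact` would then be run without an argument. The sketch below is a slightly more defensive variant of the same extraction; `extract_revision` is a hypothetical helper, and it uses `grep -E` because `egrep` is deprecated in recent GNU grep releases.

```shell
#!/bin/sh
# extract_revision (hypothetical helper): reads the JSON printed by
# `etcdctl endpoint status --write-out=json` on stdin and prints the first
# "revision" value. Returns non-zero if no revision is found.
extract_revision() {
    rev=$(grep -Eo '"revision":[0-9]+' | head -n 1 | grep -Eo '[0-9]+') || return 1
    echo "$rev"
}
```

A possible usage is `rev=$(ETCDCTL_API=3 etcdctl --endpoints=http://127.0.0.1:2379 endpoint status --write-out=json | extract_revision) || exit 1`, so that the later steps only run when a revision was actually obtained.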
    
    

    The full session and its output:

    root@a2caf8010e75:/# etcdctl snapshot save backup.db
    {"level":"info","ts":1695447648.315712,"caller":"snapshot/v3_snapshot.go:119","msg":"created temporary db file","path":"backup.db.part"}
    {"level":"info","ts":"2023-09-23T13:40:48.317+0800","caller":"clientv3/maintenance.go:200","msg":"opened snapshot stream; downloading"}
    {"level":"info","ts":1695447648.3172774,"caller":"snapshot/v3_snapshot.go:127","msg":"fetching snapshot","endpoint":"127.0.0.1:2379"}
    {"level":"info","ts":"2023-09-23T13:41:03.646+0800","caller":"clientv3/maintenance.go:208","msg":"completed snapshot read; closing"}
    {"level":"info","ts":1695447663.8131642,"caller":"snapshot/v3_snapshot.go:142","msg":"fetched snapshot","endpoint":"127.0.0.1:2379","size":"2.1 GB","took":15.497392681}
    {"level":"info","ts":1695447663.8132935,"caller":"snapshot/v3_snapshot.go:152","msg":"saved","path":"backup.db"}
    Snapshot saved at backup.db
    root@a2caf8010e75:/# rev=$(ETCDCTL_API=3 etcdctl --endpoints=http://127.0.0.1:2379 endpoint status --write-out="json" | egrep -o '"revision":[0-9]*' | egrep -o '[0-9].*')
    root@a2caf8010e75:/#  ETCDCTL_API=3 etcdctl --endpoints=http://127.0.0.1:2379 compact $rev
    compacted revision 6077603
    root@a2caf8010e75:/#  ETCDCTL_API=3 etcdctl --endpoints=http://127.0.0.1:2379 defrag
    Finished defragmenting etcd member[http://127.0.0.1:2379]
    root@a2caf8010e75:/# ETCDCTL_API=3 etcdctl --endpoints=http://127.0.0.1:2379 alarm disarm
    memberID:6460912315094810421 alarm:NOSPACE 
    root@a2caf8010e75:/# ETCDCTL_API=3 etcdctl --endpoints="http://127.0.0.1:2379" --write-out=table endpoint status
    +-----------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
    |       ENDPOINT        |        ID        | VERSION | DB SIZE | IS LEADER | IS LEARNER | RAFT TERM | RAFT INDEX | RAFT APPLIED INDEX | ERRORS |
    +-----------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
    | http://127.0.0.1:2379 | 59a9c584ea2c3f35 |  3.4.14 |  127 kB |      true |      false |         6 |    6089454 |            6089454 |        |
    +-----------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
    root@a2caf8010e75:/# 
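Because the quota fills up again unless old revisions are cleaned periodically, the compact, defrag, and disarm steps from the session above could be wrapped for scheduled runs from the docker host. This is only a sketch under stated assumptions: `run_in_container`, `etcd_cleanup`, and the `DRY_RUN` switch are hypothetical names, and it assumes `etcdctl` is on the PATH inside the container, as it is in the session shown. Taking a snapshot backup first, as in the manual steps, remains advisable.

```shell
#!/bin/sh
# Endpoint of the etcd instance inside the kuboard container.
ENDPOINT="http://127.0.0.1:2379"

# run_in_container CONTAINER CMD... (hypothetical helper)
# Runs CMD inside the container via `docker exec`; with DRY_RUN=1 it only
# prints the docker command instead of executing it.
run_in_container() {
    container=$1
    shift
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "docker exec -e ETCDCTL_API=3 $container $*"
    else
        docker exec -e ETCDCTL_API=3 "$container" "$@"
    fi
}

# etcd_cleanup CONTAINER REVISION (hypothetical helper)
# Compacts up to REVISION, defragments, then clears alarms; stops at the
# first step that fails.
etcd_cleanup() {
    container=$1
    rev=$2
    run_in_container "$container" etcdctl --endpoints="$ENDPOINT" compact "$rev" &&
    run_in_container "$container" etcdctl --endpoints="$ENDPOINT" defrag &&
    run_in_container "$container" etcdctl --endpoints="$ENDPOINT" alarm disarm
}
```

Setting `DRY_RUN=1` before calling `etcd_cleanup <container ID> <revision>` prints the three docker commands so they can be reviewed before anything is changed.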
    
    

    4. Verifying access

    Opening kuboard in a browser (ip:30080) now works normally.
    Checking the etcd db file again shows that its size has dropped to 156K.

    END

  • Original article: https://blog.csdn.net/wdy_2099/article/details/133203698