• Deploying Ceph distributed storage on a single Ubuntu server


    Environment

    OS:Linux 5.15.0-82-generic #91-Ubuntu SMP Mon Aug 14 14:14:14 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
    ceph version 17.2.6 (d7ff0d10654d2280e08f1ab989c7cdf3064446a5) quincy (stable)

    Preparation

    #Install the repository GPG key
    curl -fsSL http://mirrors.aliyun.com/docker-ce/linux/ubuntu/gpg | sudo apt-key add -
    
    This step differs from the official Docker documentation: the repository the Docker site recommends is hosted overseas and is slow to reach, so the Aliyun mirror is used here instead.
    
    Source: <https://blog.csdn.net/b1134977524/article/details/120442417>
    
    
    sudo add-apt-repository "deb [arch=amd64] http://mirrors.aliyun.com/docker-ce/linux/ubuntu $(lsb_release -cs) stable"
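
    The original post stops after adding the repository, but cephadm needs a running container engine, so Docker itself presumably gets installed next. A minimal sketch, assuming the docker-ce packages from the Aliyun mirror added above:

    # refresh the package index and install the Docker engine plus its runtime dependencies
    sudo apt-get update
    sudo apt-get -y install docker-ce docker-ce-cli containerd.io
    # make sure the daemon is running before bootstrapping Ceph
    sudo systemctl enable --now docker
    sudo docker info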
    
    

    Deployment

    Reference (official docs): https://docs.ceph.com/en/latest/install/manual-deployment/

    apt -y install cephadm
    cephadm bootstrap --mon-ip 192.168.1.20 --single-host-defaults
    cephadm add-repo --release reef
    cephadm install ceph-common
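
    Before moving on it is worth confirming that the bootstrap succeeded; the following are standard ceph/cephadm commands, not part of the original post:

    ceph -v          # client version, confirms ceph-common is installed
    ceph status      # expect HEALTH_WARN until OSDs are added
    ceph orch ps     # daemons cephadm has deployed (mon, mgr, ...)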
    

    Usage

    Adding OSDs

    Reference: https://docs.ceph.com/en/latest/cephadm/services/osd/#cephadm-deploy-osds
    List the available devices:
    ceph orch device ls
    Method 1: tell Ceph to consume any available and unused storage device:
    ceph orch apply osd --all-available-devices
    Method 2: create an OSD from a specific device on a specific host:
    ceph orch daemon add osd <host>:<device-path>
    For example:

    ceph orch daemon add osd host1:/dev/sdb
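
    Besides the two methods above, the orchestrator also accepts an OSD service specification, which is useful when devices need to be filtered by size or type. A sketch based on the upstream drive-group spec format; the file name osd_spec.yml is arbitrary:

    # osd_spec.yml -- equivalent to --all-available-devices, expressed as a spec
    service_type: osd
    service_id: default_drive_group
    placement:
      host_pattern: '*'
    spec:
      data_devices:
        all: true

    # apply the spec
    ceph orch apply -i osd_spec.yml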

    Removing an OSD

    # ceph orch device ls
    # ceph orch osd rm 0
    Scheduled OSD(s) for removal.
    VG/LV for the OSDs won't be zapped (--zap wasn't passed).
    Run the `ceph-volume lvm zap` command with `--destroy` against the VG/LV if you want them to be destroyed.
    
    #The message above tells us the following:
    
    #1. The OSD has been scheduled for removal.
    #2. The volume group (VG) and logical volume (LV) backing the OSD will not be deleted as part of the removal.
    #3. To destroy that VG/LV, run the ceph-volume lvm zap command with the --destroy flag.
    #4. In other words: the OSD is about to be removed, but its VG/LV remain on disk until they are explicitly wiped with ceph-volume lvm zap --destroy.
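
    Removal is asynchronous (the OSD is drained first), and the zap can also be driven through the orchestrator instead of calling ceph-volume by hand. A sketch using standard cephadm commands; the host name test and the device /dev/nvme1n1 are the ones appearing later in this post:

    # watch the drain/removal progress
    ceph orch osd rm status
    # alternative to ceph-volume: zap the freed device via the orchestrator
    ceph orch device zap test /dev/nvme1n1 --force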
    

    Reference: https://www.xiaowangc.com/archives/40095af5.html

    # cephadm shell
    Inferring fsid def7078a-56ce-11ee-a479-1be9fab3deba
    Inferring config /var/lib/ceph/def7078a-56ce-11ee-a479-1be9fab3deba/mon.test/config
    Using ceph image with id '22cd8daf4d70' and tag 'v17' created on 2023-09-06 00:05:04 +0800 CST
    quay.io/ceph/ceph@sha256:6b0a24e3146d4723700ce6579d40e6016b2c63d9bf90422653f2d4caa49be232
    # ceph-volume lvm zap --destroy /dev/nvme1n1
    --> Zapping: /dev/nvme1n1
    --> Zapping lvm member /dev/nvme1n1. lv_path is /dev/ceph-4229b334-54f8-4b21-80ed-3733cc2d4910/osd-block-866ec7e0-8b60-469e-8124-3d770608977e
    Running command: /usr/bin/dd if=/dev/zero of=/dev/ceph-4229b334-54f8-4b21-80ed-3733cc2d4910/osd-block-866ec7e0-8b60-469e-8124-3d770608977e bs=1M count=10 conv=fsync
     stderr: 10+0 records in
    10+0 records out
     stderr: 10485760 bytes (10 MB, 10 MiB) copied, 0.0173921 s, 603 MB/s
    --> Only 1 LV left in VG, will proceed to destroy volume group ceph-4229b334-54f8-4b21-80ed-3733cc2d4910
    Running command: nsenter --mount=/rootfs/proc/1/ns/mnt --ipc=/rootfs/proc/1/ns/ipc --net=/rootfs/proc/1/ns/net --uts=/rootfs/proc/1/ns/uts /sbin/vgremove -v -f ceph-4229b334-54f8-4b21-80ed-3733cc2d4910
     stderr: Removing ceph--4229b334--54f8--4b21--80ed--3733cc2d4910-osd--block--866ec7e0--8b60--469e--8124--3d770608977e (253:3)
     stderr: Archiving volume group "ceph-4229b334-54f8-4b21-80ed-3733cc2d4910" metadata (seqno 5).
     stderr: Releasing logical volume "osd-block-866ec7e0-8b60-469e-8124-3d770608977e"
     stderr: Creating volume group backup "/etc/lvm/backup/ceph-4229b334-54f8-4b21-80ed-3733cc2d4910" (seqno 6).
     stdout: Logical volume "osd-block-866ec7e0-8b60-469e-8124-3d770608977e" successfully removed
     stderr: Removing physical volume "/dev/nvme1n1" from volume group "ceph-4229b334-54f8-4b21-80ed-3733cc2d4910"
     stdout: Volume group "ceph-4229b334-54f8-4b21-80ed-3733cc2d4910" successfully removed
    Running command: nsenter --mount=/rootfs/proc/1/ns/mnt --ipc=/rootfs/proc/1/ns/ipc --net=/rootfs/proc/1/ns/net --uts=/rootfs/proc/1/ns/uts /sbin/pvremove -v -f -f /dev/nvme1n1
     stdout: Labels on physical volume "/dev/nvme1n1" successfully wiped.
    Running command: /usr/bin/dd if=/dev/zero of=/dev/nvme1n1 bs=1M count=10 conv=fsync
     stderr: 10+0 records in
    10+0 records out
     stderr: 10485760 bytes (10 MB, 10 MiB) copied, 0.0312679 s, 335 MB/s
    --> Zapping successful for: <Raw Device: /dev/nvme1n1>
    
    
    # ceph osd tree
    ID  CLASS  WEIGHT  TYPE NAME      STATUS  REWEIGHT  PRI-AFF
    -1              0  root default
    -3              0      host test
    # ceph status
      cluster:
        id:     def7078a-56ce-11ee-a479-1be9fab3deba
        health: HEALTH_WARN
                mon test is low on available space
                OSD count 0 < osd_pool_default_size 2
    
      services:
        mon: 1 daemons, quorum test (age 24h)
        mgr: test.ksgjsf(active, since 23h), standbys: test.lizbxa
        osd: 0 osds: 0 up (since 5m), 0 in (since 3h)
    
      data:
        pools:   0 pools, 0 pgs
        objects: 0 objects, 0 B
        usage:   0 B used, 0 B / 0 B avail
        pgs:
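
    At this point the device is wiped and the cluster has no OSDs. If the goal is to recreate the OSD on the same disk, the commands shown earlier apply again (host test and device /dev/nvme1n1 taken from the output above):

    ceph orch device ls                           # /dev/nvme1n1 should be listed as available again
    ceph orch daemon add osd test:/dev/nvme1n1    # recreate the OSD on the wiped device
    ceph osd tree                                 # the new OSD should appear under host test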
    
    
    

    cephadm operations: purging the cluster

    Reference: https://www.cnblogs.com/varden/p/15966516.html
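
    The original post only links a reference here. For completeness, a sketch of the upstream cephadm purge procedure; this is destructive, the fsid is the one used throughout this post, and --zap-osds additionally wipes the OSD devices:

    # find the cluster fsid
    ceph fsid
    # stop cephadm from redeploying daemons while the cluster is torn down
    ceph mgr module disable cephadm
    # remove all daemons and data for this cluster on this host
    cephadm rm-cluster --force --zap-osds --fsid def7078a-56ce-11ee-a479-1be9fab3deba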

    Problems encountered

    1. Ceph installation fails because the SSH port is not 22

    /usr/bin/ceph: stderr Error EINVAL: Traceback (most recent call last):
    /usr/bin/ceph: stderr   File "/usr/share/ceph/mgr/mgr_module.py", line 1756, in _handle_command
    /usr/bin/ceph: stderr     return self.handle_command(inbuf, cmd)
    /usr/bin/ceph: stderr   File "/usr/share/ceph/mgr/orchestrator/_interface.py", line 171, in handle_command
    /usr/bin/ceph: stderr     return dispatch[cmd['prefix']].call(self, cmd, inbuf)
    /usr/bin/ceph: stderr   File "/usr/share/ceph/mgr/mgr_module.py", line 462, in call
    /usr/bin/ceph: stderr     return self.func(mgr, **kwargs)
    /usr/bin/ceph: stderr   File "/usr/share/ceph/mgr/orchestrator/_interface.py", line 107, in <lambda>
    /usr/bin/ceph: stderr     wrapper_copy = lambda *l_args, **l_kwargs: wrapper(*l_args, **l_kwargs)  # noqa: E731
    /usr/bin/ceph: stderr   File "/usr/share/ceph/mgr/orchestrator/_interface.py", line 96, in wrapper
    /usr/bin/ceph: stderr     return func(*args, **kwargs)
    /usr/bin/ceph: stderr   File "/usr/share/ceph/mgr/orchestrator/module.py", line 356, in _add_host
    /usr/bin/ceph: stderr     return self._apply_misc([s], False, Format.plain)
    /usr/bin/ceph: stderr   File "/usr/share/ceph/mgr/orchestrator/module.py", line 1092, in _apply_misc
    /usr/bin/ceph: stderr     raise_if_exception(completion)
    /usr/bin/ceph: stderr   File "/usr/share/ceph/mgr/orchestrator/_interface.py", line 225, in raise_if_exception
    /usr/bin/ceph: stderr     e = pickle.loads(c.serialized_exception)
    /usr/bin/ceph: stderr TypeError: __init__() missing 2 required positional arguments: 'hostname' and 'addr'
    /usr/bin/ceph: stderr
    ERROR: Failed to add host <test>: Failed command: /usr/bin/docker run --rm --ipc=host --stop-signal=SIGTERM --net=host --entrypoint /usr/bin/ceph --init -e CONTAINER_IMAGE=quay.io/ceph/ceph:v17 -e NODE_NAME=test -e CEPH_USE_RANDOM_NONCE=1 -v /var/log/ceph/29dba80c-5146-11ee-a479-1be9fab3deba:/var/log/ceph:z -v /tmp/ceph-tmphvmle3fb:/etc/ceph/ceph.client.admin.keyring:z -v /tmp/ceph-tmph9if039g:/etc/ceph/ceph.conf:z quay.io/ceph/ceph:v17 orch host add test 192.168.1.20
    

    Change the SSH port back to 22 first; after Ceph is installed, switch sshd back to the original port (e.g. 2002) and run the following commands to add that port to cephadm's ssh_config:

    # ceph cephadm get-ssh-config > ssh_config
    # vi ssh_config
    Host *
      User root
      StrictHostKeyChecking no
      Port                  2002
      UserKnownHostsFile /dev/null
      ConnectTimeout=30
    
    # ceph cephadm set-ssh-config -i ssh_config
    # ceph health detail
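
    After sshd is switched back to the custom port, confirm that the stored config now contains Port 2002 and that the orchestrator can still reach the host; these are standard commands, not part of the original post:

    # ceph cephadm get-ssh-config
    # ceph orch host ls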
    

    The fix follows this upstream issue:
    cephadm bootstrap fails with custom ssh port
    From https://tracker.ceph.com/issues/48158

    2. Forgot the initial dashboard password

    #Create a password file
    cat >/opt/secretkey<<EOF 
    123123
    EOF
    
    ceph dashboard ac-user-set-password admin -i /opt/secretkey --force-password
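
    To confirm the reset worked and to find where the dashboard is listening, the standard mgr commands can be used (not part of the original post; cephadm's dashboard listens on port 8443 by default):

    ceph mgr services                  # prints the dashboard URL
    ceph dashboard ac-user-show admin  # confirms the admin account exists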
    

    Reference: https://i4t.com/6075.html

    3. Error after restarting with systemctl restart ceph.target

    monclient(hunting): authenticate timed out after 300

    systemctl --all | grep mon
    

    This shows:

    ceph-def7078a-56ce-11ee-a479-1be9fab3deba@mon.test.service   loaded    failed   failed    Ceph mon.test for def7078a-56ce-11ee-a479-1be9f
    

    Check the logs further:

    journalctl -xeu ceph-def7078a-56ce-11ee-a479-1be9fab3deba@mon.test.service
    

    which shows:

    error: monitor data filesystem reached concerning levels of available storage space (available: 3% 7.9 GiB
    
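    The config path shown earlier (/var/lib/ceph/<fsid>/mon.<host>/) tells us where the monitor store lives; to see how full that filesystem is, using the fsid and hostname from this post:

    df -h /var/lib/ceph
    du -sh /var/lib/ceph/def7078a-56ce-11ee-a479-1be9fab3deba/mon.test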

    After freeing up disk space:

    systemctl reset-failed  ceph-def7078a-56ce-11ee-a479-1be9fab3deba@mon.test.service
    systemctl status ceph-def7078a-56ce-11ee-a479-1be9fab3deba@mon.test.service
    journalctl -xeu ceph-def7078a-56ce-11ee-a479-1be9fab3deba@mon.test.service
    systemctl restart  ceph-def7078a-56ce-11ee-a479-1be9fab3deba@mon.test.service
    systemctl status ceph-def7078a-56ce-11ee-a479-1be9fab3deba@mon.test.service
    ceph -s
    
    

    Reference: https://lists.ceph.io/hyperkitty/list/ceph-users@ceph.io/thread/4EJN52JDTGI4D2PSQYHJEK5PZ5RQWM2H/
    Going a step further, the threshold can be lowered from the default 5% to 3%:

    ceph config set mon mon_data_avail_crit 3
    ceph config show mon.test mon_data_avail_crit
    

    Adjust mon.test in the commands above to match your actual monitor name.
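
    To check that the override is active and to revert to the default later, the usual ceph config commands apply (not part of the original post):

    ceph config get mon mon_data_avail_crit   # should now print 3
    ceph health detail                        # the low-space warning should clear
    ceph config rm mon mon_data_avail_crit    # drop the override, back to the default 5%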

  • Original article: https://blog.csdn.net/u010438035/article/details/133084403