Lesson 6: Ceph Basics - Ceph Management, Monitoring, and Troubleshooting

Section 1: Installing and Using the Dashboard

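yum install ceph-mgr-dashboard
# Enable the dashboard module; the second form forces it
ceph mgr module enable dashboard
ceph mgr module enable dashboard --force
# Check that the module is enabled
ceph mgr module ls | less
# Use a self-signed certificate
ceph dashboard create-self-signed-cert
# Configure the listen address and ports
ceph config set mgr mgr/dashboard/server_addr 192.168.44.139
ceph config set mgr mgr/dashboard/server_port 8080
ceph config set mgr mgr/dashboard/ssl_server_port 8443
# Check the configuration
ceph config ls
ceph config get mgr mgr/dashboard/server_addr
# List the service endpoints
ceph mgr services
# Create a user and assign it a role
# (newer releases may require the password in a file: ac-user-create cephadmin -i <password-file> administrator)
ceph dashboard ac-user-create cephadmin cephpassword administrator
# Log in from a browser
https://192.168.44.139:8443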
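
To confirm which URL the dashboard is actually served on, the output of `ceph mgr services` can be parsed. The following is a minimal sketch, assuming the command prints a flat JSON object of service names and URLs; `mgr_service_url` is a hypothetical helper name and the sample JSON is illustrative, not captured from a real cluster.

```shell
#!/bin/sh
# Extract the URL of one mgr service from `ceph mgr services` JSON read on stdin.
# Assumes the flat form: { "dashboard": "https://host:8443/", ... }
mgr_service_url() {
    # $1 = service name, e.g. dashboard or prometheus
    tr -d '\n' | sed -n "s/.*\"$1\"[[:space:]]*:[[:space:]]*\"\([^\"]*\)\".*/\1/p"
}

# Illustrative sample; on a live cluster pipe in the real output instead:
#   ceph mgr services | mgr_service_url dashboard
sample='{
    "dashboard": "https://192.168.44.139:8443/",
    "prometheus": "http://192.168.44.139:9283/"
}'
printf '%s\n' "$sample" | mgr_service_url dashboard
```

If `jq` is installed, `ceph mgr services | jq -r .dashboard` is a sturdier alternative to the `sed` expression.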

Section 2: Introduction to Manager Modules

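Ceph manager daemon modules:
- Dashboard module: web dashboard
- Alerts module: alerting
- Zabbix module: Zabbix integration
- Prometheus module: Prometheus integration
- Influx module: ships collected data to InfluxDB
- iostat module: iostat-like I/O monitoring
- Crash module: collects daemon crash dumps for analysis
- Insights module: health-check and crash reports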

# Enable the zabbix module
ceph mgr module enable zabbix
# Show the Zabbix collection settings; configure them as described in the official documentation
ceph zabbix config-show
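
As a rough sketch of what that configuration might look like, the script below prints the `ceph zabbix config-set` commands that point the module at a Zabbix server. The host name and identifier are placeholders, not values from this article, and `run` and `zabbix_setup` are hypothetical helper names. By default it only echoes the commands (dry run); the module also needs the `zabbix_sender` binary on the mgr host.

```shell
#!/bin/sh
# Print (default) or execute the commands that point the zabbix module at a server.
# Both values below are placeholders; replace them with your own.
ZABBIX_HOST="zabbix.example.com"
IDENTIFIER="ceph-cluster-1"

run() {
    # With DRY_RUN=1 (the default here) only echo the command instead of running it.
    if [ "${DRY_RUN:-1}" = "1" ]; then echo "$*"; else "$@"; fi
}

zabbix_setup() {
    run ceph zabbix config-set zabbix_host "$ZABBIX_HOST"
    run ceph zabbix config-set identifier "$IDENTIFIER"
    run ceph zabbix config-show
    run ceph zabbix send    # push one round of data immediately
}

zabbix_setup
```

Setting `DRY_RUN=0` before calling `zabbix_setup` executes the commands against the cluster instead of printing them.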

Section 3: Monitoring with the Prometheus Module

Enable Ceph's Prometheus module:

ceph mgr module enable prometheus
# Check that the module is enabled
ceph mgr module ls | less
# The module listens on port 9283 by default; opening that page shows the exported metrics
netstat -ntlp | grep 9283
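
A quick way to check that the exporter returns useful data is to read one metric from it. The sketch below assumes the exporter publishes `ceph_health_status` with the values 0, 1 and 2 for HEALTH_OK, HEALTH_WARN and HEALTH_ERR; `health_from_metrics` is a hypothetical helper name and the sample metrics text is illustrative.

```shell
#!/bin/sh
# Read Prometheus text-format metrics on stdin and report the cluster health.
health_from_metrics() {
    awk '$1 == "ceph_health_status" {
             v = $2 + 0
             if (v == 0) print "HEALTH_OK"
             else if (v == 1) print "HEALTH_WARN"
             else print "HEALTH_ERR"
             found = 1
             exit
         }
         END { if (!found) print "UNKNOWN" }'
}

# Illustrative input; on a live cluster use for example:
#   curl -s http://192.168.44.139:9283/metrics | health_from_metrics
printf '%s\n' \
    '# HELP ceph_health_status Cluster health status' \
    '# TYPE ceph_health_status untyped' \
    'ceph_health_status 1.0' | health_from_metrics
```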

Then install Prometheus and Grafana, and display the metrics with a Grafana dashboard template.

Section 4: SDK Development References

Ceph Storage Cluster (storage-level) APIs: https://docs.ceph.com/en/quincy/rados/api/
Ceph RBD API: https://docs.ceph.com/en/quincy/rbd/api/librbdpy/#module-rbd
S3 object storage API: https://docs.ceph.com/en/quincy/radosgw/s3/
Swift API: https://docs.ceph.com/en/quincy/radosgw/swift/

Section 5: Common Fault Analysis

Clock skew warning

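ceph -s
# clock skew detected on mon.node-2, mon.node-3
ceph health detail

# Show the configured threshold (in seconds) above which the warning is raised
ceph --admin-daemon /var/run/ceph/ceph-mon.node-1.asok config show | grep clock

# Correct the time
systemctl stop ntpd
ntpdate <your-NTP-server-IP>
hwclock -w # write to the hardware clock so the time survives a reboot
systemctl start ntpd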
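
To judge whether a measured offset would trigger the warning, it can be compared against the allowed drift. This is a minimal sketch, assuming the default `mon_clock_drift_allowed` of 0.05 seconds; `skew_exceeds` is a hypothetical helper name and the offsets are made-up inputs.

```shell
#!/bin/sh
# Report whether a measured clock offset (seconds) exceeds the allowed drift.
# 0.05 s is assumed as the default; confirm the real value with the
# `config show | grep clock` command.
skew_exceeds() {
    # $1 = offset in seconds (may be negative), $2 = allowed drift (default 0.05)
    awk -v off="$1" -v max="${2:-0.05}" 'BEGIN {
        off = off + 0; max = max + 0      # force numeric comparison
        if (off < 0) off = -off           # only the magnitude matters
        if (off > max) print "skew"; else print "ok"
    }'
}

skew_exceeds 0.012     # within the default tolerance
skew_exceeds -0.300    # well outside it
```

On a live cluster, `ceph time-sync-status` reports the skew each monitor currently sees, which can be fed into this check.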

Investigating and archiving service crashes

ceph -s
# Long heartbeat ping times on back interface seen, longest is 9463.023 msec
# Long heartbeat ping times on front interface seen, longest is 9449.712 msec
# 3 daemons have recently crashed
ceph health detail # detailed warnings
# Open the log and search (with / in vim) for the timestamps from the detailed warnings
vim /var/log/ceph/ceph-client.rgw.node-1.log
# The warning persists even though the services are healthy again
ceph -h | grep crash
ceph crash ls
ceph crash info <crash-id>
# Archive or remove the crash reports
ceph crash archive <crash-id>
ceph crash rm <crash-id>
# Archive all of them
ceph crash archive-all
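
When only the crashes of one daemon should be archived, the IDs can be filtered out of the `ceph crash ls` table first. The sketch below assumes the table has an ID column followed by an ENTITY column; `crash_ids_for` is a hypothetical helper name, and the sample rows and IDs are invented for illustration.

```shell
#!/bin/sh
# Print the crash IDs that belong to one entity, reading `ceph crash ls` on stdin.
crash_ids_for() {
    # $1 = entity name as shown in the ENTITY column
    awk -v ent="$1" '$1 != "ID" && $2 == ent { print $1 }'
}

# Invented sample; on a live cluster use for example:
#   ceph crash ls | crash_ids_for client.rgw.node-1 |
#       while read -r id; do ceph crash archive "$id"; done
printf '%s\n' \
    'ID                              ENTITY             NEW' \
    '2022-10-18T08:15:30.1Z_aaaa     client.rgw.node-1  *' \
    '2022-10-18T08:16:02.7Z_bbbb     osd.2              *' \
    '2022-10-18T08:20:11.4Z_cccc     client.rgw.node-1  *' | crash_ids_for client.rgw.node-1
```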

Troubleshooting low disk space on monitors

ceph -s
# mons node-1,node-2 are low on available space
ceph health detail
# Show the configured thresholds at which the warning is raised
ceph daemon mon.node-1 config show | grep mon | grep data
# Fix: extend the disk
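
The thresholds involved are percentages of space still available on the monitor's data filesystem. As a minimal sketch, assuming the defaults of 30 for `mon_data_avail_warn` and 5 for `mon_data_avail_crit`, the hypothetical helper `mon_space_level` classifies an available-space percentage; verify the real values with the `config show` command first.

```shell
#!/bin/sh
# Classify the available space (in percent) on the mon data filesystem.
mon_space_level() {
    # $1 = percent still available
    # $2 = warning threshold (assumed default 30), $3 = critical threshold (assumed default 5)
    avail=$1; warn=${2:-30}; crit=${3:-5}
    if [ "$avail" -le "$crit" ]; then echo CRIT
    elif [ "$avail" -le "$warn" ]; then echo WARN
    else echo OK
    fi
}

mon_space_level 50
mon_space_level 20

# On a monitor host (GNU df, and assuming the data lives under /var/lib/ceph/mon):
#   used=$(df --output=pcent /var/lib/ceph/mon | tail -n 1 | tr -dc '0-9')
#   mon_space_level $((100 - used))
```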

Troubleshooting a hung Ceph cluster

ceph -s # hangs indefinitely
# Follow the log and look for errors
tail -f /var/log/ceph/ceph-mon.node-1.log
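
Because a hung `ceph -s` blocks the terminal, it can help to run the status check under a time limit. The following is a small sketch using the coreutils `timeout` command, which exits with status 124 when the limit is reached; `check_responsive` is a hypothetical helper name.

```shell
#!/bin/sh
# Run a command under a time limit and report whether it finished, failed or hung.
check_responsive() {
    # $1 = time limit in seconds, remaining arguments = command to run
    limit=$1; shift
    rc=0
    timeout "$limit" "$@" >/dev/null 2>&1 || rc=$?
    if [ "$rc" -eq 124 ]; then echo "hung"        # timeout killed the command
    elif [ "$rc" -eq 0 ]; then echo "ok"
    else echo "failed($rc)"
    fi
}

# On a cluster node, for example:
#   check_responsive 10 ceph -s
```

A result of `hung` points at the monitors being unreachable or without quorum, at which point the monitor log above is the next place to look.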


Original article: https://blog.csdn.net/aa18855953229/article/details/127399801
