What is Sentinel?
Sentinel is a tool that monitors the state of the master in a Redis master/slave deployment.
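Why do we need Sentinel?
Picture this: three machines run Redis, two as slave nodes and one as the master. If the master one day decides it has had enough and takes itself down, the two slaves are left dumbfounded. Slave: "Whoa, you're gone? Who am I supposed to get my data from now?"
Put more formally, the Sentinel mechanism is one way of achieving high availability.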
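How does it work?
So how do we deal with the scenario above? The two slaves spend a little CPU and a port to bring in a third-party inspector, Sentinel. Its main job is to keep an eye on the master's state and to run an election once the master goes down. Does that mean the master gets replaced as soon as a single Sentinel thinks it is down (subjectively down)? Of course not. To prevent that, there is a rule: only when a configured number of Sentinels agree that you are down are you actually treated as down, whether you are really down or not. That configured number is called the quorum, and the state in which enough Sentinels agree is called objectively down.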
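1): Once per second, each Sentinel sends a PING command to every Master, Slave, and other Sentinel instance it knows about.
2): If the time since an instance's last valid reply to PING exceeds the value of the down-after-milliseconds option, that Sentinel marks the instance as subjectively down.
3): If a Master is marked as subjectively down, all Sentinels monitoring that Master check once per second whether the Master really has entered the subjectively down state.
4): When enough Sentinels (greater than or equal to the number set in the configuration file) confirm within the specified time window that the Master is subjectively down, the Master is marked as objectively down.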
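Subjectively Down (SDOWN): the down judgment that a single Sentinel instance makes about a Redis server on its own.
Objectively Down (ODOWN): the down judgment reached when multiple Sentinel instances have each marked the Master Server as SDOWN and have exchanged that information through the SENTINEL is-master-down-by-addr command. Once it is reached, failover starts.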
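The SDOWN/ODOWN rule above can be sketched in a few lines of Python. This is only a toy model for intuition, not how Sentinel is implemented: the `SentinelView` class and both function names are made up for illustration, and the real process also involves the message exchange between Sentinels and a leader election.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class SentinelView:
    """One Sentinel's local view of the master (hypothetical model)."""
    name: str
    ms_since_last_valid_reply: int  # time since the last valid PING reply


def is_sdown(view: SentinelView, down_after_ms: int) -> bool:
    # Subjectively down: this Sentinel alone has waited longer than
    # down-after-milliseconds for a valid PING reply.
    return view.ms_since_last_valid_reply > down_after_ms


def is_odown(views: List[SentinelView], down_after_ms: int, quorum: int) -> bool:
    # Objectively down: at least `quorum` Sentinels report SDOWN.
    votes = sum(1 for v in views if is_sdown(v, down_after_ms))
    return votes >= quorum


# Three Sentinels, two of which have not heard from the master for > 3000 ms.
views = [
    SentinelView("s1", 3500),
    SentinelView("s2", 3200),
    SentinelView("s3", 400),
]
print(is_odown(views, down_after_ms=3000, quorum=2))  # prints True
```

With a quorum of 3 the same three views would not be enough, because only two of them have crossed the threshold.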
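Prepare three virtual machines:
Redis is already installed and running on each of them, and the steps from the previous article have been carried out.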
server name | ip |
---|---|
redis-master | 10.8.161.200 |
redis-slave | 10.8.161.203 |
redis-slave | 10.8.161.204 |
Edit the configuration file sentinel.conf (on all three machines)
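# Quorum of 2: the master is only treated as truly unavailable once 2 Sentinels agree that it is down.
# Use the master's IP on every machine (the slaves write the master's IP, the master writes its own IP).
sentinel monitor mymaster 10.8.161.200 6379 2
# Unit: milliseconds.
sentinel down-after-milliseconds mymaster 3000
# If Sentinel cannot complete the failover (the automatic master/slave switch after a failure) within this time, this failover attempt is considered failed.
sentinel failover-timeout mymaster 10000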
protected-mode no
Start Sentinel
./src/redis-sentinel sentinel.conf
Startup output
[root@master redis]# ./src/redis-sentinel sentinel.conf
65724:X 29 Aug 13:58:04.148 # oO0OoO0OoO0Oo Redis is starting oO0OoO0OoO0Oo
65724:X 29 Aug 13:58:04.148 # Redis version=4.0.9, bits=64, commit=00000000, modified=0, pid=65724, just started
65724:X 29 Aug 13:58:04.148 # Configuration loaded
65724:X 29 Aug 13:58:04.148 * Increased maximum number of open files to 10032 (it was originally set to 1024).
                _._
           _.-``__ ''-._
      _.-``    `.  `_.  ''-._           Redis 4.0.9 (00000000/0) 64 bit
  .-`` .-```.  ```\/    _.,_ ''-._
 (    '      ,       .-`  | `,    )     Running in sentinel mode
 |`-._`-...-` __...-.``-._|'` _.-'|     Port: 26379
 |    `-._   `._    /     _.-'    |     PID: 65724
  `-._    `-._  `-./  _.-'    _.-'
 |`-._`-._    `-.__.-'    _.-'_.-'|
 |    `-._`-._        _.-'_.-'    |           http://redis.io
  `-._    `-._`-.__.-'_.-'    _.-'
 |`-._`-._    `-.__.-'    _.-'_.-'|
 |    `-._`-._        _.-'_.-'    |
  `-._    `-._`-.__.-'_.-'    _.-'
      `-._    `-.__.-'    _.-'
          `-._        _.-'
              `-.__.-'
65724:X 29 Aug 13:58:04.152 # WARNING: The TCP backlog setting of 511 cannot be enforced because /proc/sys/net/core/somaxconn is set to the lower value of 128.
65724:X 29 Aug 13:58:04.154 # Sentinel ID is 71fe0efc02f89fed719d5e36b780df96a7b5df97
65724:X 29 Aug 13:58:04.154 # +monitor master mymaster 10.8.161.200 6379 quorum 2
65724:X 29 Aug 13:58:04.156 * +slave slave 10.8.161.203:6379 10.8.161.203 6379 @ mymaster 10.8.161.200 6379
65724:X 29 Aug 13:58:04.157 * +slave slave 10.8.161.204:6379 10.8.161.204 6379 @ mymaster 10.8.161.200 6379
65724:X 29 Aug 13:58:06.179 * +sentinel sentinel cced183c335c4f9fcad9ed25cd2abab40e9d4311 10.8.161.203 26379 @ mymaster 10.8.161.200 6379
65724:X 29 Aug 13:58:06.214 * +sentinel sentinel 3918f599d0e50472d32ff227807e762e0650eac4 10.8.161.204 26379 @ mymaster 10.8.161.200 6379
Stop the master on the master host
[root@master redis]# ps -ef | grep redis
root 11209 1 0 8月27 ? 00:03:46 ./src/redis-server 0.0.0.0:6379
root 65775 54000 0 13:59 pts/1 00:00:00 ./src/redis-sentinel *:26379 [sentinel]
root 65844 54000 0 14:00 pts/1 00:00:00 grep --color=auto redis
[root@master redis]# kill -9 11209
[root@master redis]# 65775:X 29 Aug 14:07:32.115 # +sdown master mymaster 10.8.161.200 6379
65775:X 29 Aug 14:07:32.182 # +new-epoch 1
65775:X 29 Aug 14:07:32.183 # +vote-for-leader cced183c335c4f9fcad9ed25cd2abab40e9d4311 1
65775:X 29 Aug 14:07:32.218 # +odown master mymaster 10.8.161.200 6379 #quorum 3/2
65775:X 29 Aug 14:07:32.219 # Next failover delay: I will not start a failover before Mon Aug 29 14:07:52 2022
65775:X 29 Aug 14:07:33.257 # +config-update-from sentinel cced183c335c4f9fcad9ed25cd2abab40e9d4311 10.8.161.203 26379 @ mymaster 10.8.161.200 6379
65775:X 29 Aug 14:07:33.258 # +switch-master mymaster 10.8.161.200 6379 10.8.161.203 6379
65775:X 29 Aug 14:07:33.259 * +slave slave 10.8.161.204:6379 10.8.161.204 6379 @ mymaster 10.8.161.203 6379
65775:X 29 Aug 14:07:33.259 * +slave slave 10.8.161.200:6379 10.8.161.200 6379 @ mymaster 10.8.161.203 6379
65775:X 29 Aug 14:07:36.269 # +sdown slave 10.8.161.200:6379 10.8.161.200 6379 @ mymaster 10.8.161.203 6379
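The line to look for in the log above is `+switch-master`, which carries the master name, the old address, and the new address. If you would rather not read it by eye, here is a minimal Python sketch that pulls it out of the log text. The function name `find_switch` is my own, and the regular expression assumes the event layout seen in this particular Redis 4.0.9 log.

```python
import re
from typing import Optional, Tuple

# Matches: +switch-master <name> <old-ip> <old-port> <new-ip> <new-port>
SWITCH_RE = re.compile(r"\+switch-master (\S+) (\S+) (\d+) (\S+) (\d+)")


def find_switch(log_text: str) -> Optional[Tuple[str, str, str]]:
    """Return (master_name, old_addr, new_addr) of the last switch, or None."""
    result = None
    for match in SWITCH_RE.finditer(log_text):
        name, old_ip, old_port, new_ip, new_port = match.groups()
        result = (name, f"{old_ip}:{old_port}", f"{new_ip}:{new_port}")
    return result


# Two lines copied from the Sentinel log above.
log = """\
65775:X 29 Aug 14:07:32.218 # +odown master mymaster 10.8.161.200 6379 #quorum 3/2
65775:X 29 Aug 14:07:33.258 # +switch-master mymaster 10.8.161.200 6379 10.8.161.203 6379
"""
print(find_switch(log))
# prints ('mymaster', '10.8.161.200:6379', '10.8.161.203:6379')
```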
Check on the slaves whether the roles have switched
[root@slave1 redis]# ./src/redis-cli
127.0.0.1:6379> info replication
# Replication
role:master
connected_slaves:1
slave0:ip=10.8.161.204,port=6379,state=online,offset=688081,lag=1
master_replid:1d2d49c518d5efdd2fa63aee80fe9e6cabb48aa6
master_replid2:8b96d56ada4bec8d3bea72af7cc1cbb0f977ecf9
master_repl_offset:688081
second_repl_offset:318496
repl_backlog_active:1
repl_backlog_size:1048576
repl_backlog_first_byte_offset:1
repl_backlog_histlen:688081
127.0.0.1:6379>
[root@slave2 redis]# ./src/redis-cli
127.0.0.1:6379> info replication
# Replication
role:slave
master_host:10.8.161.203
master_port:6379
master_link_status:up
master_last_io_seconds_ago:2
master_sync_in_progress:0
slave_repl_offset:695226
slave_priority:100
slave_read_only:1
connected_slaves:0
master_replid:1d2d49c518d5efdd2fa63aee80fe9e6cabb48aa6
master_replid2:8b96d56ada4bec8d3bea72af7cc1cbb0f977ecf9
master_repl_offset:695226
second_repl_offset:318496
repl_backlog_active:1
repl_backlog_size:1048576
repl_backlog_first_byte_offset:29
repl_backlog_histlen:695198
127.0.0.1:6379>
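The same check can be scripted. Below is a small Python sketch that parses the text of `info replication` and tells you whether a node is a slave of the expected master. It only works on text you paste or capture yourself. The helper names `parse_info` and `follows` are made up for this example, and the sample is a shortened copy of the slave2 output above.

```python
from typing import Dict


def parse_info(text: str) -> Dict[str, str]:
    """Turn `key:value` lines of INFO output into a dict, skipping section headers."""
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or ":" not in line:
            continue
        key, _, value = line.partition(":")
        fields[key] = value
    return fields


def follows(info: Dict[str, str], host: str, port: int) -> bool:
    """True if this node is a slave replicating from host:port with the link up."""
    return (
        info.get("role") == "slave"
        and info.get("master_host") == host
        and info.get("master_port") == str(port)
        and info.get("master_link_status") == "up"
    )


sample = """\
# Replication
role:slave
master_host:10.8.161.203
master_port:6379
master_link_status:up
"""
info = parse_info(sample)
print(follows(info, "10.8.161.203", 6379))  # prints True
```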
And that completes a simple Sentinel setup.