• Redis High Availability in Practice: Cluster


    Redis High Availability Series

    Concepts

    The previous article, Redis High Availability in Practice: Sentinel, showed how to achieve high availability for Redis with Sentinel. Building on that, Redis Cluster shards data across multiple nodes, further improving Redis's concurrency and performance, and the cluster can keep processing commands even when some of its nodes fail or become unreachable.

    How It Works

    Redis Cluster hashes each key with the CRC16 algorithm and takes the result modulo 16384 to determine the key's hash slot. The cluster has 16384 hash slots in total, and these slots are distributed across the nodes of the cluster.
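
    For example, the CLUSTER KEYSLOT command reports the slot a key maps to (run here against an instance on port 7000, the port used later in this guide):

    redis-cli -p 7000 CLUSTER KEYSLOT hello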

    When nodes are added to or removed from the cluster, a portion of the hash slots is moved onto the new node or off the node being removed, which makes the cluster straightforward to scale.

    In Redis Cluster every hash slot can have multiple replicas. When every master has at least one replica, the failure of a master causes one of its replicas to be promoted to the new master, and the cluster as a whole remains available.

    Environment

    Node        IP             Port
    Master A    192.168.1.201  7000
    Replica A   192.168.1.201  7001
    Master B    192.168.1.202  7002
    Replica B   192.168.1.202  7003
    Master C    192.168.1.203  7004
    Replica C   192.168.1.203  7005

    Modifying the Configuration

    Copy the configuration file and include the port in the file name to distinguish the instances.

    cp redis.conf redis-7000.conf
    

    Edit the configuration to enable cluster mode and set the cluster configuration file:

    port 7000
    # Run this instance in cluster mode
    cluster-enabled yes
    # Cluster state file maintained by Redis itself (do not edit by hand)
    cluster-config-file nodes-7000.conf
    # Milliseconds after which an unreachable node is considered failing
    cluster-node-timeout 5000
    

    The configuration files for the other ports follow the same pattern.
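
    One way to generate them is to use redis-7000.conf as a template (a convenience sketch; editing each copy by hand works just as well):

    # Generate the configuration files for the remaining ports from redis-7000.conf
    for port in 7001 7002 7003 7004 7005; do
        cp redis-7000.conf redis-$port.conf
        # Rewrites port, cluster-config-file and any other occurrence of 7000
        sed -i "s/7000/$port/g" redis-$port.conf
    done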

    Setting Up the Cluster

    Starting the Instances

    First, start the Redis instance on each node:

    redis-server redis-7000.conf 
    

    The instance starts in cluster mode, and its ID within the cluster appears in the log:

    20309:M 10 Aug 2022 18:45:13.340 * No cluster configuration found, I'm 6b6c5653321d7bfaad1266b395f7647eac6aa4e8
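
    The node ID is used later by commands such as del-node and reshard; it can also be read from a running instance at any time with CLUSTER MYID:

    redis-cli -p 7000 CLUSTER MYID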
    

    Creating the Cluster

    Create the cluster with the following command:

    redis-cli --cluster create \
            192.168.1.201:7000 192.168.1.201:7001 \
            192.168.1.202:7002 192.168.1.202:7003 \
            192.168.1.203:7004 192.168.1.203:7005 \
            --cluster-replicas 1
    

    Hash slots are assigned to the masters during creation. The following message indicates that the cluster was created successfully:

    [OK] All 16384 slots covered.
    
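    To see exactly which slot ranges ended up on which node, CLUSTER SLOTS can be queried on any instance (an optional cross-check):

    redis-cli -p 7000 CLUSTER SLOTS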

    After creation, check the cluster information:

    [root@node1 ~]# redis-cli --cluster info 192.168.1.201:7000
    192.168.1.203:7005 (b43dbc87...) -> 0 keys | 16384 slots | 5 slaves.
    [OK] 0 keys in 1 masters.
    0.00 keys per slot on average.
    

    If the following error appears, the node either already knows about other nodes or still holds data in database 0:

    [ERR] Node 192.168.1.201:7000 is not empty. Either the node already knows other nodes (check with CLUSTER NODES) or contains some key in database 0.
    

    In that case, flush the database and then create the cluster again:

    redis-cli -h 192.168.1.201 -p 7000 -n 0 flushdb
    
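    If the error was caused by the node already knowing other nodes rather than by leftover keys, its cluster state may also need to be reset; a possible command for that case:

    redis-cli -h 192.168.1.201 -p 7000 CLUSTER RESET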

    If the following error appears, not all slots are covered by the nodes:

    [ERR] Not all 16384 slots are covered by nodes.
    

    Fix the slot distribution with:

    redis-cli --cluster fix 192.168.1.201:7000
    

    Resharding the Cluster

    If the slot distribution in the cluster is not what we expect, for example:

    [root@node1 ~]# redis-cli --cluster check 192.168.1.201:7000
    192.168.1.203:7005 (b43dbc87...) -> 0 keys | 16384 slots | 5 slaves.
    [OK] 0 keys in 1 masters.
    0.00 keys per slot on average.
    >>> Performing Cluster Check (using node 192.168.1.201:7000)
    S: 6b6c5653321d7bfaad1266b395f7647eac6aa4e8 192.168.1.201:7000
       slots: (0 slots) slave
       replicates b43dbc87504861f09efcfbaa4df6394bd5eca1e5
    S: d8133074b6df13eb7dca0c9c56fef14f3f349564 192.168.1.202:7003
       slots: (0 slots) slave
       replicates b43dbc87504861f09efcfbaa4df6394bd5eca1e5
    S: 477703acb78e9fd5b2e71e39e9de96e39b67ced5 192.168.1.201:7001
       slots: (0 slots) slave
       replicates b43dbc87504861f09efcfbaa4df6394bd5eca1e5
    S: c4c1ce621d4739888080075348bc3ae1c5ef8841 192.168.1.202:7002
       slots: (0 slots) slave
       replicates b43dbc87504861f09efcfbaa4df6394bd5eca1e5
    M: b43dbc87504861f09efcfbaa4df6394bd5eca1e5 192.168.1.203:7005
       slots:[0-16383] (16384 slots) master
       5 additional replica(s)
    S: 5635d84f4ad0ca122d8ef657b6716258b11977bb 192.168.1.203:7004
       slots: (0 slots) slave
       replicates b43dbc87504861f09efcfbaa4df6394bd5eca1e5
    [OK] All nodes agree about slots configuration.
    >>> Check for open slots...
    >>> Check slots coverage...
    [OK] All 16384 slots covered.
    

    The cluster has six nodes but only one master, while we want three masters, each with one replica, so the slot distribution needs to be adjusted.

    To promote 192.168.1.201:7000 to a master, first remove it from the cluster:

    [root@node1 ~]# redis-cli --cluster del-node 192.168.1.201:7000 6b6c5653321d7bfaad1266b395f7647eac6aa4e8           
    >>> Removing node 6b6c5653321d7bfaad1266b395f7647eac6aa4e8 from cluster 192.168.1.201:7000
    >>> Sending CLUSTER FORGET messages to the cluster...
    >>> Sending CLUSTER RESET SOFT to the deleted node.
    

    Then add it back to the cluster:

    [root@node1 ~]# redis-cli --cluster add-node 192.168.1.201:7000 192.168.1.202:7002
    
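    At this point the node rejoins as a master that owns no slots, which can be confirmed before resharding (an optional check):

    redis-cli --cluster check 192.168.1.201:7000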

    Move 5000 slots onto the 192.168.1.201:7000 node:

    redis-cli --cluster reshard 192.168.1.201:7000 \
              --cluster-from b43dbc87504861f09efcfbaa4df6394bd5eca1e5 \
              --cluster-to 6b6c5653321d7bfaad1266b395f7647eac6aa4e8 \
              --cluster-slots 5000 \
              --cluster-yes
    

    After resharding, confirm that every master owns slots:

    [root@node1 ~]# redis-cli --cluster check 192.168.1.201:7000
    192.168.1.201:7000 (6b6c5653...) -> 0 keys | 5000 slots | 1 slaves.
    192.168.1.202:7002 (c4c1ce62...) -> 0 keys | 5000 slots | 1 slaves.
    192.168.1.203:7005 (b43dbc87...) -> 0 keys | 6384 slots | 1 slaves.
    [OK] 0 keys in 3 masters.
    0.00 keys per slot on average.
    >>> Performing Cluster Check (using node 192.168.1.201:7000)
    M: 6b6c5653321d7bfaad1266b395f7647eac6aa4e8 192.168.1.201:7000
       slots:[0-4999] (5000 slots) master
       1 additional replica(s)
    S: d8133074b6df13eb7dca0c9c56fef14f3f349564 192.168.1.202:7003
       slots: (0 slots) slave
       replicates b43dbc87504861f09efcfbaa4df6394bd5eca1e5
    S: 477703acb78e9fd5b2e71e39e9de96e39b67ced5 192.168.1.201:7001
       slots: (0 slots) slave
       replicates 6b6c5653321d7bfaad1266b395f7647eac6aa4e8
    M: c4c1ce621d4739888080075348bc3ae1c5ef8841 192.168.1.202:7002
       slots:[5000-9999] (5000 slots) master
       1 additional replica(s)
    M: b43dbc87504861f09efcfbaa4df6394bd5eca1e5 192.168.1.203:7005
       slots:[10000-16383] (6384 slots) master
       1 additional replica(s)
    S: 5635d84f4ad0ca122d8ef657b6716258b11977bb 192.168.1.203:7004
       slots: (0 slots) slave
       replicates c4c1ce621d4739888080075348bc3ae1c5ef8841
    [OK] All nodes agree about slots configuration.
    >>> Check for open slots...
    >>> Check slots coverage...
    [OK] All 16384 slots covered.
    

    Testing and Verification

    Writing Data

    [root@node1 ~]# redis-cli -c -h 192.168.1.201 -p 7000
    192.168.1.201:7000> set hello world
    OK
    

    The key may be stored in a slot that lives on another node, for example:

    192.168.1.201:7000> set test redis
    -> Redirected to slot [6918] located at 192.168.1.202:7002
    OK
    
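    Without the -c option, redis-cli does not follow redirections and the same command returns a MOVED error instead (illustrative output based on the redirection above):

    192.168.1.201:7000> set test redis
    (error) MOVED 6918 192.168.1.202:7002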

    Reading Data

    192.168.1.202:7002> get hello
    -> Redirected to slot [866] located at 192.168.1.201:7000
    "world"
    

    Verifying Availability

    Shut down one of the master instances in the cluster. The cluster starts a failover, the failed master's replica is automatically promoted to the new master, and the cluster as a whole remains available.
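
    One way to simulate the failure is to shut down the master on port 7000 (the master that fails in the log below):

    redis-cli -h 192.168.1.201 -p 7000 SHUTDOWN NOSAVE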

    The process can be seen in the replica's log:

    20493:S 10 Aug 2022 19:24:29.390 * MASTER <-> REPLICA sync started
    20493:S 10 Aug 2022 19:24:29.390 # Error condition on socket for SYNC: Connection refused
    20493:S 10 Aug 2022 19:24:29.618 * FAIL message received from 5635d84f4ad0ca122d8ef657b6716258b11977bb about 6b6c5653321d7bfaad1266b395f7647eac6aa4e8
    20493:S 10 Aug 2022 19:24:29.618 # Cluster state changed: fail
    20493:S 10 Aug 2022 19:24:29.696 # Start of election delayed for 981 milliseconds (rank #0, offset 754967).
    20493:S 10 Aug 2022 19:24:30.414 * Connecting to MASTER 192.168.1.201:7000
    20493:S 10 Aug 2022 19:24:30.414 * MASTER <-> REPLICA sync started
    20493:S 10 Aug 2022 19:24:30.414 # Error condition on socket for SYNC: Connection refused
    20493:S 10 Aug 2022 19:24:30.720 # Starting a failover election for epoch 9.
    20493:S 10 Aug 2022 19:24:30.730 # Failover election won: I'm the new master.
    20493:S 10 Aug 2022 19:24:30.730 # configEpoch set to 9 after successful failover
    20493:M 10 Aug 2022 19:24:30.730 * Discarding previously cached master state.
    20493:M 10 Aug 2022 19:24:30.730 # Setting secondary replication ID to b3c89f90f221747b4fe4e60073bc4c7b432fb5dd, valid up to offset: 754968. New replication ID is 58143b1e717a087933879fcd0c0d47d76903a761
    20493:M 10 Aug 2022 19:24:30.733 # Cluster state changed: ok
    

    After the failed node recovers, it rejoins the cluster as a replica of the new master:

    192.168.1.201:7001> CLUSTER nodes
    c4c1ce621d4739888080075348bc3ae1c5ef8841 192.168.1.202:7002@17002 master - 0 1660130971000 8 connected 5000-9999
    6b6c5653321d7bfaad1266b395f7647eac6aa4e8 192.168.1.201:7000@17000 slave 477703acb78e9fd5b2e71e39e9de96e39b67ced5 0 1660130970585 9 connected
    d8133074b6df13eb7dca0c9c56fef14f3f349564 192.168.1.202:7003@17003 slave b43dbc87504861f09efcfbaa4df6394bd5eca1e5 0 1660130971506 6 connected
    477703acb78e9fd5b2e71e39e9de96e39b67ced5 192.168.1.201:7001@17001 myself,master - 0 1660130970000 9 connected 0-4999
    b43dbc87504861f09efcfbaa4df6394bd5eca1e5 192.168.1.203:7005@17005 master - 0 1660130971000 6 connected 10000-16383
    5635d84f4ad0ca122d8ef657b6716258b11977bb 192.168.1.203:7004@17004 slave c4c1ce621d4739888080075348bc3ae1c5ef8841 0 1660130971711 8 connected
    
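    Optionally, to restore the original roles, a manual failover can be triggered on the recovered node (now a replica) to promote it back to master:

    redis-cli -h 192.168.1.201 -p 7000 CLUSTER FAILOVER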

    References

    Official Redis Cluster documentation

  • Original article: https://blog.csdn.net/ldjjbzh626/article/details/126271857