• KingbaseES V8R3集群运维案例之---主库系统down failover切换过程分析


    案例说明:
    KingbaseES V8R3集群failover时两个cluster都会触发,但只有一个cluster会调用脚本去执行真正的切换流程,另一个有对应的打印,但不会调用脚本,只是走相关的流程去修改状态。 那么你需要确认一点,到底是主库还是备库的cluster在执行脚本,关键字:failover_stream.sh ,看cluster.log中有没有这个的名称。本案例通过主库系统宕机,分析了failover 切换过程。

    适用版本: KingbaseES V8R3

    集群节点状态信息:

    1. # 集群节点状态信息
    2. TEST=# show pool_nodes;
    3. node_id | hostname | port | status | lb_weight | role | select_cnt | load_balance_node | replication_delay
    4. ---------+---------------+-------+--------+-----------+---------+------------+-------------------+-------------------
    5. 0 | 192.168.1.101 | 54321 | up | 0.500000 | primary | 1 | true | 0
    6. 1 | 192.168.1.102 | 54321 | up | 0.500000 | standby | 0 | false | 0
    7. (2 rows)

    对于KingbaseES V8R3集群failover切换的分析:

    1. 1. 可以通过主备库的cluster.log,获取整个主备切换的过程。
    2. 2. 从failover.log日志中获取执行failover过程,注意只有执行failover_stream.sh的节点才会产生failover.log
    3. 3. recovery.log日志记录了原主库failover完成后,从主库recovery为新备库的过程。

    一、Failover切换过程分析

    关闭主库系统:[root@node101 ~]# shutdown -h now

    备库cluster.log日志分析:(主库系统关机,备库调用failover_stream.sh执行failover切换)

    1、主备库处于正常状态时,备库收到主库的heartbeat。

    1. 2022-08-30 15:38:03: pid 17612: DETAIL: the last heartbeat from "192.168.1.101:9999" received 0 seconds ago
    2. 2022-08-30 15:38:06: pid 17612: LOG: watchdog checking if kingbasecluster is alive using heartbeat
    3. 2022-08-30 15:38:06: pid 17612: DETAIL: the last heartbeat from "192.168.1.101:9999" received 0 seconds ago

    2、备库kingbasecluster检测到主库集群服务和数据库服务shutdown,kingbasecluster发生主备切换,当前的kingbasecluster由standby切换为master。

    1. 2022-08-30 15:38:09: pid 17585: LOG: remote node "192.168.1.101:9999 Linux node101" is shutting down
    2. 2022-08-30 15:38:09: pid 17585: LOG: watchdog cluster has lost the coordinator node
    3. 2022-08-30 15:38:09: pid 17585: LOG: unassigning the remote node "192.168.1.101:9999 Linux node101" from watchdog cluster master
    4. 2022-08-30 15:38:09: pid 17585: LOG: We have lost the cluster master node "192.168.1.101:9999 Linux node101"
    5. 2022-08-30 15:38:09: pid 17585: LOG: watchdog node state changed from [STANDBY] to [JOINING]
    6. 2022-08-30 15:38:09: pid 17612: LOG: watchdog checking if kingbasecluster is alive using heartbeat
    7. 2022-08-30 15:38:09: pid 17612: DETAIL: the last heartbeat from "192.168.1.101:9999" received 1 seconds ago
    8. 2022-08-30 15:38:10: pid 17547: ERROR: failed to authenticate
    9. 2022-08-30 15:38:10: pid 17547: DETAIL: the database system is shutting down
    10. 2022-08-30 15:38:12: pid 17612: LOG: watchdog checking if kingbasecluster is alive using heartbeat
    11. 2022-08-30 15:38:12: pid 17612: DETAIL: the last heartbeat from "192.168.1.101:9999" received 4 seconds ago

    3、备库kingbasecluster对主库的数据库服务根据:health_check_max_retries * health_check_retry_delay配置尝试连接。

    1. 2022-08-30 15:38:13: pid 17547: LOG: health checking retry count 1
    2. 2022-08-30 15:38:13: pid 17585: LOG: watchdog node state changed from [JOINING] to [INITIALIZING]
    3. 2022-08-30 15:38:14: pid 17585: LOG: I am the only alive node in the watchdog cluster
    4. 2022-08-30 15:38:14: pid 17585: HINT: skipping stand for coordinator state
    5. 2022-08-30 15:38:14: pid 17585: LOG: watchdog node state changed from [INITIALIZING] to [MASTER]
    6. 2022-08-30 15:38:14: pid 17585: LOG: I am announcing my self as master/coordinator watchdog node
    7. 2022-08-30 15:38:15: pid 17612: LOG: watchdog checking if kingbasecluster is alive using heartbeat
    8. 2022-08-30 15:38:15: pid 17612: DETAIL: the last heartbeat from "192.168.1.101:9999" received 7 seconds ago
    9. 2022-08-30 15:38:18: pid 17585: LOG: I am the cluster leader node
    10. 2022-08-30 15:38:18: pid 17585: DETAIL: our declare coordinator message is accepted by all nodes
    11. 2022-08-30 15:38:18: pid 17585: LOG: setting the local node "192.168.1.102:9999 Linux node102" as watchdog cluster master

    4、备库开启升级进程(escalation process),cluster vip漂移到新的master节点。

    1. 2022-08-30 15:38:18: pid 17585: LOG: I am the cluster leader node. Starting escalation process
    2. 2022-08-30 15:38:18: pid 17585: LOG: escalation process started with PID:22088
    3. 2022-08-30 15:38:18: pid 17547: LOG: connect_inet_domain_socket: select() interrupted by certain signal. retrying...
    4. 2022-08-30 15:38:18: pid 22088: LOG: watchdog: escalation started
    5. 2022-08-30 15:38:18: pid 17612: LOG: watchdog checking if kingbasecluster is alive using heartbeat
    6. 2022-08-30 15:38:18: pid 17612: DETAIL: the last heartbeat from "192.168.1.101:9999" received 10 seconds ago
    7. 2022-08-30 15:38:21: pid 22088: LOG: successfully acquired the delegate IP:"192.168.1.201"
    8. 2022-08-30 15:38:21: pid 22088: DETAIL: 'if_up_cmd' returned with success
    9. .......

    5、 备库对原主库数据库服务的连接次数超过阈值后,将原主库的数据库服务设置为down状态。

    1. 2022-08-30 15:39:20: pid 17547: LOG: health checking retry count 10
    2. 2022-08-30 15:39:22: pid 17612: LOG: watchdog checking if kingbasecluster is alive using heartbeat
    3. 2022-08-30 15:39:22: pid 17612: DETAIL: the last heartbeat from "192.168.1.101:9999" received 73 seconds ago
    4. 2022-08-30 15:39:23: pid 17547: LOG: failed to connect to kingbase server on "192.168.1.101:54321", getsockopt() detected error "
    5. No route to host"
    6. 2022-08-30 15:39:23: pid 17547: ERROR: failed to make persistent db connection
    7. 2022-08-30 15:39:23: pid 17547: DETAIL: connection to host:"192.168.1.101:54321" failed
    8. 2022-08-30 15:39:23: pid 17547: LOG: setting backend node 0 status to NODE DOWN

    6、 收到本节点kingbasecluster发出的failover request,并获取到failover lock。

    1. 2022-08-30 15:39:23: pid 17547: LOG: received degenerate backend request for node_id: 0 from pid [17547]
    2. 2022-08-30 15:39:23: pid 17585: LOG: new IPC connection received
    3. 2022-08-30 15:39:23: pid 17585: LOG: watchdog received the failover command from local kingbasecluster on IPC interface
    4. 2022-08-30 15:39:23: pid 17585: LOG: watchdog is processing the failover command [DEGENERATE_BACKEND_REQUEST] received from local
    5. kingbasecluster on IPC interface
    6. 2022-08-30 15:39:23: pid 17585: LOG: I am the only kingbasecluster node in the watchdog cluster
    7. 2022-08-30 15:39:23: pid 17585: DETAIL: no need to propagate the failover command [DEGENERATE_BACKEND_REQUEST]
    8. 2022-08-30 15:39:23: pid 17585: LOG: new IPC connection received
    9. 2022-08-30 15:39:23: pid 17585: LOG: received the failover command lock request from local kingbasecluster on IPC interface
    10. 2022-08-30 15:39:23: pid 17585: LOG: local kingbasecluster node "192.168.1.102:9999 Linux node102" is requesting to become a lock
    11. holder for failover ID: 0

    7、 获取到failover lock后,开始执行failover_stream.sh进行failover的切换。

    1. 2022-08-30 15:39:23: pid 17585: LOG: local kingbasecluster node "192.168.1.102:9999 Linux node102" is the lock holder
    2. 2022-08-30 15:39:23: pid 17547: LOG: starting degeneration. shutdown host 192.168.1.101(54321)
    3. 2022-08-30 15:39:23: pid 17547: LOG: Restart all children
    4. 2022-08-30 15:39:23: pid 17547: LOG: execute command: /home/kingbase/cluster/R3HA/kingbasecluster/bin/failover_stream.sh 192.168.
    5. 1.102 0 0 192.168.1.101 192.168.1.101 1 0 /home/kingbase/cluster/R3HA/db/data
    6. 2022-08-30 15:39:25: pid 17612: LOG: watchdog checking if kingbasecluster is alive using heartbeat
    7. 2022-08-30 15:39:25: pid 17612: DETAIL: the last heartbeat from "192.168.1.101:9999" received 76 seconds ago
    8. ssh: connect to host 192.168.1.101 port 22: No route to host
    9. 2022-08-30 15:39:31: pid 17612: LOG: watchdog checking if kingbasecluster is alive using heartbeat
    10. 2022-08-30 15:39:31: pid 17612: DETAIL: the last heartbeat from "192.168.1.101:9999" received 82 seconds ago
    11. 2022-08-30 15:39:33: pid 17585: LOG: new IPC connection received
    12. 2022-08-30 15:39:33: pid 17585: LOG: received the failover command lock request from local kingbasecluster on IPC interface
    13. 2022-08-30 15:39:33: pid 17585: LOG: local kingbasecluster node "192.168.1.102:9999 Linux node102" is requesting to release [FAIL
    14. OVER] lock for failover ID 0

    8、执行failover切换,并查找新的primary。(当前备库切换为primary)

    1. 2022-08-30 15:39:33: pid 17547: LOG: find_primary_node_repeatedly: waiting for finding a primary node
    2. 2022-08-30 15:39:33: pid 17547: LOG: find_primary_node: checking backend no 0
    3. 2022-08-30 15:39:33: pid 17547: LOG: find_primary_node: checking backend no 1
    4. 2022-08-30 15:39:33: pid 17547: LOG: find_primary_node: primary node id is 1
    5. 2022-08-30 15:39:33: pid 17585: LOG: new IPC connection received
    6. 2022-08-30 15:39:33: pid 17585: LOG: received the failover command lock request from local kingbasecluster on IPC interface
    7. 2022-08-30 15:39:33: pid 17585: LOG: local kingbasecluster node "192.168.1.102:9999 Linux node102" is requesting to release [FOLL
    8. OW MASTER] lock for failover ID 0
    9. 2022-08-30 15:39:33: pid 17585: LOG: local kingbasecluster node "192.168.1.102:9999 Linux node102" has released the [FOLLOW MASTE
    10. R] lock for failover ID 0
    11. 2022-08-30 15:39:33: pid 17547: LOG: failover: set new primary node: 1
    12. 2022-08-30 15:39:33: pid 17547: LOG: failover: set new master node: 1
    13. 2022-08-30 15:39:33: pid 21501: LOG: worker process received restart request
    14. 2022-08-30 15:39:33: pid 17585: LOG: new IPC connection received
    15. 2022-08-30 15:39:33: pid 17585: LOG: received the failover command lock request from local kingbasecluster on IPC interface
    16. 2022-08-30 15:39:33: pid 17585: LOG: local kingbasecluster node "192.168.1.102:9999 Linux node102" is requesting to resign from a
    17. lock holder for failover ID 0
    18. 2022-08-30 15:39:33: pid 17585: LOG: local kingbasecluster node "192.168.1.102:9999 Linux node102" has resigned from the lock hol
    19. der
    20. failover done. shutdown host 192.168.1.101(54321)2022-08-30 15:39:33: pid 17547: LOG: failover done. shutdown host 192.168.1.101(
    21. 54321)

    备库Failover.log分析:

    1、执行failover_stream.sh进行failover切换

    1. -----------------2022-08-30 15:39:23 failover beging---------------------------------------
    2. ----failover-stats is %H = hostname of the new master node [192.168.1.102], %P = old primary node id [0], %d = node id[0], %h = host name [192.168.1.101], %O = old primary host[192.168.1.101] %m = new master node id [1], %M = old master node id [0], %D = database cluster path [/home/kingbase/cluster/R3HA/db/data].
    3. Tips:
    4. 1)new_master : 192.168.1.102
    5. 2) new_master_id: 1
    6. 3) old_primary: 192.168.1.101
    7. 4) old_primary_id : 0

    2、ping 网关测试网络连通性

    1. ----ping trust ip
    2. ping trust ip 192.168.1.1 success

    3、ssh连接远程原主库,执行关闭数据库服务和释放db vip;并将db vip加载到当前主库节点。

    1. ----determine whether the faulty db is master or standby
    2. master down, let 192.168.1.102 become new primary.....
    3. 2022-08-30 15:39:25 del old primary VIP on 192.168.1.101
    4. ssh connect host:192.168.1.101 failed, can not stop old primary db and del old primary db vip
    5. 2022-08-30 15:39:29 add VIP on 192.168.1.102
    6. ADD VIP NOW AT 2022-08-30 15:39:30 ON enp0s3
    7. execute: [/sbin/ip addr add 192.168.1.200/24 dev enp0s3 label enp0s3:2]
    8. execute: /home/kingbase/cluster/R3HA/db/bin/arping -U 192.168.1.200 -I enp0s3 -w 1
    9. ARPING 192.168.1.200 from 192.168.1.200 enp0s3
    10. Sent 1 probes (1 broadcast(s))
    11. Received 0 response(s)
    12. 2022-08-30 15:39:31 promote begin...let 192.168.1.102 become master

    4、通过连接db vip执行“select 33333”测试当前数据库服务状态,如果正常,执行promote,提升为新主库,将同步模式设为async。

    1. check db if is alive
    2. ksql "port=54321 user=SUPERMANAGER_V8ADMIN dbname=TEST connect_timeout=10" -c "select 33333;"
    3. 2022-08-30 15:39:31 kingbase is ok , to prepare execute promote
    4. execute promote
    5. server promoting
    6. check db if is alive after promote
    7. ksql "port=54321 user=SUPERMANAGER_V8ADMIN dbname=TEST connect_timeout=10" -c "select 33333;"
    8. 2022-08-30 15:39:32 after execute promote , kingbase status is ok.
    9. after execute promote, kingbase is ok.
    10. 2022-08-30 15:39:32 sync to async
    11. ALTER SYSTEM
    12. SYS_RELOAD_CONF
    13. -----------------
    14. t
    15. (1 row)

    5、连接新主库数据库服务,如果成功,在新主库上执行checkpoint,checkpoint成功,则failover切换完成。

    1. 2022-08-30 15:39:32 make checkpoint
    2. check the db to see if it is alive
    3. ksql "port=54321 user=SUPERMANAGER_V8ADMIN dbname=TEST connect_timeout=10" -c "select 33333;"
    4. 2022-08-30 15:39:32 kingbase is ok , to prepare execute checkpoint
    5. execute checkpoint
    6. CHECKPOINT
    7. check the db to see if it is alive after execute checkpoint
    8. ksql "port=54321 user=SUPERMANAGER_V8ADMIN dbname=TEST connect_timeout=10" -c "select 33333;"
    9. 2022-08-30 15:39:33 after execute checkpoint, kingbase is ok.
    10. after execute checkpoint, kingbase is ok.
    11. -----------------2022-08-30 15:39:33 failover end---------------------------------------

    二、原主库recovery为新备库

    1、在原主库生成recovery.conf文件(系统通过crond计划任务,调用network_rewind.sh脚本自动recovery为备库)[kingbase@node101 data]$ cp ../etc/recovery.done ./recovery.conf

    2、查看recovery.log日志信息

    1. [kingbase@node101 log]$ more recovery.log
    2. ---------------------------------------------------------------------

    1)开启recovery,检测存储读写功能是否正常。

    1. 2022-08-30 16:42:01 recover beging...
    2. my pid is 11927,officially began to perform recovery
    3. 2022-08-30 16:42:01 check read/write on mount point
    4. 2022-08-30 16:42:01 check read/write on mount point (1 / 6).
    5. 2022-08-30 16:42:01 stat the directory of the mount point "/home/kingbase/cluster/R3HA/db/data" ...
    6. 2022-08-30 16:42:01 stat the directory of the mount point "/home/kingbase/cluster/R3HA/db/data" ... OK
    7. 2022-08-30 16:42:01 create/write the file "/home/kingbase/cluster/R3HA/db/data/rw_status_file_574940445" ...
    8. 2022-08-30 16:42:01 create/write the file "/home/kingbase/cluster/R3HA/db/data/rw_status_file_574940445" ... OK
    9. 2022-08-30 16:42:01 stat the file "/home/kingbase/cluster/R3HA/db/data/rw_status_file_574940445" ...
    10. 2022-08-30 16:42:01 stat the file "/home/kingbase/cluster/R3HA/db/data/rw_status_file_574940445" ... OK
    11. 2022-08-30 16:42:01 read the file "/home/kingbase/cluster/R3HA/db/data/rw_status_file_574940445" ...
    12. 2022-08-30 16:42:01 read the file "/home/kingbase/cluster/R3HA/db/data/rw_status_file_574940445" ... OK
    13. 2022-08-30 16:42:01 delete the file "/home/kingbase/cluster/R3HA/db/data/rw_status_file_574940445" ...
    14. 2022-08-30 16:42:01 delete the file "/home/kingbase/cluster/R3HA/db/data/rw_status_file_574940445" ... OK
    15. 2022-08-30 16:42:01 success to check read/write on mount point (1 / 6).
    16. 2022-08-30 16:42:01 check read/write on mount point ... ok

    2)ping 网关检测网络的连通性

    1. 2022-08-30 16:42:01 check if the network is ok
    2. ping trust ip 192.168.1.1 success ping times :[3], success times:[2]

    3)判断当前节点role(master OR standby),如果是standby节点,并判断是否需要recovery。( I,m status is [1] (1 is down), I will be in recovery.)

    1. determine if i am master or standby
    2. node_id | hostname | port | status | lb_weight | role | select_cnt | load_balance_node | replication_delay
    3. ---------+---------------+-------+--------+-----------+---------+------------+-------------------+-------------------
    4. 0 | 192.168.1.101 | 54321 | down | 0.500000 | standby | 0 | false | 0
    5. 1 | 192.168.1.102 | 54321 | up | 0.500000 | primary | 0 | true | 0
    6. (2 rows)
    7. i am standby in cluster,determine if recovery is needed
    8. 2022-08-30 16:42:03 now will del vip [192.168.1.200/24]
    9. but no 192.168.1.200/24 on my DEV, nothing to do with del
    10. ksql execute success,but node:node101 does not have correct streaming(or catchup) replication ,will retry ,retry times:[1/10]
    11. ......
    12. ksql execute success,but node:node101 does not have correct streaming(or catchup) replication ,will retry ,retry times:[10/10]
    13. primary node/Im node status is changed, primary ip[192.168.1.102], recovery.conf NEED_CHANGE [0] (0 is need ), I,m status is [1] (1 is down), I will be in recovery.
    14. node_id | hostname | port | status | lb_weight | role | select_cnt | load_balance_node | replication_delay
    15. ---------+---------------+-------+--------+-----------+---------+------------+-------------------+-------------------
    16. 0 | 192.168.1.101 | 54321 | down | 0.500000 | standby | 0 | false | 0
    17. 1 | 192.168.1.102 | 54321 | up | 0.500000 | primary | 0 | true | 0
    18. (2 rows)

    4)如果需要做recovery,执行sys_rewind。

    1. if recover node up, let it down , for rewind
    2. 2022-08-30 16:42:34 sys_rewind...
    3. sys_rewind --target-data=/home/kingbase/cluster/R3HA/db/data --source-server="host=192.168.1.102 port=54321 user=SUPERMANAGER_V8ADMIN dbname=TEST"
    4. datadir_source = /home/kingbase/cluster/R3HA/db/data
    5. rewinding from last common checkpoint at 0/66000028 on timeline 15
    6. find last common checkpoint start time from 2022-08-30 16:42:34.727711 CST to 2022-08-30 16:42:34.839545 CST, in "0.111834" seconds.
    7. reading source file list
    8. reading target file list
    9. reading WAL in target
    10. Rewind datadir file from source
    11. update the control file: minRecoveryPoint is '0/68024E80', minRecoveryPointTLI is '16', and database state is 'in archive recovery'
    12. rewind start wal location 0/66000028 (file 0000000F0000000000000066), end wal location 0/68024E80 (file 000000100000000000000068). time from 2022-08-30 16:42:36.727711 CST to 2022-08-30 16:42:37.201678 CST, in "2.473967" seconds.
    13. Done!

    5)sys_rewind成功,修改集群配置文件。

    1. sed conf change #synchronous_standby_names
    2. 2022-08-30 16:42:39 file operate
    3. cp recovery.conf...
    4. change recovery.conf ip -> primary.ip
    5. 2022-08-30 16:42:39 no need change recovery.conf, primary node is 192.168.1.102

    6)启动新备库数据库服务,删除原复制槽并重建。备库数据库服务启动成功后,配置同步为sync。

    1. delete pid file if exist
    2. del the replication_slots if exis
    3. drop the slot [slot_node101].
    4. drop the slot [slot_node102].
    5. 2022-08-30 16:42:39 start up the kingbase...
    6. waiting for server to start....LOG: redirecting log output to logging collector process
    7. HINT: Future log output will appear in directory "/home/kingbase/cluster/R3HA/db/data/sys_log".
    8. done
    9. server started
    10. ksql "port=54321 user=SUPERMANAGER_V8ADMIN dbname=TEST connect_timeout=10" -c "select 33333;"
    11. SYS_CREATE_PHYSICAL_REPLICATION_SLOT
    12. --------------------------------------
    13. (slot_node101,)
    14. (1 row)
    15. 2022-08-30 16:42:41 create the slot [slot_node101] success.
    16. SYS_CREATE_PHYSICAL_REPLICATION_SLOT
    17. --------------------------------------
    18. (slot_node102,)
    19. (1 row)
    20. 2022-08-30 16:42:41 create the slot [slot_node102] success.
    21. 2022-08-30 16:42:41 start up standby successful!
    22. cluster is sync cluster.
    23. SYNC RECOVER MODE ...
    24. 2022-08-30 16:42:41 remote primary node change sync
    25. ALTER SYSTEM
    26. SYS_RELOAD_CONF
    27. -----------------
    28. t
    29. (1 row)
    30. SYNC RECOVER MODE DONE

    7)执行pcp_attach_node将当前节点加入到集群。

    1. 2022-08-30 16:42:44 attach pool...
    2. IM Node is 0, will try [pcp_attach_node -U kingbase -W MTIzNDU2 -h 192.168.1.201 -n 0]
    3. pcp_attach_node -- Command Successful
    4. node_id | hostname | port | status | lb_weight | role | select_cnt | load_balance_node | replication_delay
    5. ---------+---------------+-------+--------+-----------+---------+------------+-------------------+-------------------
    6. 0 | 192.168.1.101 | 54321 | up | 0.500000 | standby | 0 | false | 0
    7. 1 | 192.168.1.102 | 54321 | up | 0.500000 | primary | 0 | true | 0
    8. (2 rows)
    9. 2022-08-30 16:42:45 attach end..
    10. recovery success,exit script with success

    三、总结
    通过对原主库执行系统重启服务,如果备库在阈值范围内尝试连接主库数据库服务失败,其failover切换过程和主库系统shutdown是一致的。

  • 相关阅读:
    数据分析实战应用案例精讲-【概念篇】用户画像(最终篇)(附实战案例)
    ijkplayer iOS编译问题之[-Wincompatible-function-pointer-types]
    MySQL 啥时候用表锁,啥时候用行锁?
    Redis强化(缓存原则,缓存淘汰,缓存穿透,缓存击穿,缓存雪崩,Redis持久化)
    ASP.NETWeb开发(C#版)-day1-C#基础+实操
    RNN笔记(刘二大人)
    【机器学习】python机器学习使用scikit-learn对模型进行微调:按特征贡献大小保留最重要k个特征的transform
    sylar服务器框架分析——日志系统
    [蓝牙 Mesh & Zephyr]-[005]-Key
    android代码中打开系统设置界面(无线和网络设置,日期和时间设置, 辅助功能设置,电池信息)
  • 原文地址:https://blog.csdn.net/lyu1026/article/details/126677724