KingbaseES V8R6 Cluster Maintenance Case: Migrating Single-Instance Data to a Cluster


    Case description:
    The production environment runs a single instance, while the test environment runs a cluster. The production data now needs to be migrated to the cluster. This document describes in detail the steps for restoring data from a single-instance environment into a cluster environment, and can serve as a reference for migrating production data.

    Applicable version: KingbaseES V8R6

    Database version used in this case (the single instance and the cluster use the same version):

    test=# select version();
                                                              version
    ----------------------------------------------------------------------------------------------------------------------
     KingbaseES V008R006C005B0041 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.1.2 20080704 (Red Hat 4.1.2-46), 64-bit
    (1 row)

    Original cluster node information:

    [kingbase@node101 bin]$ ./repmgr cluster show
     ID | Name    | Role    | Status    | Upstream | Location | Priority | Timeline | Connection string
    ----+---------+---------+-----------+----------+----------+----------+----------+----------------------------------------------------------------------------------------------------------------------------------------------------
     1  | node101 | primary | * running |          | default  | 100      | 13       | host=192.168.1.101 user=system dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=3
     2  | node102 | standby |   running | node101  | default  | 100      | 13       | host=192.168.1.102 user=system dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=3

    I. Pre-migration preparation in the single-instance environment

    Tips:
    1) Migrating single-instance data into the cluster requires stopping the single-instance database service; estimate the downtime window from the size of the data directory.
    2) Before shutting down the single-instance database, create a checkpoint manually. If the WAL logs are large, back them up and then clean them, keeping only roughly the last day of logs up to the most recent checkpoint.
    3) The single-instance data directory must be copied across hosts to the cluster's primary and standby nodes; estimate the total copy time from the network bandwidth and the number of nodes (a sizing sketch follows this list).
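
    Points 1) and 3) both depend on how much data has to be copied. A quick sizing check on the single-instance host (a minimal sketch; the paths match the ones used later in this case, and data/sys_wal is assumed to be the WAL directory on this version):

    # Total size of the data directory: drives both the downtime window and the copy time.
    du -sh /data/kingbase/v8r6_041/data
    # Size of the WAL directory alone, to judge how much can be cleaned up before the copy.
    # Assumption: the WAL directory is data/sys_wal on KingbaseES V8R6.
    du -sh /data/kingbase/v8r6_041/data/sys_wal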

    1. Check the data

    prod1=# \l
                                   List of databases
       Name    | Owner  | Encoding | Collate  |    Ctype    | Access privileges
    -----------+--------+----------+----------+-------------+-------------------
     prod      | system | UTF8     | ci_x_icu | zh_CN.UTF-8 |
     prod1     | system | UTF8     | ci_x_icu | zh_CN.UTF-8 |
     security  | system | UTF8     | ci_x_icu | zh_CN.UTF-8 |
     template0 | system | UTF8     | ci_x_icu | zh_CN.UTF-8 | =c/system        +
               |        |          |          |             | system=CTc/system
     template1 | system | UTF8     | ci_x_icu | zh_CN.UTF-8 | =c/system        +
               |        |          |          |             | system=CTc/system
     test      | system | UTF8     | ci_x_icu | zh_CN.UTF-8 |
    (6 rows)

    prod1=# select count(*) from t1;
     count
    --------
     100000
    (1 row)

    2. Shut down the database service

    1) Manually create a checkpoint

    prod1=# select sys_switch_wal();
     sys_switch_wal
    ----------------
     0/3F74BD70
    (1 row)

    prod1=# checkpoint;
    CHECKPOINT

    2) Shut down the database service normally

    [kingbase@node101 bin]$ ./sys_ctl stop -D /data/kingbase/v8r6_041/data
    waiting for server to shut down.... done
    server stopped
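
    Before copying the data directory it can be worth confirming that the instance shut down cleanly. A minimal check, assuming a sys_controldata utility (the KingbaseES counterpart of PostgreSQL's pg_controldata) ships in the same bin directory as sys_ctl:

    # Assumption: sys_controldata is available in the KingbaseES bin directory.
    # A cleanly stopped instance should report a "shut down" database cluster state.
    ./sys_controldata /data/kingbase/v8r6_041/data | grep -i "cluster state"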

    II. Migrate the data to the cluster

    1. Stop the cluster

    [kingbase@node101 bin]$ ./sys_monitor.sh stop
    2022-06-20 10:33:40 Ready to stop all DB ...
    .....
    2022-06-20 10:33:55 begin to stop DB on "[192.168.1.102]".
    2022-06-20 10:33:57 DB on "[192.168.1.101]" stop success.
    2022-06-20 10:33:57 Done.

    2. Back up the cluster's data directory

    [kingbase@node101 bin]$ cd ../
    [kingbase@node101 kingbase]$ mv data data.bk

    3. Copy the single-instance data directory to the cluster's primary and standby nodes

    [kingbase@node101 v8r6_041]$ scp -r data node101:/home/kingbase/cluster/R6HA/kha/kingbase
    [kingbase@node101 v8r6_041]$ scp -r data node102:/home/kingbase/cluster/R6HA/kha/kingbase
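
    For a large data directory, rsync can be an alternative to scp here: it shows progress, compresses over the wire, and can resume an interrupted transfer. A sketch using the same source and target paths as above (not output captured from this case; --info=progress2 requires rsync 3.1+):

    # Copy the single-instance data directory to both cluster nodes.
    # -a preserves permissions and timestamps, -z compresses, --info=progress2 shows overall progress.
    for node in node101 node102; do
        rsync -az --info=progress2 /data/kingbase/v8r6_041/data \
              "${node}:/home/kingbase/cluster/R6HA/kha/kingbase/"
    done

    Whichever tool is used, check afterwards that the copied files on each node belong to the kingbase OS user.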

    4. Create the standby.signal file on every standby node of the cluster

    [kingbase@node102 kingbase]$ cd data
    [kingbase@node102 data]$ touch standby.signal

    5. Create the kingbase.auto.conf, kingbase.conf and es_rep.conf files (all nodes)

    [kingbase@node101 data]$ mv kingbase.auto.conf kingbase.auto.conf.bk
    # Copy the files from the original cluster (data.bk)
    [kingbase@node101 data]$ cp ../data.bk/kingbase.auto.conf ./
    [kingbase@node101 data]$ cat kingbase.auto.conf
    # Do not edit this file manually!
    # It will be overwritten by the ALTER SYSTEM command.
    enable_upper_colname = 'on'
    wal_retrieve_retry_interval = '5000'
    primary_conninfo = 'user=system connect_timeout=10 host=192.168.1.102 port=54321 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=3 application_name=node101'
    recovery_target_timeline = 'latest'
    primary_slot_name = 'repmgr_slot_1'
    synchronous_standby_names = '1 (node102)'
    [kingbase@node101 data]$ cp ../data.bk/kingbase.conf ./
    [kingbase@node101 data]$ cp ../data.bk/es_rep.conf ./
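
    Before starting the databases, it is worth checking on each node which primary and replication slot the restored configuration points at, since the slot named in primary_slot_name must exist on the primary; this is exactly what goes wrong in the next section. A minimal check:

    # Run in the data directory on each node; shows the replication settings the node will use.
    cd /home/kingbase/cluster/R6HA/kha/kingbase/data
    grep -E "primary_conninfo|primary_slot_name|synchronous_standby_names" \
         kingbase.auto.conf kingbase.conf 2>/dev/null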

    III. Create and configure streaming replication

    Tips:
    1) Make sure the database service starts normally on all nodes.
    2) If the database service fails to start, check sys_log (especially on the standbys).
    3) If streaming replication fails, the standby's sys_log shows why the standby cannot stream (in this case the cause was a replication-slot problem); see the diagnostic sketch after this list.
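
    If streaming replication does not come up, the following diagnostic sketch can help. It assumes the server log directory is data/sys_log and that a sys_stat_wal_receiver view (the counterpart of PostgreSQL's pg_stat_wal_receiver) is available; treat both as assumptions to verify on your build:

    # On the standby: show the tail of the most recent server log for the failure reason.
    cd /home/kingbase/cluster/R6HA/kha/kingbase/data
    tail -n 50 "$(ls -t sys_log/* | head -1)"    # assumes log_directory = 'sys_log'
    # On the standby: check the WAL receiver state; no rows means no receiver is running.
    ../bin/ksql -U system -d test -c "select status, conninfo from sys_stat_wal_receiver;"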

    1. Start the database service on the primary and standby with sys_ctl

    [kingbase@node101 bin]$ ./sys_ctl start -D ../data
    waiting for server to start....2022-06-20 11:02:01.860 CST [30274] WARNING: enable_upper_colname can only be
    .......
    server started

    2. Check the streaming replication status

    test=# select * from sys_stat_replication;
     pid | usesysid | usename | application_name | client_addr | client_hostname | client_port | backend_start | backend_xmin | state | sent_lsn | write_lsn | flush_lsn | replay_lsn | write_lag | flush_lag | replay_lag | sync_priority | sync_state | reply_time
    -----+----------+---------+------------------+-------------+-----------------+-------------+---------------+--------------+-------+----------+-----------+-----------+------------+-----------+-----------+------------+---------------+------------+------------
    (0 rows)
    # As shown above, streaming replication was not established. The standby sys_log shows that a replication-slot problem prevents the standby from streaming.


    3. Check the replication slot information

    test=# select * from sys_replication_slots;
      slot_name   | plugin | slot_type | datoid | database | temporary | active | active_pid | xmin | catalog_xmin | restart_lsn | confirmed_flush_lsn
    --------------+--------+-----------+--------+----------+-----------+--------+------------+------+--------------+-------------+---------------------
     slot_node102 |        | physical  |        |          | f         | f      |            |      |              | 0/370B4F68  |
     slot_node101 |        | physical  |        |          | f         | f      |            |      |              |             |
    (2 rows)

    4. Recreate the replication slots

    test=# select sys_drop_replication_slot('slot_node101');
     sys_drop_replication_slot
    ---------------------------
    (1 row)

    test=# select sys_drop_replication_slot('slot_node102');
     sys_drop_replication_slot
    ---------------------------
    (1 row)

    test=# select * from sys_replication_slots;
     slot_name | plugin | slot_type | datoid | database | temporary | active | active_pid | xmin | catalog_xmin | restart_lsn | confirmed_flush_lsn
    -----------+--------+-----------+--------+----------+-----------+--------+------------+------+--------------+-------------+---------------------
    (0 rows)

    test=# select sys_create_physical_replication_slot('repmgr_slot_2');
     sys_create_physical_replication_slot
    --------------------------------------
     (repmgr_slot_2,)
    (1 row)

    test=# select sys_create_physical_replication_slot('repmgr_slot_1');
     sys_create_physical_replication_slot
    --------------------------------------
     (repmgr_slot_1,)
    (1 row)

    test=# select * from sys_replication_slots;
       slot_name   | plugin | slot_type | datoid | database | temporary | active | active_pid | xmin | catalog_xmin | restart_lsn | confirmed_flush_lsn
    ---------------+--------+-----------+--------+----------+-----------+--------+------------+------+--------------+-------------+---------------------
     repmgr_slot_2 |        | physical  |        |          | f         | t      | 31655      | 915  |              | 0/420001E0  |
     repmgr_slot_1 |        | physical  |        |          | f         | f      |            |      |              |             |
    (2 rows)

    test=# select * from sys_stat_replication;
      pid  | usesysid | usename | application_name |  client_addr  | client_hostname | client_port |         backend_start         | backend_xmin |   state   |  sent_lsn  | write_lsn  | flush_lsn  | replay_lsn | write_lag | flush_lag | replay_lag | sync_priority | sync_state |          reply_time
    -------+----------+---------+------------------+---------------+-----------------+-------------+-------------------------------+--------------+-----------+------------+------------+------------+------------+-----------+-----------+------------+---------------+------------+-------------------------------
     31655 | 10       | system  | node102          | 192.168.1.102 |                 | 36924       | 2022-06-20 11:07:38.664780+08 |              | streaming | 0/420001E0 | 0/420001E0 | 0/420001E0 | 0/420001E0 |           |           |            | 1             | sync       | 2022-06-20 11:08:05.571558+08
    (1 row)
    # As shown above, after the correct replication slots are recreated, streaming replication returns to normal.

    IV. Configure repmgr cluster management (done on the primary)

    1. Create the esrep database and the esrep user:

    # Create the esrep user with the same password as in the original cluster
    [kingbase@node102 data]$ cat ~/.encpwd
    *:*:*:system:MTIzNDU2NzhhYg==
    #*:*:*:system:MTIzNDU2Cg==
    *:*:*:esrep:S2luZ2Jhc2VoYTExMA==
    [kingbase@node102 data]$ echo 'S2luZ2Jhc2VoYTExMA=='|base64 -d
    Kingbaseha110
    test=# create database esrep;
    CREATE DATABASE
    test=# create user esrep with superuser password 'Kingbaseha110';
    CREATE ROLE
    # Change the system user's password to match the original cluster
    [kingbase@node101 bin]$ echo 'MTIzNDU2NzhhYg=='|base64 -d
    12345678ab
    [kingbase@node101 bin]$ ./ksql -U system test
    ksql (V8.0)
    Type "help" for help.
    test=# alter user system with password '12345678ab';
    ALTER ROLE
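
    It can be worth verifying at this point that the esrep user can actually log in, since repmgr uses these credentials. A minimal connectivity check from the bin directory (the password is expected to come from the ~/.encpwd file shown above; otherwise ksql will prompt for it):

    # A returned "1" from each node means the esrep login and database are usable.
    for host in 192.168.1.101 192.168.1.102; do
        ./ksql -h "$host" -p 54321 -U esrep -d esrep -c "select 1;"
    done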

    2. Create the repmgr extension

    # Make sure repmgr is included in shared_preload_libraries in kingbase.conf
    [kingbase@node101 bin]$ cat ../data/kingbase.conf |grep repmgr
    ........
    shared_preload_libraries = 'repmgr,liboracle_parser, synonym, plsql, force_view, kdb_flashback,plugin_debugger, plsql_plugin_debugger, plsql_plprofiler, ora_commands,kdb_ora_expr, sepapower, dblink, sys_kwr, sys_ksh, sys_spacequota, sys_stat_statements, backtrace, kdb_utils_function, auto_bmr, sys_squeeze'
    # Create the repmgr extension on the primary
    test=# create extension repmgr;
    CREATE EXTENSION

    3. Register the primary and standby with the repmgr cluster

    1) Register the primary node

    [kingbase@node101 bin]$ ./repmgr primary register --force
    INFO: connecting to primary database...
    NOTICE: attempting to install extension "repmgr"
    NOTICE: "repmgr" extension successfully installed
    NOTICE: primary node record (ID: 1) registered
    [kingbase@node101 bin]$ ./repmgr cluster show
     ID | Name    | Role    | Status    | Upstream | Location | Priority | Timeline | Connection string
    ----+---------+---------+-----------+----------+----------+----------+----------+----------------------------------------------------------------------------------------------------------------------------------------------------
     1  | node101 | primary | * running |          | default  | 100      | 1        | host=192.168.1.101 user=system dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=3

    2) Register the standby node

    [kingbase@node102 bin]$ ./repmgr standby register --force
    INFO: connecting to local node "node102" (ID: 2)
    INFO: connecting to primary database
    WARNING: --upstream-node-id not supplied, assuming upstream node is primary (node ID 1)
    INFO: standby registration complete
    NOTICE: standby node "node102" (ID: 2) successfully registered

    3) Check the cluster node status

    [kingbase@node102 bin]$ ./repmgr cluster show
     ID | Name    | Role    | Status    | Upstream | Location | Priority | Timeline | Connection string
    ----+---------+---------+-----------+----------+----------+----------+----------+----------------------------------------------------------------------------------------------------------------------------------------------------
     1  | node101 | primary | * running |          | default  | 100      | 1        | host=192.168.1.101 user=system dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=3
     2  | node102 | standby |   running | node101  | default  | 100      | 1        | host=192.168.1.102 user=system dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=3

    V. Verify the data

    1. Restart the cluster

    [kingbase@node101 bin]$ ./sys_monitor.sh restart
    2022-06-20 11:26:21 Ready to stop all DB ...
    ......
    2022-06-20 11:27:03 repmgrd on "[192.168.1.102]" start success.
     ID | Name    | Role    | Status    | Upstream | repmgrd | PID   | Paused? | Upstream last seen
    ----+---------+---------+-----------+----------+---------+-------+---------+--------------------
     1  | node101 | primary | * running |          | running | 5641  | no      | n/a
     2  | node102 | standby |   running | node101  | running | 22473 | no      | 0 second(s) ago
    [2022-06-20 11:27:08] [NOTICE] redirecting logging output to "/home/kingbase/cluster/R6HA/kha/kingbase/log/kbha.log"
    [2022-06-20 11:27:13] [NOTICE] redirecting logging output to "/home/kingbase/cluster/R6HA/kha/kingbase/log/kbha.log"
    2022-06-20 11:27:19 Done.

    2. Check the repmgrd process status (all nodes)

    [kingbase@node101 bin]$ ps -ef |grep repmgr
    kingbase 5641 1 0 11:26 ? 00:00:00 /home/kingbase/cluster/R6HA/kha/kingbase/bin/repmgrd -d -v -f /home/kingbase/cluster/R6HA/kha/kingbase/bin/../etc/repmgr.conf
    kingbase 5879 1 0 11:27 ? 00:00:00 /home/kingbase/cluster/R6HA/kha/kingbase/bin/kbha -A daemon -f /home/kingbase/cluster/R6HA/kha/kingbase/bin/../etc/repmgr.conf

    3. Verify the data after migration

    [kingbase@node101 bin]$ ./ksql -U system test
    ksql (V8.0)
    Type "help" for help.
    test=# \l
                                   List of databases
       Name    | Owner  | Encoding | Collate  |    Ctype    | Access privileges
    -----------+--------+----------+----------+-------------+-------------------
     esrep     | system | UTF8     | ci_x_icu | zh_CN.UTF-8 |
     prod      | system | UTF8     | ci_x_icu | zh_CN.UTF-8 |
     prod1     | system | UTF8     | ci_x_icu | zh_CN.UTF-8 |
     security  | system | UTF8     | ci_x_icu | zh_CN.UTF-8 |
     template0 | system | UTF8     | ci_x_icu | zh_CN.UTF-8 | =c/system        +
               |        |          |          |             | system=CTc/system
     template1 | system | UTF8     | ci_x_icu | zh_CN.UTF-8 | =c/system        +
               |        |          |          |             | system=CTc/system
     test      | system | UTF8     | ci_x_icu | zh_CN.UTF-8 |
    (7 rows)

    test=# \c prod1
    You are now connected to database "prod1" as user "system".
    prod1=# \d
                 List of relations
     Schema |        Name         | Type  | Owner
    --------+---------------------+-------+--------
     public | sys_stat_statements | view  | system
     public | t1                  | table | system
    (2 rows)

    prod1=# select count(*) from t1;
     count
    --------
     100000
    (1 row)
    # The data after migration matches the data before migration; the migration succeeded.
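
    In addition to checking the primary, one can confirm that the standby has replayed the same data by querying it directly. A minimal sketch, assuming sys_is_in_recovery() is available as the KingbaseES counterpart of pg_is_in_recovery():

    # Run from the bin directory against the standby (node102):
    # the first query should return t (in recovery), the second the same row count as the primary.
    ./ksql -h 192.168.1.102 -p 54321 -U system -d prod1 -c "select sys_is_in_recovery();"
    ./ksql -h 192.168.1.102 -p 54321 -U system -d prod1 -c "select count(*) from t1;"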

    VI. Summary

    1. To migrate data from a single-instance environment to a cluster while guaranteeing data consistency, both the single instance and the cluster must be shut down; for a production environment this means planning a downtime window.
    2. If the data of the original cluster needs to be reloaded into the new cluster, take a logical backup of the original cluster first, and handle duplicate data carefully during the import (for example, re-importing test data into the new cluster may produce many duplicate rows).
    3. When requesting the downtime window, account for the size of the single-instance data, the network bandwidth between hosts, the number of cluster nodes, the cluster configuration time, and the time needed to handle any cluster startup failures; a rough estimate is sketched below.
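
    As a rough illustration of the estimate in point 3, a back-of-the-envelope calculation (all figures are hypothetical placeholders, not measurements from this case):

    # Hypothetical sizing: 50 GiB data directory, ~100 MiB/s effective network
    # throughput, 2 cluster nodes copied one after another.
    DATA_GIB=50
    MIB_PER_SEC=100
    NODES=2
    COPY_SECONDS=$(( DATA_GIB * 1024 * NODES / MIB_PER_SEC ))
    echo "estimated copy time: ${COPY_SECONDS}s (~$(( COPY_SECONDS / 60 )) min), on top of checkpoint, configuration and verification time"
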
    Original article: https://blog.csdn.net/lyu1026/article/details/125457219