Hive Data Migration Approaches (Tested)

Hive migration covers two scenarios:

1. Migrating metadata only

Reference: NetEase metadata management - Hive metadata migration and merging

2. Full migration of metadata and Hive data

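Main workflow
1. Export the Hive data on the old cluster to the old cluster's HDFS
2. Download the exported data from the old cluster's HDFS to the local filesystem
3. Upload the exported data from the local filesystem to the new cluster's HDFS
4. Import the data from the new cluster's HDFS into Hive on the new cluster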
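As a rough orientation, the four steps map onto commands along the following lines. This is only a dry-run sketch: the helper name `print_migration_plan`, the staging directory, and the host are placeholders that do not come from the article, and the function prints the commands instead of running them.

```shell
# Hypothetical dry-run helper: prints the commands behind the four steps
# instead of executing them. All arguments are placeholders.
print_migration_plan() {
    hdfs_dir=$1     # export directory on HDFS (same path on both clusters)
    local_dir=$2    # staging directory on the local filesystem
    new_host=$3     # user@host of a node in the new cluster
    echo "old cluster: hive -f ~/export.hql"
    echo "old cluster: hdfs dfs -get $hdfs_dir $local_dir"
    echo "old cluster: scp -r $local_dir $new_host:$local_dir"
    echo "new cluster: hdfs dfs -put $local_dir $hdfs_dir"
    echo "new cluster: hive -f ~/import.sql"
}

# Example call with placeholder values
print_migration_plan /tmp/export_db_export /opt/stage/export_db_export hr@new-cluster-host
```

The scp line is the transfer between steps 2 and 3; when both clusters can reach each other over HDFS, distcp can replace the download, copy, and upload.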

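2.1 Full-table migration

2.1.1 Old cluster

Set the default Hive database by adding a use statement to ~/.hiverc:

vim ~/.hiverc
use export_db;

Create the export directory on HDFS:

hdfs dfs -mkdir -p /tmp/export_db_export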

Generate and run the export script

hive -e "show tables;" | awk  '{printf "export table %s to |/tmp/export_db_export/%s|;\n",$1,$1}' | sed "s/|/'/g" | grep -v tab_name > ~/export.hql

hive -f ~/export.hql
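The generation step can also be written without the `|` placeholder and the follow-up sed. The sketch below is one way to do it, not the author's script: `gen_export_hql` is a hypothetical helper that reads table names on stdin, and the example feeds it literal names in place of live `hive -e "show tables;"` output.

```shell
# Hypothetical helper: read table names on stdin (one per line) and print
# one EXPORT statement per table. $1 is the HDFS export directory.
gen_export_hql() {
    export_dir=$1
    # Skip blank lines and the "tab_name" header that `show tables` may print.
    # \047 is the octal escape for a single quote inside the awk program.
    awk -v dir="$export_dir" '
        NF && $1 != "tab_name" {
            printf "export table %s to \047%s/%s\047;\n", $1, dir, $1
        }'
}

# Example with literal table names (placeholders, not tables from the article)
printf 'tab_name\norders\nusers\n' | gen_export_hql /tmp/export_db_export
```

On a real cluster the input would come from `hive -e "show tables;"`, with the output redirected to ~/export.hql.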

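Download the exported data from HDFS and send it to the new cluster

hdfs dfs -get /tmp/export_db_export ./
sudo scp -r export_db_export/ hr@192.168.1.xx:/opt/lzx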

2.1.2 New cluster

Upload the data to HDFS (the directory copied over with scp in the previous step)

hdfs dfs -put /opt/lzx/export_db_export /tmp/export_db_export

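Generate and run the import script

# ~/export.hql is the script generated on the old cluster; copy it to this machine first
cp ~/export.hql ~/import.sql
sed -i 's/export /import /g' ~/import.sql
sed -i 's/ to / from /g' ~/import.sql

hive -f ~/import.sql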
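One caveat with the global substitutions: `s/export /import /g` also rewrites a table name that ends in `export` (for example `data_export`), so the import would target a table name that was never exported. A slightly more defensive sketch anchors the patterns; `export_to_import` is a hypothetical helper name and the table below is a made-up example.

```shell
# Hypothetical helper: turn EXPORT statements on stdin into IMPORT statements.
# Only the leading keyword and the " to '" in front of the quoted path are
# rewritten, so table names containing "export" or "to" are left alone.
export_to_import() {
    sed -e "s/^export table /import table /" \
        -e "s/ to '/ from '/"
}

# Example with a made-up table name that would trip the unanchored version
printf "export table data_export to '/tmp/export_db_export/data_export';\n" | export_to_import
```

Used on the real file this would be `export_to_import < ~/export.hql > ~/import.sql`.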

2.2 Migrating only some partitions (main steps)

2.2.1 Old cluster

Generate and run the export script

vim ~/export.hql

export table hr_task_scan_official_3hh partition(ds='20200409')  to '/tmp/export_db_export/20200409';
export table hr_task_scan_official_3hh partition(ds='20200410')  to '/tmp/export_db_export/20200410';
export table hr_task_scan_official_3hh partition(ds='20200411')  to '/tmp/export_db_export/20200411';
export table hr_task_scan_official_3hh partition(ds='20200412')  to '/tmp/export_db_export/20200412';
export table hr_task_scan_official_3hh partition(ds='20200413')  to '/tmp/export_db_export/20200413';
export table hr_task_scan_official_3hh partition(ds='20200414')  to '/tmp/export_db_export/20200414';

hive -f ~/export.hql
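Writing one statement per partition by hand gets tedious for longer date ranges. A small loop can generate them; the sketch below assumes, as in the statements above, that the partition column is `ds` and that each partition is exported to a directory named after its value. `gen_partition_exports` is a hypothetical helper name.

```shell
# Hypothetical helper: print one EXPORT statement per partition value.
# Usage: gen_partition_exports TABLE EXPORT_DIR DS [DS...]
gen_partition_exports() {
    table=$1
    export_dir=$2
    shift 2
    for ds in "$@"; do
        printf "export table %s partition(ds='%s') to '%s/%s';\n" \
            "$table" "$ds" "$export_dir" "$ds"
    done
}

# Example: the first three partitions from the script above
gen_partition_exports hr_task_scan_official_3hh /tmp/export_db_export 20200409 20200410 20200411
```

Redirecting the output to ~/export.hql gives a file that can be run with `hive -f`.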

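2.2.2 New cluster

There is no need to create the table beforehand; import creates it if it does not exist.

Generate and run the import script

vim ~/import.sql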

import table hr_task_scan_official_3hh from '/tmp/export_db_export/20200409';
import table hr_task_scan_official_3hh from '/tmp/export_db_export/20200410';
import table hr_task_scan_official_3hh from '/tmp/export_db_export/20200411';
import table hr_task_scan_official_3hh from '/tmp/export_db_export/20200412';
import table hr_task_scan_official_3hh from '/tmp/export_db_export/20200413';
import table hr_task_scan_official_3hh from '/tmp/export_db_export/20200414';

hive -f ~/import.sql

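2.3 Migrating data over a beeline connection to Hive

Generate the export script with beeline. beeline prints the result as a bordered table, so the table name is the second field; the two trailing sed calls drop the three header lines and the closing border.

beeline -u jdbc:hive2://cdh01:10000 -e "use export_db;show tables;"| awk '{printf "export table %s to |/tmp/export_db_export/%s|;\n",$2,$2}' | sed "s/|/'/g"|sed '1,3d'|sed '$d' > ~/export.hql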
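Deleting fixed line numbers depends on beeline printing exactly three header lines and one closing border. A sketch that filters on the row shape instead is shown below; `parse_beeline_tables` is a hypothetical helper, and the sample input imitates beeline's table output with made-up table names.

```shell
# Hypothetical helper: extract table names from beeline's bordered table output.
# Data rows start with "|" followed by the value; border lines start with "+".
parse_beeline_tables() {
    awk '$1 == "|" && $2 != "tab_name" { print $2 }'
}

# Example input imitating `show tables;` output in beeline's table format
printf '%s\n' \
    '+-----------+' \
    '| tab_name  |' \
    '+-----------+' \
    '| orders    |' \
    '| users     |' \
    '+-----------+' | parse_beeline_tables
```

The resulting names can then be turned into export statements in the same way as in the hive CLI variant.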

Run the script

sed -i '1i use export_db;' ~/export.hql
beeline -u jdbc:hive2://cdh01:10000 -n hdfs -f ~/export.hql

Send the data to the new cluster's HDFS

# The HDFS directory on the new cluster must be created beforehand
hadoop distcp hdfs://cdh01:8020/tmp/export_db_export/ hdfs://cdh02:8020/tmp/export_db_export

Generate the import script

cp ~/export.hql ~/import.hql
sed -i 's/export /import /g' ~/import.hql
sed -i 's/ to / from /g' ~/import.hql
# Replace the leading "use" line with the target database
sed -i '1d' ~/import.hql
sed -i '1i use import_db;' ~/import.hql

Create the target database, then import

beeline -u jdbc:hive2://cdh02:10000 -n hdfs -e "create database import_db;"
beeline -u jdbc:hive2://cdh02:10000 -n hdfs -f ~/import.hql


Original article: https://blog.csdn.net/Lzx116/article/details/126539520