目录
数据采集时如果使用datax的话,必须先手工建好表之后才能进行数据采集;使用sqoop的话虽然可以默认建表,但是每次还要手工配置命令。表数量不多的话还好,如果多库多表需要批量采集的话工作量会很大,因此需要一个批量生成建表语句的功能来节省人力。
先确定好异构数据源的数据类型转换关系,可以定义好一张维表。
CREATE TABLE dim_data_type_convert
(
source string comment '源库',
source_data_type string comment '源库数据类型',
target string comment '目标库',
target_data_type string comment '目标库数据类型',
update_time string comment '更新时间'
)
COMMENT='数据类型转换维表';
数据示例如下:
| 源库 | 源库数据类型 | 目标库 | 目标库数据类型 | 更新时间 |
|---|---|---|---|---|
| mysql | bigint | hive | bigint | 20220817 |
| mysql | int | hive | bigint | |
| mysql | tinyint | hive | bigint | |
| mysql | char | hive | string | |
| mysql | varchar | hive | string | |
| mysql | datetime | hive | datetime | |
| mysql | decimal | hive | double | |
| mysql | double | hive | double | |
| mysql | float | hive | double | |
| mysql | json | hive | string | |
| mysql | mediumtext | hive | string | |
| mysql | text | hive | string | |
| mysql | time | hive | string | |
| mysql | timestamp | hive | timestamp | |
| mysql | varbinary | hive | binary | |
| mysql | binary | hive | binary |
- SELECT
- a.TABLE_NAME ,
- b.TABLE_COMMENT ,
- concat('CREATE TABLE IF NOT EXISTS ',a.TABLE_NAME ,' (',group_concat(concat(a.COLUMN_NAME,' ',
- c.target_data_type," COMMENT '",COLUMN_COMMENT,"'") order by a.TABLE_NAME,a.ORDINAL_POSITION) ,
- ") COMMENT '",b.TABLE_COMMENT ,"' ROW FORMAT DELIMITED FIELDS TERMINATED BY '\\t' STORED AS orc;") AS DDL
- FROM
- (
- SELECT
- TABLE_SCHEMA,
- TABLE_NAME,
- COLUMN_NAME,
- ORDINAL_POSITION,
- DATA_TYPE,
- COLUMN_COMMENT
- FROM information_schema.COLUMNS
- WHERE TABLE_SCHEMA='你的库名'
- ) a
- LEFT JOIN information_schema.TABLES b
- ON a.TABLE_NAME=b.TABLE_NAME
- AND a.TABLE_SCHEMA=b.TABLE_SCHEMA
- --源库为mysql,目标库为hive
- LEFT JOIN
- (
- select
- *
- from dim_data_type_convert
- where source='mysql' and target='hive'
- ) c
- ON a.DATA_TYPE=c.source_data_type
- where b.TABLE_TYPE='BASE TABLE'
- GROUP BY
- a.TABLE_NAME,
- b.TABLE_COMMENT
- ;
-
生成示例:
| TABLE_NAME | TABLE_COMMENT | DDL |
|---|---|---|
| TABLE_NAME | TABLE_COMMENT | CREATE TABLE IF NOT EXISTS TABLE_NAME (COLUMN_NAME target_data_type COMMENT “COLUMN_COMMENT”) COMMENT "TABLE_COMMENT " ROW FORMAT DELIMITED FIELDS TERMINATED BY '\\t' STORED AS orc; |