Hive Data Export

Add partition information to an external table (static partitioning):

ALTER TABLE extpart_flow ADD PARTITION (year=2017, month=10,day=24)
LOCATION 'hdfs:///extpartition/2017/10/24';

Create a regular table
Load data into it
Use case: when the number of partitions is not known in advance, use dynamic partitioning.

create table if not exists pv01(
id int,
name string,
sex string
)
partitioned by(age int)
row format delimited fields terminated by'\t'
stored as textfile;

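Before inserting data dynamically, Hive must be set to "nonstrict" mode, because the insert below uses only dynamic partition columns and strict mode requires at least one static one.
Enable dynamic partitioning: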

hive> set hive.exec.dynamic.partition=true;

Set nonstrict mode:

hive> set hive.exec.dynamic.partition.mode=nonstrict;

Optional: by default, each node may create at most 100 dynamic partitions.

hive> set hive.exec.max.dynamic.partitions.pernode=100;

Partition the user table by age and store it in the partitioned table:

insert into table pv01 partition(age) select id,name,sex,age from pv;

View the partitions:

show partitions pv01;
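
To make the dynamic-partition insert concrete, the following Python sketch mimics how each selected row is routed: the last column of the SELECT (age) supplies the partition value, and the row lands under a directory named age=<value>. The sample rows and the `route_rows` helper are hypothetical, and the sketch only models the resulting layout, not how Hive actually executes the job.

```python
from collections import defaultdict


def route_rows(rows, partition_column="age"):
    """Group rows by the value of the trailing partition column.

    Each row is (id, name, sex, age). As with dynamic partitioning,
    the last field picks the partition and is not kept in the row data.
    """
    partitions = defaultdict(list)
    for *data_columns, partition_value in rows:
        directory = f"{partition_column}={partition_value}"
        partitions[directory].append(tuple(data_columns))
    return dict(partitions)


# Hypothetical contents of the source table pv.
sample_rows = [
    (1, "tom", "m", 20),
    (2, "amy", "f", 21),
    (3, "bob", "m", 20),
]

layout = route_rows(sample_rows)
for directory in sorted(layout):
    print(directory, layout[directory])
```

The keys of `layout` play the role of the entries that `show partitions pv01;` lists: one per distinct age value found in the data.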

Bucketing follows the same steps.

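Create a regular table
Load data into it
Create the bucketed table (bucketing on the eNo column of the stu table)
Note the difference between a bucketing column and a partition column: a bucketing column is one of the ordinary columns in the table's column list, whereas a partition column is declared only in the PARTITIONED BY clause.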

create table if not exists bk_stu(
eNo int,
name string,
sex string,
age int
)
clustered by(eNo) into 4 buckets
row format delimited fields terminated by'\t'
stored as textfile;

Note:
Force output through multiple reducers.
Set this before inserting data; otherwise only one file is produced:

set hive.enforce.bucketing=true;

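To populate a bucketed table, set the hive.enforce.bucketing property to true.
Hive then knows to create the number of buckets declared in the table definition. (This setting applies to Hive 0.x and 1.x; from Hive 2.x onward bucketing is always enforced and the property is not needed.)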
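
As a rough illustration of what "creating the declared number of buckets" means, the Python sketch below assigns eNo values to 4 buckets. It assumes the classic scheme in which an int column hashes to its own value and the bucket index is (hash & Integer.MAX_VALUE) % number_of_buckets; newer Hive versions may use a different hash function, so treat the exact bucket numbers as illustrative. The helper names and the sample eNo values are hypothetical.

```python
INT_MAX = 0x7FFFFFFF  # same value as Java's Integer.MAX_VALUE


def bucket_for(e_no, num_buckets=4):
    """Return the 0-based bucket index for an int bucketing column.

    Assumption: the hash of an int is the int itself, and the bucket
    is (hash & Integer.MAX_VALUE) % num_buckets.
    """
    return (e_no & INT_MAX) % num_buckets


def assign_buckets(e_nos, num_buckets=4):
    """Distribute eNo values over num_buckets lists, one per bucket file."""
    buckets = {index: [] for index in range(num_buckets)}
    for e_no in e_nos:
        buckets[bucket_for(e_no, num_buckets)].append(e_no)
    return buckets


# Hypothetical eNo values from the stu table.
buckets = assign_buckets(range(1001, 1009))
for index, members in buckets.items():
    print(f"bucket file {index}: {members}")
```

Each key of `buckets` corresponds to one output file of the bucketed table, which is why the insert needs as many reducers as there are buckets.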

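Note the term used here: result set.
Insert data from the regular table into the bucketed table (note: data can be inserted into a bucketed table only as a query result set, that is, with INSERT ... SELECT):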

insert into table bk_stu select * from stu;

(1) Sample the table and return one bucket of data:

select * from bk_stu tablesample(bucket 3 out of 4);

(2) Sample the table and return two buckets of data:

select * from bk_stu tablesample(bucket 1 out of 2); 
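
Why "bucket 3 out of 4" yields one bucket and "bucket 1 out of 2" yields two can be sketched in Python. The sketch assumes sampling happens on the clustering column and that a row is selected when its hash modulo y equals x - 1, so a table bucket with 0-based index b is read when b % y == x - 1. The `sampled_buckets` helper is hypothetical and only handles the case where y evenly divides the number of buckets.

```python
def sampled_buckets(x, y, num_buckets=4):
    """Return the 1-based table buckets covered by BUCKET x OUT OF y.

    Assumption: y evenly divides num_buckets and the sample is taken
    on the same column the table is clustered by.
    """
    if not 1 <= x <= y:
        raise ValueError("x must be between 1 and y")
    if num_buckets % y != 0:
        raise ValueError("this sketch only handles y dividing num_buckets")
    return [index + 1 for index in range(num_buckets) if index % y == x - 1]


print(sampled_buckets(3, 4))  # one bucket
print(sampled_buckets(1, 2))  # two buckets
```

Under these assumptions the first call covers only the third bucket, while the second covers the first and third buckets, that is, 4 / 2 = 2 buckets.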

# Block sampling (TABLESAMPLE (n PERCENT)): samples about n% of the table's data size

select * from bk_stu tablesample(50 PERCENT);
select * from bk_stu tablesample(25 PERCENT);

# Sampling by data size (TABLESAMPLE (nM)), where M means megabytes

select * from bk_stu tablesample(1M);

# Sampling a given number of rows (TABLESAMPLE (n ROWS))

select * from bk_stu tablesample(4 ROWS);

Data import

Four methods

1. Load from the local file system

load data local inpath '/..........' into table user;

2. Load from HDFS

load data inpath '/........' into table user;

3. Query data from another table and insert it into a Hive table

insert into table use select * from users;

4. Create the table directly from a query on another table (CREATE TABLE ... AS SELECT)

create table user as select * from users; -- full copy
create table user as select id from users where id<10; -- alternative: only the ids below 10

Data export

1. Export to the local file system

insert overwrite local directory '/usr/' select * from user; -- written from a query result set

To specify a delimiter:

insert overwrite local directory '/usr/' row format delimited fields terminated by '\t' select * from user;

2. Export to HDFS

insert overwrite directory '/source' select * from user;

3. Export to another Hive table,
which is just an ordinary insert:

insert into table use select * from user;

If the table has both primitive and complex data types:

insert overwrite local directory '/home/hadoop/outdir'
row format delimited fields terminated by '\t'
collection items terminated by ','
map keys terminated by ':'
select * from userses;
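
The effect of the three delimiters on one exported line can be sketched in Python. The example row (an id, a name, an array of hobbies, and a map of scores) and the `serialize_row` helper are hypothetical; the actual columns of userses are not shown in this article. Fields are joined with a tab, array elements and map entries with a comma, and each map key is separated from its value by a colon.

```python
def serialize_row(row, field_sep="\t", item_sep=",", key_sep=":"):
    """Render one row as delimited text using the three delimiters."""
    fields = []
    for value in row:
        if isinstance(value, dict):
            # Map: entries use the collection delimiter, key/value the map-key delimiter.
            entries = (f"{key}{key_sep}{val}" for key, val in value.items())
            fields.append(item_sep.join(entries))
        elif isinstance(value, (list, tuple)):
            # Array: elements use the collection delimiter.
            fields.append(item_sep.join(str(item) for item in value))
        else:
            # Primitive value.
            fields.append(str(value))
    return field_sep.join(fields)


# Hypothetical row: id, name, array<string>, map<string,int>.
row = (1, "tom", ["reading", "chess"], {"math": 90, "english": 85})
line = serialize_row(row)
print(line)
```

Choosing delimiters that never occur inside the data itself keeps the exported files unambiguous when they are read back.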

insert overwrite directory '/hivedata/user.txt'
row format delimited fields terminated by '\t'
collection items terminated by ','
map keys terminated by ':'
select * from userses;

Run directly from a Linux terminal (exports the data to the local file system):

 hive -S -e 'select * from hive_db.userses'>>/home/hadoop/hivedata/logsdir/users.txt;

Wrapped in a shell script:

#!/bin/bash
HQL="insert overwrite local directory '/home/hadoop/hivedata/logsdir' select log from reglog;"
hive -S -e "$HQL"


Original article: https://blog.csdn.net/eyexin2018/article/details/126125341