在 Hive 中,表的每一个分区对应表下的相应目录,所有分区的数据都是存储在对应的目录中;
可以按照一定维度数据进行分区存放,查询时大幅增加查询效率,同时也可以查询跨多个分区的整数集;
hive> CREATE TABLE t_partition (year string, month int, num int)
partitioned by (dt string, country string)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ',';
hive> desc t_partition;
- OK
- year string
- month int
- num int
- dt string
- country string
-
- # Partition Information
- # col_name data_type comment
- dt string
- country string
- Time taken: 0.406 seconds, Fetched: 10 row(s)
此时在 hive 默认仓库下创建了空目录:
$ hdfs dfs -ls /user/hive/warehouse/
drwxr-xr-x - hadoop supergroup 0 2022-09-19 15:19 /user/hive/warehouse/t_partition
$ cat t_partition1.txt
- 2019,02,11
- 2020,04,13
- 2020,09,12
$ cat t_partition2.txt
- 2018,03,15
- 2021,10,01
- 2021,08,21
hive> load data local inpath '/home/hadoop/test_data/t_partition1.txt' into table t_partition partition(dt='2021-09-18', country='US');
- Loading data to table default.t_partition partition (dt=2021-09-18, country=US)
- OK
- Time taken: 1.847 seconds
hive> load data local inpath '/home/hadoop/test_data/t_partition2.txt' into table t_partition partition(dt='2021-09-19',country='CN');
- Loading data to table default.t_partition partition (dt=2021-09-19, country=CN)
- OK
- Time taken: 0.516 seconds
hive> select * from t_partition;
- OK
- 2019 2 11 2021-09-18 US
- 2020 4 13 2021-09-18 US
- 2020 9 12 2021-09-18 US
- 2018 3 15 2021-09-19 CN
- 2021 10 1 2021-09-19 CN
- 2021 8 21 2021-09-19 CN
- Time taken: 0.228 seconds, Fetched: 6 row(s)
hive> show partitions t_partition;
- OK
- dt=2021-09-18/country=US
- dt=2021-09-19/country=CN
- Time taken: 0.272 seconds, Fetched: 2 row(s)
hive> select INPUT__FILE__NAME from t_partiton;
- OK
- hdfs://mycluster/user/hive/warehouse/t_partition/dt=2021-09-18/country=US/t_partition1.txt
- hdfs://mycluster/user/hive/warehouse/t_partition/dt=2021-09-18/country=US/t_partition1.txt
- hdfs://mycluster/user/hive/warehouse/t_partition/dt=2021-09-18/country=US/t_partition1.txt
- hdfs://mycluster/user/hive/warehouse/t_partition/dt=2021-09-19/country=CN/t_partition2.txt
- hdfs://mycluster/user/hive/warehouse/t_partition/dt=2021-09-19/country=CN/t_partition2.txt
- hdfs://mycluster/user/hive/warehouse/t_partition/dt=2021-09-19/country=CN/t_partition2.txt
- Time taken: 0.274 seconds, Fetched: 6 row(s)
$ hdfs dfs -ls -R /user/hive/warehouse/t_partition/
- drwxr-xr-x - hadoop supergroup 0 2022-09-19 15:37 /user/hive/warehouse/t_partition/dt=2021-09-18
- drwxr-xr-x - hadoop supergroup 0 2022-09-19 15:37 /user/hive/warehouse/t_partition/dt=2021-09-18/country=US
- -rw-r--r-- 3 hadoop supergroup 33 2022-09-19 15:37 /user/hive/warehouse/t_partition/dt=2021-09-18/country=US/t_partition1.txt
- drwxr-xr-x - hadoop supergroup 0 2022-09-19 15:37 /user/hive/warehouse/t_partition/dt=2021-09-19
- drwxr-xr-x - hadoop supergroup 0 2022-09-19 15:37 /user/hive/warehouse/t_partition/dt=2021-09-19/country=CN
- -rw-r--r-- 3 hadoop supergroup 33 2022-09-19 15:37 /user/hive/warehouse/t_partition/dt=2021-09-19/country=CN/t_partition2.txt
$ hdfs dfs -cat /user/hive/warehouse/t_partition/dt=2021-09-18/country=US/t_partition1.txt
- 2022-09-19 16:10:15,043 INFO sasl.SaslDataTransferClient: SASL encryption trust check: localHostTrusted = false, remoteHostTrusted = false
- 2019,02,11
- 2020,04,13
- 2020,09,12
$ hdfs dfs -cat /user/hive/warehouse/t_partition/dt=2021-09-19/country=CN/t_partition2.txt
- 2022-09-19 16:12:29,363 INFO sasl.SaslDataTransferClient: SASL encryption trust check: localHostTrusted = false, remoteHostTrusted = false
- 2018,03,15
- 2021,10,01
- 2021,08,21