• Hive 分区表


    Partition 分区表

    Hive 中,表的每一个分区对应表下的相应目录,所有分区的数据都是存储在对应的目录中;

    可以按照一定维度数据进行分区存放,查询时大幅增加查询效率,同时也可以查询跨多个分区的整数集;

    新建表

    hive> CREATE TABLE t_partition (year string, month int, num int)

            partitioned by (dt string, country string)

            ROW FORMAT DELIMITED

            FIELDS TERMINATED BY ',';

    查看表结构

    hive> desc t_partition;

    1. OK
    2. year string
    3. month int
    4. num int
    5. dt string
    6. country string
    7. # Partition Information
    8. # col_name data_type comment
    9. dt string
    10. country string
    11. Time taken: 0.406 seconds, Fetched: 10 row(s)

    此时在 hive 默认仓库下创建了空目录:

    $ hdfs dfs -ls /user/hive/warehouse/

    drwxr-xr-x   - hadoop supergroup          0 2022-09-19 15:19 /user/hive/warehouse/t_partition   

    待导入数据文件

    $ cat t_partition1.txt

    1. 2019,02,11
    2. 2020,04,13
    3. 2020,09,12

    $ cat t_partition2.txt

    1. 2018,03,15
    2. 2021,10,01
    3. 2021,08,21

    导入数据

    hive> load data local inpath '/home/hadoop/test_data/t_partition1.txt' into table t_partition partition(dt='2021-09-18', country='US');

    1. Loading data to table default.t_partition partition (dt=2021-09-18, country=US)
    2. OK
    3. Time taken: 1.847 seconds

    hive> load data local inpath '/home/hadoop/test_data/t_partition2.txt' into table t_partition partition(dt='2021-09-19',country='CN');

    1. Loading data to table default.t_partition partition (dt=2021-09-19, country=CN)
    2. OK
    3. Time taken: 0.516 seconds

    查看数据

    hive> select * from t_partition;

    1. OK
    2. 2019 2 11 2021-09-18 US
    3. 2020 4 13 2021-09-18 US
    4. 2020 9 12 2021-09-18 US
    5. 2018 3 15 2021-09-19 CN
    6. 2021 10 1 2021-09-19 CN
    7. 2021 8 21 2021-09-19 CN
    8. Time taken: 0.228 seconds, Fetched: 6 row(s)

    查看表分区

    hive> show partitions t_partition;

    1. OK
    2. dt=2021-09-18/country=US
    3. dt=2021-09-19/country=CN
    4. Time taken: 0.272 seconds, Fetched: 2 row(s)

    查看表中数据的文件路径

    hive> select INPUT__FILE__NAME from t_partiton;

    1. OK
    2. hdfs://mycluster/user/hive/warehouse/t_partition/dt=2021-09-18/country=US/t_partition1.txt
    3. hdfs://mycluster/user/hive/warehouse/t_partition/dt=2021-09-18/country=US/t_partition1.txt
    4. hdfs://mycluster/user/hive/warehouse/t_partition/dt=2021-09-18/country=US/t_partition1.txt
    5. hdfs://mycluster/user/hive/warehouse/t_partition/dt=2021-09-19/country=CN/t_partition2.txt
    6. hdfs://mycluster/user/hive/warehouse/t_partition/dt=2021-09-19/country=CN/t_partition2.txt
    7. hdfs://mycluster/user/hive/warehouse/t_partition/dt=2021-09-19/country=CN/t_partition2.txt
    8. Time taken: 0.274 seconds, Fetched: 6 row(s)

    查看 hdfs 中存放的文件结构

    $ hdfs dfs -ls -R /user/hive/warehouse/t_partition/

    1. drwxr-xr-x - hadoop supergroup 0 2022-09-19 15:37 /user/hive/warehouse/t_partition/dt=2021-09-18
    2. drwxr-xr-x - hadoop supergroup 0 2022-09-19 15:37 /user/hive/warehouse/t_partition/dt=2021-09-18/country=US
    3. -rw-r--r-- 3 hadoop supergroup 33 2022-09-19 15:37 /user/hive/warehouse/t_partition/dt=2021-09-18/country=US/t_partition1.txt
    4. drwxr-xr-x - hadoop supergroup 0 2022-09-19 15:37 /user/hive/warehouse/t_partition/dt=2021-09-19
    5. drwxr-xr-x - hadoop supergroup 0 2022-09-19 15:37 /user/hive/warehouse/t_partition/dt=2021-09-19/country=CN
    6. -rw-r--r-- 3 hadoop supergroup 33 2022-09-19 15:37 /user/hive/warehouse/t_partition/dt=2021-09-19/country=CN/t_partition2.txt

    文件内容

    $ hdfs dfs -cat /user/hive/warehouse/t_partition/dt=2021-09-18/country=US/t_partition1.txt

    1. 2022-09-19 16:10:15,043 INFO sasl.SaslDataTransferClient: SASL encryption trust check: localHostTrusted = false, remoteHostTrusted = false
    2. 2019,02,11
    3. 2020,04,13
    4. 2020,09,12

    $ hdfs dfs -cat /user/hive/warehouse/t_partition/dt=2021-09-19/country=CN/t_partition2.txt

    1. 2022-09-19 16:12:29,363 INFO sasl.SaslDataTransferClient: SASL encryption trust check: localHostTrusted = false, remoteHostTrusted = false
    2. 2018,03,15
    3. 2021,10,01
    4. 2021,08,21

  • 相关阅读:
    el-tree实现菜单功能
    品牌投资与形象全面升级 | 快来认识全新的 Go 旅城通票
    Spring Rce 漏洞分析CVE-2022-22965
    【torch】如何把给定mask按比例选取再次划分mask?
    cubeIDE开发,在LCD显示摄像头抓取的图片数据
    支持向量机SVM--线性
    poetry(python依赖管理和打包工具)快速入门 && 项目实例
    如何使用vuex
    npm 镜像源切换与设置
    C++特殊定制:揭秘cpo与tag_invoke!
  • 原文地址:https://blog.csdn.net/zhy0414/article/details/126949188