• hive 谓词下推实例分析(on与where的区别)


    测试数据

    • t1 表
    select * from t1;
    +--------+----------+---------+--------+
    | t1.id  | t1.name  | t1.age  | t1.dt  |
    +--------+----------+---------+--------+
    | 1      | aa       | 12      | 01     |
    | 1      | aa       | 12      | 02     |
    | 2      | aa       | 14      | 01     |
    | 2      | bb       | 14      | 02     |
    | 3      | cc       | 16      | 02     |
    | NULL   | aa       | 12      | 01     |
    +--------+----------+---------+--------+
    
    
    • 1
    • 2
    • 3
    • 4
    • 5
    • 6
    • 7
    • 8
    • 9
    • 10
    • 11
    • 12
    • t2 表
    select * from t2;
    +--------+----------+---------+--------+
    | t2.id  | t2.name  | t2.age  | t2.dt  |
    +--------+----------+---------+--------+
    | 1      | 1        | aa      | 12     |
    | 2      | 1        | aa      | 12     |
    | 1      | NULL     | aa      | 12     |
    | 1      | 2        | aa      | 14     |
    | 2      | 2        | bb      | 14     |
    | 2      | 3        | cc      | 16     |
    +--------+----------+---------+--------+
    
    
    • 1
    • 2
    • 3
    • 4
    • 5
    • 6
    • 7
    • 8
    • 9
    • 10
    • 11
    • 12

    关联查询

    • t1 left join t2
    select * from t1 a left join t2 b on a.id=b.id;
    +-------+---------+--------+-------+-------+---------+--------+-------+
    | a.id  | a.name  | a.age  | a.dt  | b.id  | b.name  | b.age  | b.dt  |
    +-------+---------+--------+-------+-------+---------+--------+-------+
    | 3     | cc      | 16     | 02    | NULL  | NULL    | NULL   | NULL  |
    | NULL  | aa      | 12     | 01    | NULL  | NULL    | NULL   | NULL  |
    | 1     | aa      | 12     | 01    | 1     | NULL    | aa     | 12    |
    | 1     | aa      | 12     | 01    | 1     | 1       | aa     | 12    |
    | 1     | aa      | 12     | 01    | 1     | 2       | aa     | 14    |
    | 1     | aa      | 12     | 02    | 1     | NULL    | aa     | 12    |
    | 1     | aa      | 12     | 02    | 1     | 1       | aa     | 12    |
    | 1     | aa      | 12     | 02    | 1     | 2       | aa     | 14    |
    | 2     | aa      | 14     | 01    | 2     | 3       | cc     | 16    |
    | 2     | aa      | 14     | 01    | 2     | 1       | aa     | 12    |
    | 2     | aa      | 14     | 01    | 2     | 2       | bb     | 14    |
    | 2     | bb      | 14     | 02    | 2     | 3       | cc     | 16    |
    | 2     | bb      | 14     | 02    | 2     | 1       | aa     | 12    |
    | 2     | bb      | 14     | 02    | 2     | 2       | bb     | 14    |
    +-------+---------+--------+-------+-------+---------+--------+-------+
    
    
    • 1
    • 2
    • 3
    • 4
    • 5
    • 6
    • 7
    • 8
    • 9
    • 10
    • 11
    • 12
    • 13
    • 14
    • 15
    • 16
    • 17
    • 18
    • 19
    • 20
    • t1 left join t2 on a.id=b.id where a.dt =‘01’,保留表谓词下推,map端提前过滤
    select * from t1 a left join t2 b on a.id=b.id where a.dt ='01';
    +-------+---------+--------+-------+-------+---------+--------+-------+
    | a.id  | a.name  | a.age  | a.dt  | b.id  | b.name  | b.age  | b.dt  |
    +-------+---------+--------+-------+-------+---------+--------+-------+
    | NULL  | aa      | 12     | 01    | NULL  | NULL    | NULL   | NULL  |
    | 1     | aa      | 12     | 01    | 1     | NULL    | aa     | 12    |
    | 1     | aa      | 12     | 01    | 1     | 1       | aa     | 12    |
    | 1     | aa      | 12     | 01    | 1     | 2       | aa     | 14    |
    | 2     | aa      | 14     | 01    | 2     | 3       | cc     | 16    |
    | 2     | aa      | 14     | 01    | 2     | 1       | aa     | 12    |
    | 2     | aa      | 14     | 01    | 2     | 2       | bb     | 14    |
    +-------+---------+--------+-------+-------+---------+--------+-------+
    
    • 1
    • 2
    • 3
    • 4
    • 5
    • 6
    • 7
    • 8
    • 9
    • 10
    • 11
    • 12
    • t1 left join t2 on a.id=b.id and a.dt =‘01’,保留表非谓词下推,reduce端过滤
    select * from t1 a left join t2 b on a.id=b.id and a.dt ='01';
    +-------+---------+--------+-------+-------+---------+--------+-------+
    | a.id  | a.name  | a.age  | a.dt  | b.id  | b.name  | b.age  | b.dt  |
    +-------+---------+--------+-------+-------+---------+--------+-------+
    | 1     | aa      | 12     | 02    | NULL  | NULL    | NULL   | NULL  |
    | 2     | bb      | 14     | 02    | NULL  | NULL    | NULL   | NULL  |
    | 3     | cc      | 16     | 02    | NULL  | NULL    | NULL   | NULL  |
    | NULL  | aa      | 12     | 01    | NULL  | NULL    | NULL   | NULL  |
    | 1     | aa      | 12     | 01    | 1     | 2       | aa     | 14    |
    | 1     | aa      | 12     | 01    | 1     | NULL    | aa     | 12    |
    | 1     | aa      | 12     | 01    | 1     | 1       | aa     | 12    |
    | 2     | aa      | 14     | 01    | 2     | 2       | bb     | 14    |
    | 2     | aa      | 14     | 01    | 2     | 3       | cc     | 16    |
    | 2     | aa      | 14     | 01    | 2     | 1       | aa     | 12    |
    +-------+---------+--------+-------+-------+---------+--------+-------+
    
    • 1
    • 2
    • 3
    • 4
    • 5
    • 6
    • 7
    • 8
    • 9
    • 10
    • 11
    • 12
    • 13
    • 14
    • 15

    说明:保留表应用谓词下推,提前过滤,会把不符合条件的数据提前过滤掉;保留表不应用谓词下推,不提前过滤,只能在join发生时,不符合条件的数据不参与关联计算;

    • t1 left join t2 on a.id=b.id and b.dt =‘12’,空表谓词下推,map端过滤
    select * from t1 a left join t2 b on a.id=b.id and b.dt ='12';
    +-------+---------+--------+-------+-------+---------+--------+-------+
    | a.id  | a.name  | a.age  | a.dt  | b.id  | b.name  | b.age  | b.dt  |
    +-------+---------+--------+-------+-------+---------+--------+-------+
    | 2     | aa      | 14     | 01    | 2     | 1       | aa     | 12    |
    | 2     | bb      | 14     | 02    | 2     | 1       | aa     | 12    |
    | 3     | cc      | 16     | 02    | NULL  | NULL    | NULL   | NULL  |
    | NULL  | aa      | 12     | 01    | NULL  | NULL    | NULL   | NULL  |
    | 1     | aa      | 12     | 01    | 1     | NULL    | aa     | 12    |
    | 1     | aa      | 12     | 01    | 1     | 1       | aa     | 12    |
    | 1     | aa      | 12     | 02    | 1     | NULL    | aa     | 12    |
    | 1     | aa      | 12     | 02    | 1     | 1       | aa     | 12    |
    +-------+---------+--------+-------+-------+---------+--------+-------+
    
    • 1
    • 2
    • 3
    • 4
    • 5
    • 6
    • 7
    • 8
    • 9
    • 10
    • 11
    • 12
    • 13
    • t1 left join t2 on a.id=b.id where b.dt =‘12’,空表非谓词下推,reduce端过滤
    select * from t1 a left join t2 b on a.id=b.id where b.dt ='12';
    +-------+---------+--------+-------+-------+---------+--------+-------+
    | a.id  | a.name  | a.age  | a.dt  | b.id  | b.name  | b.age  | b.dt  |
    +-------+---------+--------+-------+-------+---------+--------+-------+
    | 2     | aa      | 14     | 01    | 2     | 1       | aa     | 12    |
    | 2     | bb      | 14     | 02    | 2     | 1       | aa     | 12    |
    | 1     | aa      | 12     | 01    | 1     | NULL    | aa     | 12    |
    | 1     | aa      | 12     | 01    | 1     | 1       | aa     | 12    |
    | 1     | aa      | 12     | 02    | 1     | NULL    | aa     | 12    |
    | 1     | aa      | 12     | 02    | 1     | 1       | aa     | 12    |
    +-------+---------+--------+-------+-------+---------+--------+-------+
    
    • 1
    • 2
    • 3
    • 4
    • 5
    • 6
    • 7
    • 8
    • 9
    • 10
    • 11

    说明:空表应用谓词下推,提前过滤,会把不符合条件的数据提前过滤掉;空表不应用谓词下推,不提前过滤,只能在join完成时,过滤不符合条件的数据;

    总结 : 是否应用谓词下推,最后产生的结果往往不同,这里需要特别注意。

    参考:
    一文弄懂Hive中谓词下推(on与where的区别)

  • 相关阅读:
    安装 Gin 框架
    IDEA常用快捷键
    数字孪生技术:智慧运维的未来之路
    hadoop大数据原理与应用------初识Hadoop数据集
    企业公众号怎么做内容?这四个阶段要做好
    MQ进阶面试题
    Java类和对象(1)
    【小想法】第1期:模型工程化,向量相似度,早停机制,BERT微调小trick
    微信小程序通过npm引入tdesign包进行构建的时候报错
    根据机械臂视频模拟出运动路线
  • 原文地址:https://blog.csdn.net/weixin_38251332/article/details/132720390