• 值得反复研读的表连接之HASH JOIN方式


    两表关联返回少量数据应该走嵌套循环,两表关联返回数据量大的应该走HASH JOIN连接。

    HASH JOIN :两表等值关联,将数据量小的表作为驱动表,将数据量大的表作为被驱动表,对驱动表的连接列进行HASH 生成HASH TABLE ,并且也对被驱动表的连接列进行HASH生成HASH TABLE。能够匹配的数据关联上。HASH JOIN 只支持等值连接

    1. scott@orclpdb1:orclcdb>
    2. select /*+gather_plan_statistics use_hash(e,d)*/
    3. 2 e.ename, e.job, d.dname
    4. from emp e, dept d
    5. 4 where e.deptno = d.deptno;
    6. ENAME JOB DNAME
    7. ---------- --------- --------------
    8. SMITH CLERK RESEARCH
    9. ALLEN SALESMAN SALES
    10. WARD SALESMAN SALES
    11. JONES MANAGER RESEARCH
    12. MARTIN SALESMAN SALES
    13. BLAKE MANAGER SALES
    14. CLARK MANAGER ACCOUNTING
    15. SCOTT ANALYST RESEARCH
    16. KING PRESIDENT ACCOUNTING
    17. TURNER SALESMAN SALES
    18. ADAMS CLERK RESEARCH
    19. JAMES CLERK SALES
    20. FORD ANALYST RESEARCH
    21. MILLER CLERK ACCOUNTING
    22. 14 rows selected.
    23. scott@orclpdb1:orclcdb> select * from table(dbms_xplan.display_cursor(null,null,'ALLSTATS LAST'));
    24. PLAN_TABLE_OUTPUT
    25. --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
    26. SQL_ID d3uaffp94gd3x, child number 0
    27. -------------------------------------
    28. select /*+gather_plan_statistics use_hash(e,d)*/ e.ename, e.job,
    29. d.dname from emp e, dept d where e.deptno = d.deptno
    30. Plan hash value: 615168685
    31. -------------------------------------------------------------------------------------------------------------------------
    32. | Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers | Reads | OMem | 1Mem | Used-Mem |
    33. -------------------------------------------------------------------------------------------------------------------------
    34. | 0 | SELECT STATEMENT | | 1 | | 14 |00:00:00.01 | 54 | 5 | | | |
    35. |* 1 | HASH JOIN | | 1 | 14 | 14 |00:00:00.01 | 54 | 5 | 1695K| 1695K| 1079K (0)|
    36. | 2 | TABLE ACCESS FULL| DEPT | 1 | 4 | 4 |00:00:00.01 | 6 | 5 | | | |
    37. | 3 | TABLE ACCESS FULL| EMP | 1 | 14 | 14 |00:00:00.01 | 7 | 0 | | | |
    38. -------------------------------------------------------------------------------------------------------------------------
    39. Predicate Information (identified by operation id):
    40. ---------------------------------------------------
    41. 1 - access("E"."DEPTNO"="D"."DEPTNO")
    42. 21 rows selected.
    43. scott@orclpdb1:orclcdb>

    执行计划中离HASH JOIN 最近的表为驱动表,则DEPT为驱动表,EMP为被驱动表。驱动表 扫描一次,被驱动表也扫描一次。

    Note: NL中被驱动表需要扫描多次, HASH JOIN 中被驱动表只需扫描一次。

     Used-Mem表示HASH JOIN 消耗的PGA , 当驱动表太大、PGA不能完全容纳驱动表时,驱动表就会溢出到临时表空间,进而产生磁盘HASH JOIN。此刻HASH JOIN 会产生严重的性能问题。而NL无须消耗PGA.

    NL 会进行值传递。(NL循环一次,驱动表连接列传值给被驱动表连接列)

    HASH JOIN 没有传值的过程。(HASH JOIN 的驱动表与被驱动表的连接列都无须创建索引)

    OLTP 环境(一般高并发小事务居多):SQL执行计划多以NL为主。SGA 设置较大,PGA设置较小。(NL不消耗PGA)

    OLAP环境 (多数SQL都是大规模ETL),SQL返回结果集很多,SQL执行计划通常以HASH连接为主。往往会消耗大量的PGA,故PGA设置较大。

    当两表使用外连接进行关联,如果执行计划走的是HASH JOIN ,想要更改驱动表,我们需要使用swap_join_inputs, 而不是leading.

    1. scott@orclpdb1:orclcdb>
    2. explain plan for select /*+use_hash(d,e) leading(e)*/
    3. 2 *
    4. from dept d
    5. 4 left join emp e
    6. 5 on d.deptno = e.deptno;
    7. Explained.
    8. scott@orclpdb1:orclcdb> select * from table(dbms_xplan.display);
    9. PLAN_TABLE_OUTPUT
    10. --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
    11. Plan hash value: 3713469723
    12. ---------------------------------------------------------------------------
    13. | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
    14. ---------------------------------------------------------------------------
    15. | 0 | SELECT STATEMENT | | 15 | 870 | 6 (0)| 00:00:01 |
    16. |* 1 | HASH JOIN OUTER | | 15 | 870 | 6 (0)| 00:00:01 |
    17. | 2 | TABLE ACCESS FULL| DEPT | 4 | 80 | 3 (0)| 00:00:01 |
    18. | 3 | TABLE ACCESS FULL| EMP | 14 | 532 | 3 (0)| 00:00:01 |
    19. ---------------------------------------------------------------------------
    20. Predicate Information (identified by operation id):
    21. ---------------------------------------------------
    22. 1 - access("D"."DEPTNO"="E"."DEPTNO"(+))
    23. Hint Report (identified by operation id / Query Block Name / Object Alias):
    24. Total hints for statement: 2 (U - Unused (2))
    25. ---------------------------------------------------------------------------
    26. 1 - SEL$2BFA4EE4
    27. U - leading(e)
    28. 2 - SEL$2BFA4EE4 / D@SEL$1
    29. U - use_hash(d,e)
    30. 25 rows selected.
    31. scott@orclpdb1:orclcdb>

    从上执行计划中所得,两表关联走的是HASH JOIN , 表明use_hash(d,e)生效,而驱动表为DEPT故设置leading(e) 未生效。因此需要使用swap_join_inputs来更改外连接中HASH JOIN的驱动表

    1. scott@orclpdb1:orclcdb> explain plan for select /*+use_hash(d,e) swap_join_inputs(e)*/
    2. 2 *
    3. from dept d
    4. 4 left join emp e
    5. 5 on d.deptno = e.deptno;
    6. Explained.
    7. scott@orclpdb1:orclcdb> select * from table(dbms_xplan.display);
    8. PLAN_TABLE_OUTPUT
    9. --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
    10. Plan hash value: 3590956717
    11. ------------------------------------------------------------------------------
    12. | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
    13. ------------------------------------------------------------------------------
    14. | 0 | SELECT STATEMENT | | 15 | 870 | 6 (0)| 00:00:01 |
    15. |* 1 | HASH JOIN RIGHT OUTER| | 15 | 870 | 6 (0)| 00:00:01 |
    16. | 2 | TABLE ACCESS FULL | EMP | 14 | 532 | 3 (0)| 00:00:01 |
    17. | 3 | TABLE ACCESS FULL | DEPT | 4 | 80 | 3 (0)| 00:00:01 |
    18. ------------------------------------------------------------------------------
    19. Predicate Information (identified by operation id):
    20. ---------------------------------------------------
    21. 1 - access("D"."DEPTNO"="E"."DEPTNO"(+))
    22. Hint Report (identified by operation id / Query Block Name / Object Alias):
    23. Total hints for statement: 1 (U - Unused (1))
    24. ---------------------------------------------------------------------------
    25. 3 - SEL$2BFA4EE4 / D@SEL$1
    26. U - use_hash(d,e)
    27. 22 rows selected.
    28. scott@orclpdb1:orclcdb>

     swap_join_inputs(e) 更改了外连接中的HASH JOIN 的驱动表

    Q: 如何优化HASH JOIN ?

    • 避免写SELECT *  FROM 语句(由于HASH JOIN 将驱动表的select 检索列 和 join连接列 放到PGA中,尽可能的减少对PGA的占用,避免驱动表被溢出到临时表空间,从而提升查询性能。)
    • 对表进行拆分。
  • 相关阅读:
    Apollo 应用与源码分析:CyberRT-时间相关API
    【华为OD机试真题 JS】机器人走迷宫
    采用PHP开发的一套(项目源码)医疗安全(不良)事件报告系统源码:统计分析,持续整改,完成闭环管理
    高德地图与CAD图叠加显示方法汇总及优缺点分析
    【C++基础】13. 结构体
    大模型系统和应用——高效训练&模型压缩
    etcd实现大规模服务治理应用实战
    BSP板机支持包、linux启动分析、ARM裸机编程
    【笔记-OrCAD】WARNING(ORCAP-36038)解决办法
    python一点通:数据处理顶流Pandas 2.0有什么新功能?
  • 原文地址:https://blog.csdn.net/u011868279/article/details/127445786